USS Clueless - It's OK to be wrong

Stardate 20021016.1616

(On Screen): As a general rule, new hire engineers in software engineering tend to be nearly useless for the first 3-6 months after they come on board. Part of the reason why is that they have to go through a rather abrupt process of unlearning a lot of things they'd been taught at the university where they studied.

It's not so much that the subject matter they studied was wrong, as that the university environment teaches habits and procedures which are diametrically opposite to those software engineers in embedded software actually use. Being a student teaches them virtually every possible lesson in how to be a failed engineer, and unless such students have been through an internship, they spend the first three months or more of their employment frustrated and confused and totally lost.

There are a lot of things wrong with how they did projects in school, but by far the most important was that they were graded on the quality of their result (i.e. their assignments), and the more problems the grading grad student found in their project, the worse the grade they got. The university experience rewarded them for concealing problems.

In the industry, the exact opposite is the case. The most fundamental rule in engineering, even more basic than Murphy's Law, is: Everyone fucks up.

Everyone makes mistakes. It's a fact of life. It isn't a cause for shame, it's just reality. Just as engineers are in the business of producing successful designs which can be fabricated out of less-than-ideal components, the engineering process is designed to produce successful designs out of a team made up of engineers every one of whom screws up routinely. The point of the process is not to prevent errors (because that's impossible) but rather to try to detect them and correct them as early as possible.

There's nothing wrong with making a mistake. It's not that you want to be sloppy; everyone should try to do a good job, but we don't flog people for making mistakes.

What's wrong is not detecting the mistake until after you ship.

I'm quite the oldtimer in software engineering. I started working professionally in 1976 and I was part of the generation of programmers who collectively developed processes which helped to convert software into a reasonable engineering discipline. (It was a standing joke in the industry at the time that "Computer Science" would more properly be described as "Computer Craft", because there was precious little scientific about it.) Before that point, software was a major product business but it wasn't treated with the same kind of rigor and attention to quality as other kinds of engineering. Most commercial software prior to 1975 ran on medium sized or large mainframes. The hardware of such computers was expensive to fix once shipped, so the development engineers took a great deal of care to make sure that they got it right. However, software for those mainframes wasn't developed to the same schedule, and it was routine in the industry to send updates to the software to customers on an ongoing basis. Since software could so easily be updated in the field, this permitted a certain amount of sloppiness.

It was the development of the microprocessor and the ROM which inspired the change. With cheap small computers and software burned into hardware, computer software could be used as part of larger products, and fixing software bugs was just as expensive as fixing any other kind of product flaw. To a far greater extent than any other programmers, those working on embedded software were compelled to develop procedures which would result in much higher quality than the industry norm, because the consequences of failure were so much more severe.

This first began to happen at companies which sold very expensive products with a reputation for very high quality, such as Hewlett-Packard or Tektronix. I worked at Tek, and Tek had routinely been spending quite large amounts of money on testing and product validation during the engineering cycle. It was natural that many of those procedures already developed for electronics and mechanical designs should be adapted to software, and they were. From the very first at Tek we were assigned independent test engineers at a ratio of about one for every five designers, for instance, and shipment required their approval. The test engineers worked for the manufacturing manager and were politically insulated from the engineering manager. These things did help, but not enough.

It became clear very early that software as an engineering discipline had unique characteristics and that it couldn't be treated as just another form of EE. It was clear that the design methodology itself had to change.

In the kind of electronics being developed there at the time, it was usually pretty easy to modularize the system and define straightforward interfaces, and so you would assign one engineer to each module who would largely work alone, though they did do occasional large scale design reviews. When their designs were integrated after the first prototype build, then if they didn't work together you could test what happened at the interface to see who wasn't in compliance. For instance, on one project I participated in (the 7D02 logic analyzer) the physical implementation was a series of cards plugged into a backplane, and the interface was