USS Clueless - Open Source, application-specific knowledge

Stardate 20020730.0546

(On Screen): Eric Raymond has responded to my long post from Sunday about the issue of attempting to apply Open Source methodology to things like vertical apps and embedded software. Much was covered, and I'm going to want to talk about the rest of it later, but for now I want to concentrate on one specific issue, because I don't think it was understood.

In my original article, one of the main problems I brought up with respect to OSS in the embedded world (and in other places) was the learning curve. My point was that before a contributor could be valuable, there was a non-trivial amount of knowledge he had to gain, and no easy way to acquire it without overloading the core team with training duties to the detriment of their ability to continue working on the project. In other words, there didn't seem to be any way to avoid Brooks's Law.

Eric writes:

He's correct when he says that most contributors are self-selected and self-motivated. He overestimates the cost of training newbies, though. They self-train; normally, the first time a core developer hears from a newbie is typically when the newbie sends a patch -- self-evidence that the newbie has already acquired a critical level of knowledge about the software. The "sink or swim" method turns out to work, and work well.

I'm afraid it's not that simple.

There are three steps in acquiring sufficient knowledge about a piece of OSS to be able to diagnose and correct bugs. First, the person must acquire core skill in the art. Second, they must have competent knowledge of the general problem that the program is trying to solve. Third, they must understand specifically what the program actually does to try to solve that problem.

Then they will try to figure out why it is failing to do so, and how to fix it.

Taking operating systems (Linux) as the obvious example of an OSS effort, the first level represents competence in general programming skills. This means understanding the language(s) being used and being comfortable with them, along with a general knowledge of the normal tools and tricks of computer science (e.g. how linked lists work) and a general understanding of the process of writing and testing programs.

The second level requires understanding the general problem that operating systems solve, and the approaches that different operating systems take to it.

The third level involves directly reading portions of the source of Linux itself in order to try to determine whether the code has problems.

Suppose that the Linux Powers That Be receive an email from newbie George, identifying a problem, describing why it happens, and recommending a fix. George can do this because he already has both skill in the art and an understanding of how operating systems work, since both are part of a general education in Computer Science. George was therefore already prepared to dive into the Linux code; all he needed was the motivation to do so.

But for other kinds of applications, that foundation of knowledge may not be generally available. And a fish is not aware of water: Eric and other OSS advocates may not realize just how much knowledge of that kind they already bring to the table. Most of them will fully understand the following description of an operating system activity, for example:

The application has generated a page fault. It is trying to use part of its own memory space which is currently swapped out, so we need to suspend the task, locate and swap out a relatively unused piece of memory belonging to it or some other task, bring in the memory it needs, reprogram the MMU and then restore the task and let it try again.
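For readers who want the sequence above spelled out, here is a toy model of it in Python. It is a minimal sketch under stated assumptions, not how Linux actually does it: the class `Pager`, its method names, and the choice of least-recently-used (LRU) eviction across all tasks are illustrative inventions, and the "reprogram the MMU" step is reduced to updating a dictionary.

```python
from collections import OrderedDict


class Pager:
    """Toy demand-paging model (hypothetical; real kernels are far more involved)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        # (task, page) -> physical frame, ordered least- to most-recently used.
        self.resident = OrderedDict()
        self.swap = set()  # (task, page) pairs currently swapped out
        self.log = []      # record of the steps taken, for inspection

    def access(self, task, page):
        """Touch a page; return True if the access caused a page fault."""
        key = (task, page)
        if key in self.resident:
            self.resident.move_to_end(key)  # hit: mark as most recently used
            return False
        self.handle_page_fault(key)
        return True

    def handle_page_fault(self, key):
        task = key[0]
        self.log.append(("suspend", task))
        if len(self.resident) >= self.num_frames:
            # Evict the least recently used page, whichever task owns it.
            victim, frame = self.resident.popitem(last=False)
            self.swap.add(victim)
            self.log.append(("swap_out", victim))
        else:
            frame = len(self.resident)  # a free frame is still available
        self.swap.discard(key)
        self.log.append(("swap_in", key))
        self.resident[key] = frame      # stands in for "reprogram the MMU"
        self.log.append(("resume", task))
```

With two frames, task A touching pages 0 and 1, then page 0 again, and then task B touching its page 0 forces A's page 1 (the least recently used) out to swap.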

He'll understand that because he already knows how virtual memory works, which is a form of level-2 knowledge in operating system theory. So if he suspected, for example, that the system had some sort of problem in how it decided what to swap out, and that this was leading to excessive thrashing, he'd know where to start looking. (Even understanding what I mean by "thrashing" indicates that the reader has a substantial degree of level-2 knowledge.)
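As an illustration of the level-2 knowledge involved, the sketch below counts page faults under pure LRU replacement for a task that repeatedly sweeps through its working set. The workload and the function names are hypothetical, and real kernels use cheaper approximations of LRU, but it shows the textbook cliff: once the working set is even one page larger than the memory available, every single reference faults.

```python
from collections import OrderedDict


def count_faults_lru(references, num_frames):
    """Count page faults for a reference string under LRU replacement."""
    resident = OrderedDict()  # page -> None, ordered least- to most-recently used
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)  # hit: refresh recency
        else:
            faults += 1
            if len(resident) >= num_frames:
                resident.popitem(last=False)  # evict least recently used page
            resident[page] = None
    return faults


def cyclic_workload(working_set, passes):
    """Hypothetical task that loops over its working set, like sweeping an array."""
    return [page for _ in range(passes) for page in range(working_set)]


refs = cyclic_workload(working_set=5, passes=100)  # 500 references in total
for frames in (6, 5, 4):
    print(frames, count_faults_lru(refs, frames))
```

With 6 or 5 frames only the 5 initial cold faults occur; with 4 frames all 500 references fault, because LRU always evicts exactly the page that will be needed next. Someone who knows this pattern knows where to look when a pager thrashes.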

But consider a comparable description about something that a CDMA cell phone has to accomplish:

On three successive slot cycles, the Viterbi Decoder has reported an unacceptably high BER on the paging channel, suggesting a signal fade. A preliminary search at the PN Offsets recommended by the cell has not discovered a suitable candidate for idle handoff, so it will be necessary to remain awake during the next slot cycle so we can initiate a general search of the PN phase space.
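To show how little the code itself reveals, here is the decision described in that paragraph written as a short Python sketch. Everything specific in it is a hypothetical assumption: the BER threshold, the names `IdleState` and `on_slot_cycle`, the example PN offsets, and the `search` hook that stands in for the real searcher hardware. Only the shape of the logic (three bad slot cycles, then a search of the cell's recommended PN offsets, then staying awake for a general search) comes from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

BER_THRESHOLD = 0.01   # hypothetical "unacceptably high" paging-channel BER
FADE_SLOT_CYCLES = 3   # successive bad slot cycles before suspecting a fade


@dataclass
class IdleState:
    bad_cycles: int = 0
    stay_awake: bool = False              # stay awake next cycle for a general PN search
    handoff_target: Optional[int] = None  # PN offset chosen for idle handoff, if any


def on_slot_cycle(state: IdleState, paging_ber: float,
                  recommended_pn_offsets: List[int],
                  search: Callable[[int], bool]) -> IdleState:
    """Run once per slot cycle with the decoder's BER on the paging channel.

    search(pn) is a hypothetical hook returning True if the pilot at that
    PN offset is a suitable candidate for idle handoff.
    """
    if paging_ber <= BER_THRESHOLD:
        state.bad_cycles = 0  # a good cycle breaks the run of bad ones
        return state
    state.bad_cycles += 1
    if state.bad_cycles < FADE_SLOT_CYCLES:
        return state
    state.bad_cycles = 0  # act on the suspected fade, then start counting afresh
    # Preliminary search at the PN offsets the cell recommended.
    for pn in recommended_pn_offsets:
        if search(pn):
            state.handoff_target = pn
            state.stay_awake = False
            return state
    # No suitable candidate: remain awake to search the whole PN phase space.
    state.stay_awake = True
    return state
```

The control flow is trivial to read. Whether three cycles is the right count, whether that threshold is sensible, or whether skipping the next sleep is the correct response to a fade are questions the code cannot answer for someone who doesn't already understand CDMA.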

If you understand CDMA technology, that will make perfect sense to you. But if you don't, it will be complete gibberish, and in that case you don't have a hope in hell of determining whether the code works even if you have it in front of you.

You can't determine whether something works properly if you don't understand what it is trying to do.

That is the knowledge I was referring to, the kind that is not easily picked up. In the case of existing OSS projects, the problems the applications are trying to solve are widely understood. People like George (and Eric and me) already know how operating systems wor