  USS Clueless

             Voyages of a restless mind

Stardate 20011001.0603 (On Screen): Twenty years ago I was involved in the design of logic analyzers. This was, at the time, a relatively new kind of test-and-measurement equipment, and we were just beginning to figure out what they were and what they needed to be. For a while we concentrated on trying to capture as much as we could, and then we realized something rather profound: the value of a piece of test equipment is measured not by how much it can capture, but by how much it can exclude. For example, our logic analyzer was capable of capturing data at a rate of a hundred megasamples per second (slow by modern standards but blazingly fast at the time) and had a memory which was 512 samples deep. Somewhere out there, in the next few minutes, some event is going to take place, and our user wants to see the details of it. At full sample rate, we can capture a window of only about 5 microseconds. If we capture the wrong 5 microseconds, then we have failed our user. Even if we do get the right one, the user doesn't really want to plow through that much data; he just wants to see the particular event. In that 512 samples, there may only be five or six which are really critical.

After that realization, we stopped concentrating so heavily on faster capture and deeper memory, and paid more attention to smart triggering, programmable storage control and data post-processing. The user didn't want us to drown him in data; he wanted us to show him the precise thing he was interested in and nothing else.
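That shift is easy to illustrate. Below is a minimal Python sketch of triggered capture (purely hypothetical, not the actual instrument's design): a fixed 512-sample circular buffer that discards old samples freely until a trigger condition fires, then records a little more and stops, so the memory it does have is spent on the window surrounding the event rather than on whatever happened to arrive first.

    from collections import deque

    # Hypothetical sketch of triggered capture; not real instrument firmware.
    # The buffer holds only the most recent DEPTH samples. Once the trigger
    # fires, it records POST_TRIGGER more samples and stops, so the retained
    # window straddles the event of interest.

    DEPTH = 512           # total acquisition memory, in samples
    POST_TRIGGER = 256    # samples to keep after the trigger fires

    def capture(samples, trigger):
        """Return a window of up to DEPTH samples around the first sample
        for which trigger(sample) is true, or None if it never fires."""
        buffer = deque(maxlen=DEPTH)   # old samples fall off the front automatically
        remaining = None               # counts down post-trigger samples once armed
        for sample in samples:
            buffer.append(sample)
            if remaining is None:
                if trigger(sample):
                    remaining = POST_TRIGGER
            else:
                remaining -= 1
                if remaining == 0:
                    break
        return list(buffer) if remaining is not None else None

    # Example: keep the window around the first time the sampled value hits zero.
    window = capture(range(1000, -1000, -1), trigger=lambda s: s == 0)

In a real instrument the programmable parts would be exactly the trigger condition and the split between pre-trigger and post-trigger storage; that is what smart triggering and programmable storage control buy you.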

I had a similar experience at Qualcomm. There was a very obscure window-of-vulnerability bug which had manifested intermittently, causing the phone to lock up. It took a few weeks for it to annoy someone enough to make them start to hunt for it, a few days for us to figure out a suitable trigger which permitted an emulator to capture the critical events, about three hours for me to wade through the resulting execution trace to diagnose the problem -- and five minutes to fix it. This is far more common than you might think. Diagnosing problems is nearly always far more difficult than fixing them. The difficulty is that there's just too damned much going on, and nearly all of it is irrelevant.
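The entry doesn't describe the trace format or the tools involved, so the sketch below is generic, with invented record fields and event names. It shows the post-processing half of the job: boiling an enormous execution trace down to the handful of records that touch a suspect resource, plus a little lead-up before each one so the sequence still reads sensibly.

    from collections import deque

    # Generic sketch of trace post-processing; the record format and the event
    # names are invented for illustration. Keep only the records touching the
    # suspect resource, plus a few records of history before each hit, and
    # throw everything else away.

    def filter_trace(records, is_relevant, context=3):
        """Yield relevant records, each preceded by up to `context` records
        of surrounding history."""
        history = deque(maxlen=context)
        for record in records:
            if is_relevant(record):
                yield from history   # a little lead-up before the interesting part
                history.clear()
                yield record
            else:
                history.append(record)

    # Example: a trace of (timestamp, event, detail) tuples, filtered down to
    # activity on one hypothetical semaphore.
    trace = [
        (100, "irq_enter", "timer"),
        (101, "sem_take", "uart_sem"),
        (102, "task_switch", "ui_task"),
        (103, "sem_take", "uart_sem"),
        (104, "irq_exit", "timer"),
    ]
    suspect = lambda r: r[1] in ("sem_take", "sem_give") and r[2] == "uart_sem"
    for r in filter_trace(trace, suspect):
        print(r)

The real work, of course, is in choosing is_relevant: knowing which events to suspect is the part that took days to turn into a workable trigger, and the filtered trace still took hours to read.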

That has only gotten worse. In nearly every aspect of the digital world now, the quantity of data moving around has gotten immense. No one can process it all; everyone needs assistance to find the pieces of it which are important. This is a difficult problem; it may have no permanent solution. Filtering and organizing data will continue to get more complex as the data itself expands and the filtering criteria get more sophisticated.

Heinlein once observed that library science may well be the most important one of them all, and I think he may have been on to something: it does no good for the information you need to have been preserved if you can't locate it amidst all the dross. It may as well not exist. So it isn't too surprising to learn that the US intelligence community may have collected enough clues to have sniffed out the WTC attack plot ahead of time, but didn't put them together. In retrospect, knowing what to look for, it might be easy to see which pieces were relevant. But doing that going forward is a non-trivial problem. What good is Echelon (or Carnivore) if you don't know what to do with the data you've collected? The signal-to-noise ratio is microscopic: a full implementation of Carnivore might well flood US intelligence agencies with a terabyte per second, nearly all of which will be completely useless. After all, how many copies of the Pam-and-Tommy-Lee video would they really need? Capturing the data is the easy part; the hard part is throwing away most of it while not throwing away anything which is valuable.
