USS Clueless - What is information?

Stardate 20040602.1521

(On Screen): Responding to my article about sources of bias in news reporting (and its followup here), Andrew Olmstead points out that the military has a similar problem. If all intelligence were dumped on senior commanders they'd drown in data, but when the data is filtered by subordinates there's always a degree of bias and distortion involved.

The fundamental problem of "drowning in data" is an interesting one. It's a consequence of developments in basic technologies relating to storage and transmission of information, because as those have advanced we have ended up with an embarrassment of riches. It's increasingly easy to know something important without realizing you do, because it is a very small needle lost in a very big haystack. Librarians were among the first to have to deal with this seriously, and the Dewey Decimal System is one of the great unheralded achievements of the modern era.

Sometimes there are several pieces of information, each of which appears uninteresting, but which taken together are critically important. It seems that various intelligence groups in the US government collectively had enough hints to have discovered and foiled the 9/11 hijacking plot, but did not do so because the significance of that information wasn't recognized and the pieces weren't put together before the fact. There's been a lot of criticism of that by outsiders using 20/20 hindsight, but most of them don't realize just how tough a problem this is.

It comes up in all kinds of places. I spent most of my career as an engineer designing tools for other engineers to use. I spent six years at Tektronix designing logic analyzers in the late 1970's and early 1980's. A logic analyzer (LA) is an instrument which permits an engineer to observe the behavior of digital circuits, much as an oscilloscope permits an engineer to observe the behavior of analog circuits. During those six years I participated in two complete product development cycles (the 7D02 and the 1240), and while working on the second one I came to the realization that the true power of test and measurement equipment (such as LAs) was not a function of how much useful data it could capture so much as a function of how much useless data it could exclude and discard. The ideal T&M instrument would capture and display exactly the data which was most interesting and useful to the engineer, and nothing else. Like all ideals, this is unachievable, but the better an instrument is at this, the more useful it will be perceived to be. Conversely, if the critical data you need is lost in a huge sea of irrelevance, the instrument is much less useful.

Claude Shannon rigorously examined the basic question, "What is information?" in the late 1940's while working at Bell Labs. He developed what we now call "Information Theory", and there may be no single theoretical work which is more important and less well known. For many electrical engineers and computer programmers it's central and vital, but few laymen have ever heard of Shannon and quite a lot of programmers don't know his name.

One of Shannon's fundamental insights was that transmission is not the same as information. He concentrated particularly on the fundamental properties of bit streams (his paper was the first to use the word "bit" for "binary digit", a coinage he credited to his colleague John Tukey) and concluded that information is a function of surprise, or unpredictability. When someone receives a message encoded as a string of bits, if, based on the value of the bit stream up to a given point, the receiver has no better than a 50:50 chance of predicting the next bit, then that bit contains maximal information. At the other extreme, if the receiver can predict the next bit unfailingly, then that bit contains no information at all.
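To put a number on that intuition, here is a minimal sketch in Python (my own illustration, not Shannon's notation): the information carried by a received symbol to which the receiver had assigned probability p is its "surprisal", -log2(p). A bit the receiver could only call at 50:50 carries one full bit of information; one the receiver was nearly certain about carries almost none.

    import math

    def surprisal(p):
        """Information (in bits) carried by receiving a symbol to which
        the receiver had assigned probability p."""
        return -math.log2(p)

    print(surprisal(0.5))     # 1.0    -- a pure coin flip: one full bit of information
    print(surprisal(0.99))    # ~0.014 -- almost perfectly predictable: almost no information
    # As the receiver's confidence approaches certainty (p -> 1), the surprisal approaches 0.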

That can be demonstrated by providing sequences of letters and asking the reader to guess the letter which comes next. Try these two:

I WENT TO THE STORE AND BOUGHT _

THE PRESIDENT OF THE UNITED STA_

The first one is very difficult to predict; the second one is extremely easy. The actual letter which will be received next in the first message will contain a lot of information; the next letter received for the second message will contain almost none.
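The difference can be made quantitative. Under Shannon's definition, the expected information in the next letter is the entropy of the reader's guess distribution, H = -sum(p * log2(p)). The probabilities below are invented purely for illustration (the real ones live in each reader's head), but they show the contrast:

    import math

    def entropy(dist):
        """Expected information (in bits) of the next letter, given a
        probability distribution over what it might be."""
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    # Hypothetical guess distributions -- the numbers are invented for illustration.
    after_bought = {"M": 0.1, "B": 0.1, "S": 0.1, "A": 0.1, "C": 0.1,
                    "E": 0.1, "F": 0.1, "P": 0.1, "T": 0.1, "W": 0.1}
    after_sta    = {"T": 0.98, "R": 0.01, "B": 0.01}

    print(entropy(after_bought))  # ~3.3 bits: hard to guess, so the letter carries a lot of information
    print(entropy(after_sta))     # ~0.16 bits: nearly a sure thing, so it carries almost none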

Shannon developed a mathematically rigorous way of evaluating the amount of information present in a given bit stream, though he came at it from the opposite direction from my description above. He defined the "entropy" of a bit stream, which measures the average amount of information each bit actually carries, and from it the stream's redundancy: the extent to which the stream carries less than the maximum amount of information possible for a bit stream of that length.
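Here is a rough sketch of both quantities, again illustrative only: this version estimates the entropy of a binary stream from nothing but the frequencies of 0 and 1, which is the crudest possible estimate (Shannon's full treatment also accounts for patterns spanning many symbols), and then computes redundancy as the shortfall from the one-bit-per-bit maximum.

    from collections import Counter
    import math

    def entropy_per_bit(bits):
        """Crude (zeroth-order) entropy estimate for a binary string,
        using only how often '0' and '1' occur -- it ignores any
        patterns between successive bits."""
        total = len(bits)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(bits).values())

    def redundancy(bits):
        """How far the stream falls short of the maximum possible
        information content of 1 bit per bit transmitted."""
        return 1.0 - entropy_per_bit(bits)

    print(redundancy("0110100110110010"))  # 0.0  : evenly mixed, near-maximal information per bit
    print(redundancy("0001000100000001"))  # ~0.3 : mostly zeros, nearly a third of the stream is redundant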

Shannon's work provides the theoretical basis for file compression algorithms. A compression algorithm works by removing redundancy: it converts a long bit sequence whose entropy is well below the maximum into a shorter bit sequence carrying nearly the maximum possible information per bit, in such a way that the original sequence can be reconstructed from it exactly. But there's a limit to compression, which you reach when the compressed bit stream has entropy so close to the maximum that there is essentially no redundancy left to remove.
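You can watch that limit in action with any general-purpose compressor. A small sketch using Python's standard zlib module (illustrative; zlib is not an optimal Shannon coder, but it's close enough to make the point): redundant English text shrinks dramatically, random bytes barely shrink at all, and compressing already-compressed output gains essentially nothing because its redundancy is already gone.

    import os
    import zlib

    text = b"I WENT TO THE STORE AND BOUGHT MILK AND EGGS. " * 50   # highly redundant
    noise = os.urandom(len(text))                                   # already near maximum entropy

    print(len(text), len(zlib.compress(text)))     # shrinks dramatically
    print(len(noise), len(zlib.compress(noise)))   # barely changes; may even grow slightly

    # A second pass buys almost nothing: the first pass already squeezed out
    # most of the redundancy, leaving a stream close to the entropy limit.
    once = zlib.compress(text)
    print(len(once), len(zlib.compress(once)))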
