Stardate 20030710.1710 (On Screen): JGW relates an elaborate tale sent him by a reader named "Steve" (not me) which tries to claim that the word shit is actually an acronym from the early industrial era.
I'm afraid not. It's actually derived from an old Anglo-Saxon word. It comes to us via Middle English scitte. It's cognate to the German verb scheissen ("to defecate"). It's probably also related to the Dutch word schijten.
In modern English when we talk about certain eternal and somewhat rude subjects, there are short words which we think of as being rather vulgar or base, and longer words for the same things which are generally thought of as being more cultured, more acceptable in polite company. It turns out that the short vulgar words almost all come to us from Anglo-Saxon words derived from old German, while the polite words derive from Latin via Norman. This doubled vocabulary happened during the first couple hundred years after the Norman conquest of England, during which period the nobles who ruled England still primarily spoke Norman (a Romance language related to French, from Normandy) while the commoners primarily spoke old English, which was a Germanic language based on the language the Saxons spoke when they had invaded England.
The modern language most similar to that original Saxon language is Friesian, which is spoken in parts of the Netherlands and northern Germany, and is generally thought of as being the closest relative to English. As one of my friends once told me: "Good beer and good cheese" is good English and good Fries. ("Friesian" is pronounced FREE-zhun. "Fries" rhymes with "cheese".)
During that period of a couple of centuries, there was a merger of the Norman spoken by the nobles and the language spoken by the commoners, but the Germanic language largely dominated. Probably that's due to the fact that the nobles had to learn how to speak to their peasants using the language the peasants already were using, since the peasants pretty much didn't have the time or opportunity to learn Norman.
Out of this emerged Middle English. The words that the commoners (i.e. the Saxons) used were thought of as being vulgar and rude; after all, they were peasants and scum. The equivalent words used by the nobles (i.e. the Normans) were considered highbrow.
Middle English has large amounts of vocabulary from both Norman and Saxon, but retains a deep structure which is primarily Germanic, and English is considered to be a member of the Germanic group (which includes the Scandinavian languages, as well as Dutch, German, Afrikaans, Friesian and Yiddish). That's clearly demonstrated by the fact that English, like all Germanic languages, has three genders ("he/she/it" in English, "er/sie/es" in German). The Romance languages – those derived from Latin, spoken by the Romans – only have two genders (he, she, "il/elle" in French, "él/ella" in Spanish).
Tracking this kind of thing can be extremely complicated, because the old Germanic root language 1700 years ago borrowed words from Latin when in contact with the Roman Empire (doing a lot of trading with them, as well as fighting the odd war). That's why the English and German and French words for the number 1 are all cognate (one/ein/un) but the words for number 4 are not the same, with English four being cognate to German vier but different from the French quatre. (And sometimes the process of figuring out where a given word came from can be extraordinarily complex.)
Actually, having its vocabulary built out of two largely different languages gave English a particularly rich set of words to begin with, and English has been borrowing words from other languages ever since. That's why English speakers generally use a larger number of words in common speech than speakers of any other language. And it's this truly immense vocabulary which is the biggest challenge facing people who attempt to learn English. Grammar and verb conjugation aren't actually all that hard because English has simplified a lot of those things. (That was another result of that merger; a lot of crap was dropped, such as gendered articles. In some languages there are dozens of different words all of which translate to "the" in English but which are used in different contexts with different words. Equally, in German adjectives are declined according to the gender of the noun to which they apply; English dropped that, too.)
It's particularly clear in cases where there are what we think of as polite and vulgar synonyms for words referring to rude subjects that we still see that linguistic heritage of the Norman conquest of England. The polite word for shit is feces, which derives from Latin faeces, plural of faex (“sediment, dregs”).
And now for a rant: Usually when English speakers borrow a word, we discard the linguistic categories associated with it. For example, sometimes we produce a plural for a word according to the original language, but usually we form a regularized English plural (which is to say, append an "s" sound). That's why viruses is the usual plural for virus, and why you don't run into virii very often. Likewise, when the word is a noun, we almost always discard the original language's gender choice for the word, using the pronouns "he" or "she" if it refers to a living being which actually has a sex, and as "it" if not. That's why we refer to an omelette as "it", and not as "he" or "she" (whichever gender the word takes in French).
Indeed, given the generally easy way in which verbs can be converted into nouns and adjectives and vice versa, we often don't even retain the original part of speech of the word. Some borrowed words may have been verbs originally but have become nouns in English, or vice versa. I believe that Blitz is a noun in German (meaning "lightning"), but as a borrowed word in English it's used as a verb as well as a noun (and it's also used as an adjective in some cases, e.g. Blitz Chess). So the word may be borrowed, but once it's ours we do with it what we want.
And so, the rant: That's the reason why I contend that the word data is a collective singular and not a plural. I'm aware that in Latin data is the plural of datum, but I don't consider that important because I'm not speaking or writing in Latin. In English common usage (except in some anal-retentive publications) it's a collective singular referring to a fluid quantity, similar in usage to fire, water, and money.
We refer to collective singular words using the word much rather than the word many. We think of them as an amorphous quantity, not as a count. We refer to things as manies, in plural, if it's reasonable to be able to count them and assign a cardinal number to their quantity. "Many" is a placeholder; it indicates that we could have provided an exact count but didn't bother, and that the number would have been useful. When generating such a count makes no sense because we can't identify individual items (how do you count water or fire?) we refer to such things as muchnesses and as a singular.
There are many cattle, and many trees, but there is much fire, and much water and much money. The only plausible singular water is a single molecule; any quantity of water including two or more molecules would be plural, and the number of molecules in a glass of water is gargantuan. (18 grams of water, a little more than a tablespoonful, contains more than 10^23 molecules.) Counting money is far more straightforward, but we still refer to it as a quantity, not as a count. Part of the reason for that is that the same quantity of money can be represented as a small number of large bills, or a huge number of small-value coins; the count as such is unimportant and pretty useless. Does it matter whether the $100 in my wallet is one $100 bill or five $20 bills? Not really, most of the time.
Things which are muchnesses are singular irrespective of quantity; things which are manies are plurals. Plurals refer to things we know how to count in a fashion which is reasonable and useful and unambiguous. If that isn't possible, we refer to them collectively as a singular.
In fact, we do use moneys, fires and waters. But we use them to refer to separated instances. "Two fires" are physically separated from one another. If they merge, it becomes one fire (even if the merger continues to grow and dwarfs the prior two fires).
I contend that there is no reasonable way to count data, no reasonable and useful way to assign a cardinal number to the quantity present. Doing so is just as useless as counting water. The only plausible datum is a single bit, and that's as useless an observation as noticing that the only singular water is a single molecule. It's all the more useless since data often isn't represented as bits, or, if it is, the digital representation is unimportant and arbitrary.
One kind of data is photographs. Does the physical file size of a JPG or TIFF file actually indicate anything about how much data that photograph includes? Is there more data in a picture just because its file is larger? Is the exact same photograph more meaningful if it's represented as a huge uncompressed BMP file rather than as a much more compact PNG? (Note that PNG is a lossless format; compressing the file into a PNG does not cause distortion of the image. The visual representation of the two images would be indistinguishable, but the PNG nearly always requires vastly less disk space.)
Of course not. The data in a photograph is embodied in what it shows us, not how large it is. A small photo can be extremely important; a big one can be completely unimportant. And the representation in bits of the photo is completely irrelevant as long as the representation doesn't excessively distort the subject of the photo so that we can no longer learn important things from it. And the same thing goes for all other kinds of data: the value of the information does not correlate to the size of the data, so any kind of count of the quantity is usually irrelevant.
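For anyone who wants to see the BMP-versus-PNG point for themselves, here's a rough sketch in Python. It isn't a real image pipeline. The "photo" is a made-up 256x256 grayscale gradient that I'm inventing purely for illustration, and it uses the standard zlib module directly rather than writing actual BMP or PNG files. (PNG's compression is built on DEFLATE, which is what zlib implements, so this is a fair stand-in for the idea, though not for the file formats themselves.)

```python
import zlib

# Hypothetical stand-in for a photo: a 256x256, 8-bit grayscale gradient,
# held as raw uncompressed pixel bytes (roughly what a BMP stores).
WIDTH, HEIGHT = 256, 256
raw_pixels = bytes((x + y) % 256 for y in range(HEIGHT) for x in range(WIDTH))

# Compress with DEFLATE at the highest compression level.
compressed = zlib.compress(raw_pixels, 9)

# Lossless means the round trip gives back exactly the same bytes.
restored = zlib.decompress(compressed)

print(len(raw_pixels), "bytes uncompressed")
print(len(compressed), "bytes compressed")
print("identical after round trip:", restored == raw_pixels)
```

The two byte counts differ a great deal, but the picture that comes back out is bit-for-bit the same one that went in. Which of those two counts would be "how many data" there are?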
I insist that there is much data. There's no way to count data; thus it is a collective singular, and it is wrong to say there are many data.
Whenever I run into sentences where the word data is treated as a plural and used with the word many, it always sets my teeth on edge (and I usually internally rewrite the sentence, reading it as a singular and not a plural). But a couple of months ago, something happened that set my teeth far more on edge.
James Taranto at the Wall Street Journal noticed a post here, and he requested permission to post it on the WSJ Online, which I eagerly granted. The original article was a bit rough (a peril of the blogging form; it isn't really worth the effort to polish every entry). It included some material which wasn't really needed that somewhat disturbed the flow, and I suggested a significant cut, which Taranto used. Taranto made some other minor editing changes, and posted it on May 19.
I'm very grateful that Taranto thought my post was worth wider coverage. And having my post appear on the WSJ Online site brought me one of the most memorable reader letters in the entire history of this site. The chief of staff of the 3rd Infantry Division saw the article on the WSJ and forwarded the URL to the officer commanding the signals battalion in the 3rd Division. He was motivated to follow the link from WSJ to my own site so he could locate my email address and write to me, thanking me.
My point in that article was that he and the other signals people are doing a job which is vital but also largely unglamorous and invisible. Signals is one of those jobs where the only time people really notice you is when you screw up, which can be a bit frustrating for those who are doing it; they get cursed at but don't really get praised. I wrote that post to try to put a spotlight on just how important they truly are and on just how good a job they've done. That signals group and all the other signals people involved in the recent war deserved the praise.
But it wasn't until after the article was online at the WSJ that I noticed that Taranto had changed my use of data from a singular to a plural. In my own article. Grumble, mutter...
Update: JGW has done something interesting with his Movable Type page: he's made it so that when comments are added to a post, they appear on the main page in a fixed-size scroll box. That's really nice.
JGW also has a really cute little girl who just turned 3. What a sweetie! I feel like reaching into the picture and giving her a really big hug.