USS Clueless - Polls and other lies

Stardate 20040128.1536

(Captain's log): Matt writes:

First of all, as usual, thank you for your fascinating articles! I fall on the political/historical interest side of your readers, and frequently find myself springboarding off your articles into deep research on the topic du jour.

I was looking up poll numbers for the Democratic primaries (for use in one of my articles) and it occurred to me that I use Zogby all the time, just out of habit. I wanted to know what sources you use for statistics, vague as they always are, to bolster your pursuit of truth? To be more specific, I'm interested in political statistics - like presidential popularity polls, or number of white democrats, or average salaries per state.

If your sources are ever-changing, or you just google-per-use, then nevermind - I just felt that I would be more correct more often if I had your sources to refer to.

I think you may be surprised by my answer. What I rely on for statistics is the results of such things as primaries.

I have a keen understanding of the extent to which unscrupulous pollsters can manipulate their results if they're trying to prove something. There are all kinds of ways in which that can be done. I've written about a few of those, but there are many more which I understand and the pollsters unquestionably understand, and since pollsters don't work for free, they try to do what their employers want. And it is rare for anyone willing to spend that kind of money to not have at least some sort of interest in the outcome, even if it is a nominally "non-partisan" organization, such as a major newspaper.

Some of them are amazingly subtle. There has been research done into some of this kind of thing by psychologists. One interesting experiment involved having a large number of people watch a film of some kind and then questioning them about it later. But they didn't ask them all the same questions. To make up an example, if there was a boy in the film who was wearing a blue jacket, half of the people would be asked a question like, "Did the boy wearing the green jacket do thus-and-so?" Green wasn't emphasized, and didn't have anything to do with what the question actually related to. And in fact, that initial questioning wasn't the point of the study.

The people who had watched the film were once again questioned a couple of weeks later, and among the other questions they were asked they were all asked what color the boy's jacket had been (using my synthetic example). What they found was that those people who had been asked the "green jacket" question were far more likely to remember the jacket as being green than the control group, even though the jacket had actually been blue.

That particular study was actually concentrating on a different question: to what extent does the story told by a witness to a crime change as a result of the way that witness is questioned by police? Can a witness be made to genuinely remember something that didn't happen by the questions they're asked about it? The answer was that they could be, but the result has broader implications.

That basic principle affects polling. It's not just that the way a question is phrased can alter the way that people answer it. You'll get different answers if you ask, "Was the US right to use military power to remove a murderous dictator from power in Iraq?" than if you ask, "Was the US right to preemptively conquer a sovereign nation which was not implicated in the planning or execution of the 9/11 attack?" And some pollsters are not above slanting their questions this badly.

But the questions which come before a given one in a poll will also affect how people answer it. The exact same question asked of an honest sample of people in exactly the same way can yield quite different results as a function of what other questions had previously been asked. An early question which includes a "green jacket" assumption can alter the responses to a later question about jacket color, for one thing, but that's not the only way.

This affects all such polls even if the pollster is not trying to manipulate the result. And the degree to which the subject and phrasing of earlier questions may affect the answers given to later ones is almost entirely unpredictable and impossible to analyze easily, even if the pollster is trying to be honest.

Of course, a lot of the time they aren't. Less scrupulous pollsters can deliberately take advantage of this. At the beginning of this month I wrote an article about the results of opinion polls in the US, inspired by the release of a study by the American Enterprise Institute which collated the results of nearly every major poll they could find over the last year.

On page 28 of their report, they include a Newsweek poll from January of 2003 which included a question having to do with the broad issue of "unilateralism". The question went as follows:

Please tell me if you would support or oppose U.S. Military action against Iraq in each of the following circumstances. First, what if...?

The United States joined together with its major allies to attack Iraq, with the full support of the United Nations Security Council

The United States and one or two of its major allies attacked Iraq, without the support of the United Nations

The United States acted alone in attacking Iraq without the support of the United Nations

That kind of "piling weight on until the beam breaks" question is quite common in these polls, but it tends to bias the answer to the final question. If you were to ask someone directly, Would you support or oppose U.S. Military action against Iraq if the United States acted alone in attacking Iraq without the support of the United Nations?, generally you'd get a stronger result indicating support than if you creep up on it the way that the Newsweek poll did. The way that the question was phrased has a tendency to cause someone to feel less confident about it all by the time they get to the last question.

Which is why those who had been trying to collect poll results which would indicate that Americans wanted UN approval and "cooperation from European allies" (e.g. France) have tended to use this kind of creep-up-on-it question instead of asking that question cold.

When I wrote earlier this month about the AEI report, I noted that on some questions relating to long term support within the US of the occupation, the answers that different polls got were all over the map. One reason, which I cited at the time, was that the questions were phrased differently. But another reason was context.

When people were asked directly whether they supported the current policy, a strong majority supported it. On the other hand, when they were asked how things seemed to be going, the result was far less positive. And when the question was loaded, the result changed a great deal.

But when a question was multiple-choice, the result depended enormously on what those choices were. I'm not trying to pick on Newsweek specifically here, but they provide another beautiful example of this, reported on page 49 of the AEI report.

In April and May, they asked:

For how long after the fighting stops would you support keeping U.S. military personnel in Iraq to help maintain order and establish a new government there...no more than a week or so, several weeks, several months, one to two years, three to five years, or more than five years?

In July, they asked a related but subtly different question:

For how long after the fighting stops would you support keeping U.S. military personnel in Iraq to help establish security and rebuild the country...less than one year, one to two years, three to five years, six to ten years, or more than ten years?

One of the interesting things which has been discovered about polling is that it is a social encounter for the person being questioned, and we all modify our behavior and our expression of attitudes to some extent because we want social approval. The simple fact is that some people lie to pollsters, not because they're malicious, but because they think their true opinions are unpopular. (That's why we use the secret ballot in elections.)

When a question is multiple choice, the way the brackets are laid out will be interpreted by many of those hearing it as indicating something about "what other people think". There's a strong tendency for people to try to give an answer which is not extreme; even if they believe it, they're far more likely to lie about what they think if they believe that their answer places them well outside the norm.

Suppose that a person believed that the right answer was "8 years". The first question offered six choices but didn't reach a period of one year until the fourth choice, and the person believing "8 years" would have had to choose "more than five years" to be honest. But partly because that makes them uncomfortable, since it seems to place them outside the norm, and partly because the question includes the same kind of "creep up on it" phrasing I mentioned above, there's a tendency for people to change their answer and give one of the shorter choices.

The second version of the question offered five choices, and a period of one year appears as the second choice of five instead of the fourth of six, thus implying "we expect most people to opt for a long occupation". Our 8-year person can choose the fourth answer, so it feels more comfortable to answer the second question honestly.

Someone believing in a short occupation would be in a similar situation. If someone believed it should be 9 months, in the first question they could give the third answer of six and feel comfortable for being right in the middle of the expected answers, but in the second poll they had to choose the first one of five, and there would be a tendency to feel they were on the fringe.

Would that change distort the result? The existing research suggests it would, but you can't directly tell from looking at the results from those questions, because time and events had passed between them, and because this was certainly not the only thing which might have affected the result. But if you look at the sum of all answers for periods of 3 years or longer, the April 10 poll found 22% and the May 1 poll found 20%, using the first version of the question. The July 10 poll, using the second question, found 26%, and the July 24 poll found 29%. The difference is much greater than the ±3 percentage points usually cited as the potential error, but that isn't very comforting.
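
For the curious, here's a back-of-the-envelope check of that comparison. It's only a sketch: the sample sizes for those Newsweek polls aren't quoted above, so the assumption of about 1,000 interviews per poll is mine, as is the use of the textbook formula for the error on a difference between two proportions.

    # Rough sampling-error check for the "3 years or longer" comparison.
    # Assumption: each poll interviewed about 1,000 people (not stated above).
    from math import sqrt

    def diff_margin(p1, p2, n1=1000, n2=1000, z=1.96):
        """95% margin of error for the difference between two poll proportions."""
        se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
        return z * se

    gap = 0.29 - 0.22                 # July 24 vs. April 10
    margin = diff_margin(0.22, 0.29)  # roughly +/- 3.8 points on the difference
    print(f"gap = {gap:.1%}, sampling margin = +/-{margin:.1%}")

A seven-point gap is outside pure sampling noise under those assumptions, but all that tells you is that something changed; it can't separate a real shift in opinion from the change in wording and answer brackets.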

Did public support for a long occupation increase between April and July last year? Damned if I know. The only thing I'm sure of is that we can't tell from this data.

There are other sources of bias which affect those polls even when the pollsters are not trying to manipulate the results. As I mentioned, reports on opinion polls usually include a comment that "the result is accurate to plus or minus three points", but that's not really true. That claim of accuracy comes from statistics and derives from standard formulas for evaluating the chance that a sample of a given size is representative of the whole.
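
For reference, here's where that familiar number comes from. This is just the textbook formula for a simple random sample, not anything specific to these polls; the worst case (an even 50/50 split) and the typical sample of about 1,000 interviews are assumptions on my part.

    # Standard 95% margin of error for a proportion from a simple random sample.
    from math import sqrt

    def margin_of_error(n, p=0.5, z=1.96):
        """Worst case is p = 0.5; z = 1.96 gives the usual 95% confidence level."""
        return z * sqrt(p * (1 - p) / n)

    print(f"{margin_of_error(1000):.1%}")  # about 3.1% -- the familiar "plus or minus 3"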

But that statistical calculation is based on the assumption that there is no inherent bias in the means by which the members of the sample are chosen, and in these kinds of polls there actually turn out to be a lot of such biases.

I guess one reason why I'm skeptical about polls is that when I was in high school I read a small tome called How to Lie with Statistics for the first, but by no means the last, time. It was originally published in the 1950s, and was out of print for years, but now it's back in print. What the author, Darrell Huff, does is show all the ways in which people deliberately misuse statistics in order to deceive us, but though his goal is to reveal their tricks, he actually writes the book as a tongue-in-cheek manual on how to deceive. (The pretense is always thin and never convincing, and he drops it entirely in the last chapter.)

Huff's book is short and hilarious, and when reading it you have so much fun that you don't notice just how much you're learning. In fact, it is packed with insight and information and I recommend it unconditionally. Fifty years on, it is just as relevant as it ever was.

Huff dedicates a chapter in his book to opinion polling, and talks about many ways in which polling samples can be biased. And what he says rings true to this day, for those polls are still biased.

Pollsters have to find the people who answer the poll, and there's no single way they can do that which does not implicitly bias the sample. (Huff gives an example from the 1930s, back when trains were a major form of both short-range and long-range transportation. One pollster of that era said that she looked for people at train stations, since "all kinds of people ride trains". Others pointed out to her that mothers of small children and handicapped shut-ins were likely to be underrepresented.)

Given that there's no single place pollsters can go where the people they'd encounter would be a statistically valid sample of the nation as a whole, pollsters use what's called a "stratified sample". Rather than trying to come up with a single representative sample, they try to mix together several biased samples so that their biases cancel.

The pollsters use census data to learn things like the current racial breakdown of the population, as well as things like education levels and affluence, and they also use the census data to identify areas where certain kinds of combinations of those things are particularly common. The phone numbers in those areas are known, and the pollsters will make calls into those areas to make sure to include enough people of that kind in the overall sample. So when a poll is based on interviews with a thousand people, it isn't the first thousand people they reached by dialing phone numbers at random.
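
To make that concrete, here's a toy sketch of the idea, whether it's done by choosing whom to call or by reweighting answers afterward. Every category and number below is invented for illustration; real polling firms' procedures are considerably more elaborate.

    # Toy illustration of stratification: make each demographic cell count
    # according to its census share rather than its share of completed calls.
    census_share = {"urban": 0.30, "suburban": 0.50, "rural": 0.20}  # hypothetical targets
    respondents  = {"urban": 400, "suburban": 450, "rural": 150}     # who actually answered
    support      = {"urban": 0.40, "suburban": 0.55, "rural": 0.65}  # hypothetical answers

    total = sum(respondents.values())
    raw = sum(respondents[g] * support[g] for g in respondents) / total
    weighted = sum(census_share[g] * support[g] for g in census_share)
    print(f"raw = {raw:.1%}, weighted = {weighted:.1%}")  # 50.5% vs. 52.5%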

Even honest pollsters have to use stratified samples. But if a pollster is trying to cheat then that process of creating a stratified sample is subject to considerable abuse. For instance, if they're trying to include a certain number of well-educated whites in the sample, they would still get different results from calling Raleigh NC ("Research Triangle") than if they called San Jose CA. "Poor urban blacks" in Oakland CA won't necessarily give the same answers as those in East St. Louis IL.

But there are other ways in which the sampling process is invalid, to at least some extent.

Modern opinion polls are conducted by phone. Some of the earliest phone-based opinion polls, such as those behind the notorious prediction that Dewey would defeat Truman in the 1948 presidential election, turned out to be affected by the fact that people with phones in their homes were, on average, much more affluent than those without, and were more likely to vote Republican.

For the last 30 years or so in the US, virtually everyone living at any level beyond the most desperate poverty has had a phone, so that particular bias is no longer a factor. But in the last ten years or so, a new one has crept in.

We're all aware (or damned well should be) of the fact that online polls are completely meaningless. Respondents to such polls are a self-selected sample and self-selected samples are nearly always unrepresentative. But even telephone-based opinion polling is subject to a degree of bias due to self-selection. The pollsters choose the people who are invited to participate, but those people then decide whether they will or will not do so. Usually most don't.

The pollsters call people and try to convince them to spend twenty minutes or half an hour answering strange questions. They're using the same kind of equipment to do this as telemarketers use, and in fact in the early seconds of the call it sounds exactly the same as a telemarketing call. (Indeed, the same person asking these poll questions may tomorrow call you and ask if you're interested in changing your long distance carrier.)

Some of those that the pollsters call will hang up on them before the pollsters even get a chance to say that they're not selling anything. Others may listen to the initial pitch but will decline to burn half an hour that way. A relatively small percentage will agree to participate.

Are those who are willing to do that truly a representative sample of the population as a whole? Those who tell the pollsters to buzz off, or who are simply too busy to spare the time, are already clearly different than those willing to answer such questions and who have the time to spare. How many other differences are there between them?

Do they have the same statistical distribution of opinions as the group which is willing to participate? Are they equally likely to vote, or more likely, or less likely?

In fact, I'm not sure anyone really knows the answer to that. Given that those who tell pollsters to buzz off would also tell researchers trying to answer that question to buzz off, and those too busy to talk to pollsters will also be too busy for researchers, it's a bit difficult to say. Ultimately the only way to find out is to see if the poll accurately predicts such things as election results, where even the "buzz-off" group participates.
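
A hypothetical bit of arithmetic (every number here is invented) shows why the question matters: if the two sides of an issue agree to be interviewed at even slightly different rates, the poll drifts away from the true split no matter how carefully the dialing list was chosen.

    # If supporters answer polls more willingly than opponents, the measured
    # split is biased even with a perfectly random dialing list.
    true_support    = 0.50   # assume the population is actually split 50/50
    rate_supporters = 0.12   # fraction of supporters who agree to the interview
    rate_opponents  = 0.08   # fraction of opponents who agree

    completed_support = true_support * rate_supporters
    completed_oppose  = (1 - true_support) * rate_opponents
    measured = completed_support / (completed_support + completed_oppose)
    print(f"measured support = {measured:.1%}")  # 60.0% from a true 50/50 split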

How do I find poll data? Mostly I don't.

If I look at the results of opinion polls at all, I tend to try to find several on the same subject and to compare their results. If they wildly disagree, then it means that some, and perhaps all, of them were trying to manipulate their results, and I won't assume that any of them are accurate. But even if they all closely agree, I don't take poll results very seriously unless the result is extremely lopsided. The pollsters claim ±3 points of accuracy, but I generally assume it's no better than ±15 points.

The polls have consistently shown that about three times as many Americans support the war in Iraq as oppose it. That's a large enough difference so that it is significant even with all the problems in the polling process. But if the poll shows opposing positions within about 10 percentage points of one another, I assume that there's no way of knowing which position is actually more popular until it's put to the test. When it comes to politics, that means elections.

I've always been extremely cynical about the real significance of the Iowa caucus and the NH primary, for a large number of reasons I don't care to rant about right now. I've always thought that they are given far more attention than they deserve. That said, I do confess that this year I am grateful to the people in those states, because they've permitted me to feel a degree of satisfaction in the consternation of the press about the way that the actual result was so different from what had been predicted. Among other things, the Iowa caucus and NH primary results show that the opinion poll results ahead of them had been very badly wrong.

The press had begun to believe its own lies and distortions, and ended up looking foolish. There's more than a small amount of justice in the fact that the press was seemingly fooled by those distortions more badly than anyone else was. (The last couple of years have not been kind to the press.)

In his book, Huff at one point says (paraphrased) that when you've got a jar full of a mix of red jellybeans and blue jellybeans and you want to know the proportion of each in the jar, there's really only one way to find out: count 'em. But if you want an approximate answer, you can take a couple of handfuls of jellybeans out of the jar and count them, and the result will be similar to the whole.
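
If you want to see that work, here's a quick simulation of the honest version of the jellybean experiment; the jar size and handful size are made up, but any well-mixed jar behaves the same way.

    # Huff's jellybean point: honest handfuls from a well-mixed jar land close
    # to the true proportion. (Jar size and handful size are invented.)
    import random

    random.seed(1)
    jar = ["red"] * 6000 + ["blue"] * 4000  # true proportion: 60% red
    random.shuffle(jar)                      # mix the jar thoroughly

    handfuls = [jar[i * 50:(i + 1) * 50] for i in range(4)]  # four handfuls of 50
    sample = [bean for handful in handfuls for bean in handful]
    print(f"estimate: {sample.count('red') / len(sample):.0%} red (truth: 60%)")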

But that's only true if you truly make those handfuls a representative sample, and if you use a process of identifying "blue" and "red" which is accurate. If you don't, then you won't learn anything. Modern opinion polling involves so many sources of both conscious and unconscious bias that what it finds bears only a mild resemblance to reality, if any at all. Ultimately the only poll that matters is the one held on election day.
