Why Pollsters Were Completely and Utterly Wrong (hbr.org)
Author: Dan Cassino
Traditional political polling in America has been living on borrowed time, and the divergence of the actual votes in Tuesday’s election from what was expected in the polls may signal that its time is up. The fact that the polls apparently missed the preferences of a large portion of the American electorate indicates a larger, more systematic issue, one that’s unlikely to be fixed anytime soon.
The basic problem — and the reason pollsters have been nervous about just this sort of large-scale polling failure — comes from the low response rates that have plagued even the best polls since the widespread use of caller ID technology. Caller ID, more than any other single factor, means that fewer Americans pick up the phone when a pollster calls. That means it takes more calls for a poll to reach enough respondents to make a valid sample, but it also means that Americans are screening themselves before they pick up the phone.
So even as our ability to analyze data has gotten better and better, thanks to advanced computing and an increase in the amount of data available to analysts, our ability to collect data has gotten worse. And if the inputs are bad, the analysis won’t be any good either.
That self-screening is enormously problematic for pollsters. A sample is only valid to the extent that the individuals reached are a random sample of the overall population of interest. It’s not at all problematic for some people to refuse to pick up the phone, as long as their refusal is driven by a random process. If it’s random, the people who do pick up the phone will still be a representative sample of the overall population, and the pollster will just have to make more calls.
Similarly, it’s not a serious problem for pollsters if people refuse to answer the phone according to known characteristics. For instance, pollsters know that African-Americans are less likely to answer a survey than white Americans and that men are less likely to pick up the phone than women. Thanks to the U.S. Census, we know what proportion of these groups are supposed to be in our sample, so when the proportion of men, or African-Americans, falls short in the sample, pollsters can make use of weighting techniques to correct for the shortfall.
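The weighting correction described above can be illustrated with a minimal sketch of post-stratification on a single characteristic. The function name `poststratify`, the 50/50 population shares, and the sample counts are hypothetical values chosen for illustration, not figures from any actual poll.

```python
def poststratify(sample, population_shares):
    """Return the weighted share of supporters in a sample.

    sample: list of (group, supports_candidate) pairs.
    population_shares: known share of each group in the population
    (for example, from Census figures).
    """
    n = len(sample)
    counts = {}
    for group, _ in sample:
        counts[group] = counts.get(group, 0) + 1
    # Weight = population share divided by sample share for that group.
    weights = {g: population_shares[g] / (counts[g] / n) for g in counts}
    total_weight = sum(weights[g] for g, _ in sample)
    support_weight = sum(weights[g] for g, s in sample if s)
    return support_weight / total_weight


# Hypothetical sample in which men are underrepresented (30 of 100).
sample = (
    [("men", True)] * 18 + [("men", False)] * 12
    + [("women", True)] * 28 + [("women", False)] * 42
)
raw_share = sum(1 for _, s in sample if s) / len(sample)
weighted_share = poststratify(sample, {"men": 0.5, "women": 0.5})
print(round(raw_share, 2), round(weighted_share, 2))
```

In this made-up sample, the unweighted estimate understates support because the group that favors the candidate is underrepresented; weighting each group back to its known population share removes that gap. The fix works only because the population shares are known.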
The real problem comes when potential respondents to a poll are systematically refusing to pick up the phone according to characteristics that pollsters aren’t measuring or can’t weight to match the population. For instance, while Census numbers can tell us how many Asian-Americans live in a particular state, they can’t reliably tell us how many Republicans, or liberals, or evangelicals are in that state. As a result, if a group like evangelicals or conservatives systematically excludes itself from polls at a higher rate than other groups, there’s no easy way to fix the problem. Often, it may not even be clear that there is a problem, especially for characteristics that aren’t commonly measured in polls, like church membership, or that can fluctuate, like political preferences.
None of this would be a problem if response rates were at the levels of the 1980s, or even the 1990s. But with response rates to modern telephone polls languishing below 15%, it becomes harder and harder to determine whether systematic nonresponse problems are even happening. These problems go from nagging to consequential when the characteristics that are leading people to exclude themselves from polls are correlated with the major outcome that the poll is trying to measure. For instance, if Donald Trump voters were more likely to decide not to participate in polls because they believed the polls were rigged, and did so in a way that wasn’t correlated with known characteristics like race and gender, pollsters would have no way of knowing.
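The arithmetic behind this kind of error can be sketched briefly. The function `observed_share` and the response rates below are hypothetical; the sketch assumes that, within a single demographic group, one candidate's supporters answer the phone at a different rate than everyone else.

```python
def observed_share(true_share, rate_supporters, rate_others):
    """Share of poll respondents backing a candidate when that candidate's
    supporters and everyone else respond at different rates."""
    responding_supporters = true_share * rate_supporters
    responding_others = (1 - true_share) * rate_others
    return responding_supporters / (responding_supporters + responding_others)


# Equal response rates: refusals are effectively random, so there is no bias.
print(round(observed_share(0.5, 0.10, 0.10), 3))

# Supporters respond at 8% instead of 10%: the poll understates their share.
print(round(observed_share(0.5, 0.08, 0.10), 3))
```

With these illustrative numbers, a candidate who truly has half the vote shows up at roughly 44% in the poll. Because the gap sits inside each demographic group, reweighting by race or gender would leave it untouched.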
Of course, if unobserved nonresponse is driving poll errors, it’s necessary to ask how polls have done so well up to this point. After all, response rates have been similarly low in at least the last four presidential elections, and the polls did well enough in those. Part of the answer, and what makes this election different, is a seeming failure of likely voter models.
One of the most difficult tasks facing any election pollster is determining who is and is not actually going to vote on Election Day. People tend to say they’re going to vote even when they won’t, so it’s necessary to ask more questions. Every major pollster has its own set of likely voter questions, but they typically include items about interest in the election, past vote behavior, and knowledge of where a polling place is. Using these questions to determine who is and is not going to vote is a tricky business; the failure of a complex likely voter model is why Gallup got out of the election forecasting business. As long as voter behavior stays stable, these models should work. But in elections in which past voter behavior is upended, such as President Barack Obama’s win in 2008, they can fail catastrophically.
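A deliberately simplified sketch of a likely voter screen, built on the three kinds of items mentioned above, shows why such models depend on past behavior staying stable. The field names, the one-point-per-item scoring, and the cutoff of 2 are assumptions made for illustration; real pollsters' models are proprietary and more elaborate.

```python
# Hypothetical screening items, one point each.
ITEMS = ("interested", "voted_before", "knows_polling_place")


def likely_voter_score(respondent):
    """Count how many screening items the respondent satisfies."""
    return sum(1 for item in ITEMS if respondent[item])


def is_likely_voter(respondent, cutoff=2):
    """Treat respondents at or above the cutoff as likely voters."""
    return likely_voter_score(respondent) >= cutoff


habitual = {"interested": True, "voted_before": True, "knows_polling_place": True}
newly_engaged = {"interested": True, "voted_before": False, "knows_polling_place": False}

print(is_likely_voter(habitual), is_likely_voter(newly_engaged))
```

Under this toy rule, a first-time voter who is interested in the election but has no voting history is screened out, which is harmless in a typical year and costly in one where such voters turn out.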
It may be the case that standard sampling and weighting techniques are able to correct for sampling problems in a normal election — one in which voter turnout patterns remain predictable — but fail when the polls are missing portions of the electorate who are likely to turn out in one election but not in previous ones. Imagine that there’s a group of voters who don’t generally vote and are systematically less likely to respond to a survey. So long as they continue to not vote, there isn’t a problem. But if a candidate activates these voters, the polls will systematically underestimate support for the candidate. That seems to be what happened Tuesday night.
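The scenario in the preceding paragraph can be worked through with made-up numbers. In this sketch, the group sizes, support levels, and turnout rates are all hypothetical; the only thing that differs between the two calculations is the turnout of the group that does not usually vote.

```python
def candidate_share(groups, turnout_key):
    """Candidate's share of ballots cast, given each group's size,
    support for the candidate, and turnout rate."""
    votes = sum(g["size"] * g[turnout_key] * g["support"] for g in groups)
    ballots = sum(g["size"] * g[turnout_key] for g in groups)
    return votes / ballots


groups = [
    # Habitual voters: turnout matches what the model expects.
    {"size": 0.6, "support": 0.45, "modeled_turnout": 0.8, "actual_turnout": 0.8},
    # Infrequent voters: the model assumes 20% turnout, but 50% show up.
    {"size": 0.4, "support": 0.65, "modeled_turnout": 0.2, "actual_turnout": 0.5},
]

predicted = candidate_share(groups, "modeled_turnout")
actual = candidate_share(groups, "actual_turnout")
print(round(predicted, 3), round(actual, 3))
```

With these illustrative inputs, the model projects the candidate just under 48% while the actual result is just under 51%. The support figures are measured correctly in both cases; the whole miss comes from the turnout assumption.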
This doesn’t mean modern political polling is doomed. If pollsters and political analysts can identify what went wrong, they may be able to stop a repeat of such errors and build models that account for them. But unless they can find some way of fixing the underlying issue of low response rates, it’s not certain that they’ll be able to. Without strong polling, the public and the markets won’t know what’s likely to happen and candidates won’t know where to most effectively put their resources. At least for unusual elections, polling seems broken — and we have no way of knowing which elections are unusual until the votes actually come in.