In the wreckage the 2016 election unleashed on polls, a few things have quickly become clear. The entire country was working with incomplete or biased data. Past polling methods are flawed; the question now, as polling firms try to reestablish their reputations, is how to adapt to the new paradigm.
A good place to start is FiveThirtyEight, which aggregates major polls into a predictive model. Its election postmortem concluded that state polls missed the final margin by an average of 3.7 points (in either direction).
In the average state won by Trump, the polls missed by 7.4 points. The polls’ most impactful miss was voter turnout, especially in swing states. Compared to 2012, turnout dropped by 1.3% in Iowa, 3% in Michigan, and 4% in Ohio; Trump carried all of those states despite Obama winning them in 2012.
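An average miss "in either direction" is what statisticians call a mean absolute error: each state's miss counts by its size, whichever candidate it favored. The sketch below shows how that differs from a signed average, in which misses in opposite directions cancel out. The state names and margins are invented for illustration and are not real 2016 figures; the function names are hypothetical as well.

```python
from statistics import mean

# Hypothetical (polled_margin, actual_margin) pairs, in points, Clinton minus Trump.
# These numbers are made up for illustration; they are NOT real 2016 results.
races = {
    "State A": (4.0, -1.0),
    "State B": (-2.0, -6.0),
    "State C": (7.0, 9.0),
}

def signed_errors(races):
    """Polled margin minus actual margin; positive means Clinton was overstated."""
    return {state: poll - actual for state, (poll, actual) in races.items()}

def mean_absolute_error(races):
    """Average size of the miss, regardless of direction."""
    return mean(abs(err) for err in signed_errors(races).values())

def mean_signed_error(races):
    """Average miss with direction kept, so opposite misses offset each other."""
    return mean(signed_errors(races).values())

if __name__ == "__main__":
    print(signed_errors(races))
    print(round(mean_absolute_error(races), 2))
    print(round(mean_signed_error(races), 2))
```

A large absolute error paired with a small signed error would point to noisy polls; when the two are close, the polls were mostly wrong in the same direction.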
Clearly, the problems are industry-wide; that’s why the American Association for Public Opinion Research (AAPOR) has assembled a panel to investigate and report their findings in May.
It’s easy to lose yourself in an echo chamber on any side of the aisle, and Silicon Valley was completely floored by the election for this reason. Living, working, and socializing in the tech bubble makes it hard to empathize with the voters who chose Trump. The polls may have fallen into a similar trap of groupthink, reinforcing each other in the direction they expected to see.
The uniqueness of the 2016 election also made it hard to predict accurately. One of the two major parties nominated the first female major-party presidential nominee in American history; the other nominated a businessman with no political record at all. There was no historical precedent for this matchup, which meant most models based on historical data had a significant bias.
Another reason polls may have struggled is that Trump supporters are not typical poll respondents. Some data suggest Trump voters were less comfortable being honest over the phone; women especially felt less comfortable telling a pollster they supported Trump. Research suggested Trump’s appeal was rooted in his anti-establishment nature; his supporters’ mistrust of government institutions may have spilled over into mistrust of the polls. The polls had an unseen bias because pollsters had no way of knowing whether someone’s answers were honest (respondents may have held back for reasons of political correctness or simply because they felt intimidated).
Not all of the polls were landline phone-only, though. A poll run by the LA Times and USC consistently pegged Trump as the leader in the final months of the election. One of the things that differentiated this poll was its intake form: instead of calling and asking a binary question about voters’ choice, the Internet-based poll asked voters to rate their preferences on a scale out of 100. This novel approach allowed ambivalent voters to still contribute relevant data on issues, and the online format also reached a part of the population without a landline (e.g. mobile- or VoIP-only millennials).
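To see why a scale out of 100 captures more than a binary question, consider the minimal sketch below. It assumes each respondent gives a single 0-100 rating for one candidate, which is a simplification for illustration and not a description of the LA Times/USC methodology. The ratings and function names are hypothetical.

```python
# Hypothetical 0-100 ratings for one candidate; invented for illustration only.
ratings = [90, 55, 50, 45, 10, 60]

def binary_share(ratings):
    """Share of support if each respondent must pick a side.

    A rating above 50 counts as a full vote for the candidate, below 50 as a
    full vote against, and a rating of exactly 50 (undecided) is discarded.
    """
    decided = [r for r in ratings if r != 50]
    return sum(r > 50 for r in decided) / len(decided)

def scaled_share(ratings):
    """Share of support if every rating counts in proportion to its strength."""
    return sum(ratings) / (100 * len(ratings))

if __name__ == "__main__":
    print(round(binary_share(ratings), 3))
    print(round(scaled_share(ratings), 3))
```

In this toy sample the binary reading shows a comfortable lead, while the scaled reading shows a near tie, because lukewarm and undecided respondents are no longer rounded up to full votes or dropped.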
Polls tended to be least accurate in states that Trump won. Polling should refocus on non-urban areas and create more detailed sub-segments of voters. We need more diverse datasets and a renewed focus on the forgotten parts of the electorate: those alienated by the polling selection or polling method, or uncomfortable freely expressing their opinion.
With call automation technology in abundance, pollsters should be able to remove the “judgmental” human element by having prospective voters talk to bots rather than humans. Pollsters should also consider “low tech” options like snail mail surveys; the biggest electoral upsets came in some of the poorest areas of the country. Pollsters could also think through “high tech” options like Skype or mobile for millennials.
As pollsters accumulate vast datasets, machine learning and predictive analytics can become very useful to offset a lack of historical data. While Clinton was the first female major-party candidate for president, she was certainly not the first woman up for a CEO position or a governorship or a Senate seat. Ditto Trump; he was not the very first outsider to run for office. Can models used in other circumstances or industries be factored in for elections so that the historical bias is reduced and the predictions are more accurate?
These new models, leveraging machine learning and predictive analytics, must also somehow model emotional intelligence. It’s every bit as important as IQ in predicting human behavior: Donald Trump’s platform was not just a policy alternative to Hillary Clinton’s, but an emotional alternative as well. How much of Trump’s appeal to his voters was emotional? We can’t know for sure, because polls today don’t properly harness the millions of social media data points voters generate on a daily basis. Perhaps the models could use machine learning to weight fearful or financially struggling voters more heavily, as they are more likely to vote than others.
Maybe polling firms can leverage anonymized social data from Facebook to remove the echo-chamber bias that made Trump’s victory such a shock for some and so obvious to others.
Ultimately, we need to see what this is and what this is not. Data isn’t dead when it comes to polling; it’s merely behind. Modeling needs to become more nimble, able to adjust to new cultural norms and collect broader datasets. This election, by breaking the old model, might be the catalyst to rethink polling and build a new one.
(About the author: Isabelle Guis is the chief marketing and strategy officer at Egnyte)