Thursday, November 10, 2016

"Trump's Win Isn't the Death of Data--It Was Flawed All Along"


I'm going to react at greater length and in greater detail to the surprise outcome of the American presidential election. In the meantime, I'd like to point readers to Cade Metz's Wired article "Trump's Win Isn't the Death of Data--It Was Flawed All Along". It raises a lot of interesting questions about statistics collection generally, not just political polling.

The lesson of Trump’s victory is not that data is dead. The lesson is that data is flawed. It has always been flawed—and always will be.

Before Donald Trump won the presidency on Tuesday night, everyone from Nate Silver to The New York Times to CNN predicted a Trump loss—and by sizable margins. “The tools that we would normally use to help us assess what happened failed,” Trump campaign reporter Maggie Haberman said in the Times. As Haberman explained, this happened on both sides of the political divide.

Appearing on MSNBC, Republican strategist Mike Murphy told America that his crystal ball had shattered. “Tonight, data died,” he said.

But this wasn’t so much a failure of the data as it was a failure of the people using the data. It’s a failure of the willingness to believe too blindly in data, not to see it for how flawed it really is. “This is a case study in limits of data science and statistics,” says Anthony Goldbloom, a data scientist who once worked for Australia’s Department of Treasury and now runs Kaggle, a company dedicated to grooming data scientists. “Statistics and data science gets more credit than it deserves when it’s correct—and more blame than it deserves when it’s incorrect.”

With presidential elections, these limits are myriad. The biggest problem is that so little data exists. The United States only elects a president once every four years, and that’s enough time for the world to change significantly. In the process, data models can easily lose their way. In the months before the election, pollsters can ask people about their intentions, but this is harder than it ever was as Americans move away from old-fashioned landline phones towards cell phones, where laws limit such calls. “We sometimes fool ourselves into thinking we have a lot of data,” says Dan Zigmond, who helps oversee data science at Facebook and previously handled data science for YouTube and Google Maps. “But the truth is that there’s just not a lot to build on. There are very small sample sizes, and in some ways, each of these elections is unique.”
