What's In a Prediction? On Nate Silver and the Science of Probability
The Signal and the Noise
author: Nate Silver
publisher: Penguin Press
pub date: 09.27.2012
pp: 544
tags: Politics & Economics , Science & Technology

Andrew Benedict-Nelson on The Signal and the Noise

February 20th, 2013

I PREDICT THAT ONE DAY Nate Silver will be remembered for doing something more consequential than forecasting the winners of presidential elections.

Do I have a sophisticated statistical model like Silver’s FiveThirtyEight backing my prediction? No, nor do I have a computer running thousands of scenarios a day. But Silver’s new book, The Signal and the Noise: Why So Many Predictions Fail — But Some Don’t, suggests that the true value of forecasting the future won’t be found among the high-powered processors that populate the promised land of “Big Data.” 

Instead, Silver portrays forecasting as a humanistic heuristic any of us can use to grow closer to truth. Indeed, he makes the case that we must. “The numbers have no way of speaking for themselves,” he argues from the outset of the book. “We speak for them. We imbue them with meaning.” 

Guided by this ethos, Silver uses The Signal and the Noise to discern the highest and best use of data in a variety of disciplines, from meteorology and epidemiology to baseball and poker. But notably absent is a critical discussion of what greater goal such data-driven divination serves in politics, the area in which Silver currently enjoys the greatest notoriety. 

“[M]en may construe things after their fashion / Clean from the purpose of the things themselves,” Shakespeare’s Cicero warned, and Silver repeats in his introduction. But what purpose is served by a program that perfectly predicts the outcome of a presidential race if such a forecast does not aid the republic it models?

¤

Silver explains in the introduction to The Signal and the Noise that he hopes his book will be more than another story of “nerds conquering the world” with numbers, à la Moneyball or Freakonomics. But it’s not hard to see the appeal of such a frame for Silver’s life story. 

Bored by a job at the accounting firm KPMG in 2003, Silver cooked up PECOTA, a new system for predicting the performance of baseball players (particularly pitchers) over the course of their lifetimes. A key differentiator of Silver’s model was that instead of making a single, straightforward prediction, it ranked the likelihood of several potential outcomes for each player — multiple scenarios, multiple probabilities.

This switch in methodology foreshadows all of the theoretical arguments that Silver’s book and subsequent career make. By developing a model that yielded a series of guesses, Silver was drawing upon a statistical tradition that started with Thomas Bayes, an 18th century English clergyman. Bayes is known for a treatise that proposed a mathematical method for making predictions about phenomena, then incorporating the outcomes into an ever-improving probabilistic model. Bayes’s ideas were later taken up by the French mathematician and astronomer Pierre-Simon Laplace, who used them to make scientific predictions about the cosmos in an era before orbital telescopes and supercomputers provided greater certainty.

Eventually Bayes’s ideas fell out of favor among mathematicians. The statistical method used by most scientists today, for example, is based instead on the ideas of the English biologist R. A. Fisher. Rather than assigning different probabilities to various outcomes, predictions based on Fisher’s ideas seek one best fit for the data, then assign a margin of error to the prediction. This is clearly the best way to go if you’re seeking to establish the truth of a scientific law or the efficacy of a new drug — both scenarios in which certainty is a must and data can be tightly controlled. But Silver makes the case that in most situations, we must concede that we are “imperfect creatures in an uncertain world” and admit that Bayes’s way is the better bet.

In fact, Silver’s clearest illustrations of how Bayesian reasoning works in everyday life involve games of chance. Poker, for example, is a relatively simple game from a mathematical point of view. American college campuses are filled with young men who failed statistics but have memorized the probabilities of the most important hands. But this data alone does not a maverick make; to win, a player must also rank his or her own guesses about the behavior of the other players (and their guesses about the others’ behavior, and their guesses about the guesses about the others’ behavior, and so forth). Some people do this with intuition. Silver describes how he did it with math, making a decent living for several years as an online card shark and part-time Baseball Prospectus writer.

The story of Silver’s years in the “poker bubble,” as he calls it, best dramatizes the bettor’s beloved Bayesian reasoning. For example, he argues that one advantage of Bayes’s way is that by inviting us to make predictions even with imperfect data, it also forces us to make visible our biases (or, if we’re lucky, our unconscious common sense). Curiously, Silver explains, long-time gamblers understand this process even when they do not understand the math. With considerable empathy, Silver describes the phenomenon of “tilt,” by which poker players mean “a state of overaggressive play brought on by a loss of perspective” — an inability to account for our own biases. He quotes Tommy Angelo’s Elements of Poker to illustrate the various forms this loss of perspective can take:

I was a great tilter. I knew all the different kinds. I could do steaming tilt, simmering tilt, too loose tilt, too tight tilt, too aggressive tilt, too passive tilt, playing too high tilt, playing too long tilt, playing too tired tilt, entitlement tilt, annoyed tilt, injustice tilt, frustration tilt, sloppy tilt, revenge tilt, underfunded tilt, overfunded tilt, shame tilt, distracted tilt, scared tilt, envy tilt, this-is-the-worst-pizza-I’ve-ever-had tilt, I-just-got-showed-a-bluff tilt, and of course, the classics: I-gotta-get-even tilt, and I-only-have-so-much-time-to-lose-this-money tilt, also known as demolition tilt.

Silver tilted too. He says his own inaccuracies resulted from the fact that he relied on the game for his monthly nut. “I’d play mechanically, unthinkingly, for long stretches of time and often late into the night,” he writes. “I had given up, deep down, on really trying to beat the game.” He says he also developed a sense of entitlement with a parallel kind of resentment: “When I thought I had played particularly well — let’s say I correctly detected an opponent’s bluff, for instance — but then he caught a miracle card on the river and beat my hand anyway, that’s what could really tilt me. I thought I had earned the pot, but he made the money. By tilting, I could bring things back into a perverse equilibrium: I’d start playing badly enough that I deserved to lose.”

While these reflections might make for a fascinating memoir of redemption, Silver’s poker career had a more conventional end. As the United States government began to crack down on online poker, fewer and fewer unskilled “fish” were playing the game, making it harder for even skilled players to win often enough to pay their bills. The “poker bubble” had burst. 

But the politics that led to the crackdown prompted Silver to take a greater interest in the electoral process. That led to the development of the FiveThirtyEight model that has made Silver famous but which also raises the question of whether our political class is on permanent tilt.

¤

In November, Nate Silver joined the ranks of Chuck Norris and grumpy cats. He was not the inspiration for a single viral meme on the Internet; rather, he had a class of memes all to himself, passed on mainly by Obama supporters looking to keep their spirits up. 

In images shared on Facebook, the “Keep calm and carry on” of the London Blitz was reworked into “Keep calm and trust Nate Silver.” As Silver’s model predicted an Obama victory with greater and greater certainty, people circulated exasperated photos of his appearances on talk shows with the caption, “What part of 90.3% do you people not understand?”

After Obama’s win, Silver was celebrated as a sort of co-victor. A common post was a version of the electoral map that showed, instead of red and blue states, a solid block of deep purple states whose electoral outcomes Silver had predicted (it was all of them). His face showed up on a parody of the famous red-and-blue Shepard Fairey poster of Obama, which, instead of “Hope,” read “Math” (perhaps serving as a symbol of the transition from a campaign driven by idealism to one devoted to demographic data). 

After the acceptance speeches, things got really silly, with Twitter users offering Silver sexual favors or speculating about the kinds of predictions he might make when drunk. Others looked toward the next cycle. As one meme put it, “Cut out the middle man: Nate Silver in 2016.”

While that’s a bit much, I’ll happily admit membership in the FiveThirtyEight fandom. On days when election news seemed a distraction from the work that pays the bills, I resolved to read his daily update and nothing else. On days when pundits didn’t seem to have a clue, I waited impatiently for his interpretation of the day’s events — or, just as often, his certitude that the contretemps of the moment would have no bearing on the ultimate outcome.

But there were two species of Silver devotees. One valued his certainty, reinforcing the belief among Obama supporters that the election would be won by “math,” that it was the logical outcome of the “facts on the ground.” Others, like me, valued the nuances of his uncertainty, and saw in it a model for a more responsible form of journalism.

The idealized form of traditional journalism is similar to the kind of statistics developed by Fisher: a rigorous method (newsgathering) is backed up by a battery of tests designed to weed out mistakes (fact-checking). The occasional error in the corrections column serves the role of the outlier, the admitted exception that proves the method produces truth most of the time. Bias is regarded as a kind of pollution to be expelled whenever possible, and the process is imagined as proceeding toward a final product, a single curve that best describes the facts.

Silver was more Bayesian not just in his statistics but also in his storytelling. He admitted the assumptions of his model up front, explaining how he weighed various polls based on their past performance, as well as factoring in information like the state of the economy. Indeed, his blog entries included so much “inside baseball” information that they were sometimes hard to read; it was as if every quote in a reporter’s story also explained how many times she or he had to call the source to get it.

Still, this transparency told readers that the author’s objectivity proceeded not from an unbiased attitude but from objects that (in theory) anyone could access. And the overall effect gave Silver’s work a kind of wonky momentum. He presented each of his stories as an ever-more-educated guess rather than a final product. While pundits pondered the significance of events like Superstorm Sandy, Silver could explain the standards he would use to evaluate their meaning for what (we were told) was the only outcome that mattered: the results on Election Day.

On these terms, the story of any given day could turn out to be no story at all. Traditional reporters could never admit that, dedicated as they were to the news cycle’s value. But when Silver did, he reinforced his own brand of empiricist authority. While one talking head might say, “Obama won this news cycle,” and another might say, “This story will benefit Romney,” Silver would write things like, “Describing the race as a ‘toss-up’ reflects an uninformed interpretation of the evidence” (October 31).

By the time the election actually arrived, traditional reporters looked rather silly in comparison to Silver — not only because his model was so comprehensive, but because one could assume that the pundits parsing the latest gaffe were also reading FiveThirtyEight and knew better. Even National Public Radio was put to shame. I remember one afternoon when the avuncular voice of Robert Siegel announced that a new NPR poll showed Mitt Romney making significant gains. Panicked, I turned up the volume, only to discover that Siegel meant a major change had occurred from the previous NPR poll. It had been conducted weeks earlier, before Romney’s winning performance in the first debate. FiveThirtyEight and similar models had accounted for hundreds of polls since then — and Siegel surely knew it.

Yet in The Signal and the Noise, Silver suggests that traditional journalists’ awareness of a better way may not matter. While the book is not primarily about FiveThirtyEight or electoral politics, Silver does poke fun at pundits’ poor predictive performance. On the panel show The McLaughlin Group, the visiting experts are asked to make predictions at the end of each episode. Silver explains how he tracked 1,000 predictions made on the show, then followed up on which ones came true. He discovered that of the 733 predictions that could be meaningfully evaluated, 285 turned out to be completely true and 268 turned out to be completely false, with the rest falling somewhere in between. “The panel may as well have been flipping coins,” Silver writes.
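The coin-flip comparison can be checked with a little arithmetic. The sketch below uses only the counts quoted in this review. The variable names are illustrative, and setting aside the in-between predictions to compute a hit rate is an assumption of this sketch, not necessarily Silver's own calculation.

```python
# Counts as reported in this review's summary of Silver's tally.
evaluable = 733
completely_true = 285
completely_false = 268

# Predictions that were neither completely true nor completely false.
in_between = evaluable - completely_true - completely_false

# Assumption: judge the panel only on the clear-cut calls.
clear_cut = completely_true + completely_false
hit_rate = completely_true / clear_cut

print(f"in between: {in_between}")             # in between: 180
print(f"hit rate on clear calls: {hit_rate:.1%}")  # hit rate on clear calls: 51.5%
```

On these numbers the panel was right on roughly half of its clear-cut calls, which is about what a fair coin would manage.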

There’s a commonsense explanation for what’s happening on McLaughlin Group, of course: the panelists are not being asked to make serious predictions. Whether this is a cause for alarm depends on what role we expect them to play.

Days before the 2012 election, Silver (perhaps bruised by accusations of bias by conservative news outlets) indicted the pundit class thus: “If the state polls are right, then Mr. Obama will win the Electoral College. If you can’t acknowledge that after a day when Mr. Obama leads 19 out of 20 swing-state polls, then you should abandon the pretense that your goal is to inform rather than entertain the public.”

In the book, Silver offers a similarly damning evaluation of McLaughlin Group panelist Monica Crowley, a Fox News reporter who had predicted days before the 2008 election that John McCain would win by “half a point.” Since polls widely predicted an Obama victory and in the end he won by 10 million votes, Silver concludes that

[a]nyone who had rendered a prediction to the contrary had some explaining to do. There would be none of that on The McLaughlin Group when the same panelists gathered again the following week [...] There was no mention of the failed prediction — made on national television in contradiction to essentially all available evidence. In fact, the panelists made it sound as though the outcome had been inevitable all along; Crowley explained that it had been a ‘change election year’ and that McCain had run a terrible campaign — neglecting to mention that she had been willing to bet on that campaign just a week earlier.

Remarks like this make one wonder whether to clap Silver on the back and say, “You showed ’em!” or to rhetorically shake him by the shoulders shouting: “She’s a Fox News reporter! That’s her job!” And that ambivalence matters.

Programs like The McLaughlin Group are designed not just to report on the facts of an election, but also on its drama. It would have seemed extremely odd, even unfair, if on the eve of the 2008 election none of the panelists on such a show were “betting” on McCain. Yet by Silver’s standards the show was a failure.

This paradigmatic mismatch captures the deeper anxieties raised by the success of FiveThirtyEight. Even if Silver has a better method for presenting the score, we also have to ask what his models mean for the game we are all playing.

¤

In The Signal and the Noise, the game is prediction. But Silver’s job on the team is unclear. At times he sounds like a coach, urging all of us to be better forecasters in our own lives. At other times he is an umpire, surveying the fields in which predictive theory has been applied, calling balls and strikes as he goes. 

As a coach, Silver is nothing if not encouraging. Throughout the book, he uses the second person and the first person plural to rhetorically enroll readers into the predictive project. In a typical passage, he writes: 

When we make a forecast, we have a choice among many different methods. [...] The way to become more objective is to recognize the influence that our assumptions play in our forecasts and to question ourselves about them [...] You will need to learn how to express — and quantify — the uncertainty in your predictions. You will need to update your forecast as facts and circumstances change. You will need to recognize that there is wisdom in seeing the world from a different viewpoint.

Coach Silver’s main counsel to achieve this end is to avoid letting one idea dominate predictive models. Borrowing the ancient Greek saying that Isaiah Berlin made famous, he encourages readers to be foxes instead of hedgehogs. The fox has many ideas, while the hedgehog has one big idea. Foxes, Silver writes, are multidisciplinary, adaptable, self-critical, tolerant of complexity, cautious, and empirical. By contrast, hedgehogs are specialized, stalwart, stubborn, order seeking, confident, and ideological.

Many of the academics and experts we encounter in the world, Silver claims, are hedgehogs committed to a particular theory or method of inquiry. That may help them to advance in their particular specialty, he argues, but it makes them poor forecasters. For example, he points to the fact that most Western political scientists in the 1980s failed to predict the collapse of the Soviet Union. The Cold War had been so important to these experts’ worldviews that it was difficult for them to contemplate a thaw, much less a meltdown. Such failures naturally suggest opportunities for “amateur” forecasters like Silver, who can speak statistics but aren’t wedded to any particular data set.

In dynamic systems, a hedgehog-like overconfidence can lead to real danger. At several points in the book, Silver compares his ideas about prediction with those developed to game the world’s most expensive prediction machine: the stock market. He points out that when computers are asked to maximize returns in simulations of the stock market, they tend to make decisions that are far more rational than those of human beings. But when the computers are programmed to be just slightly overconfident in their guesses about the market, the system develops the sorts of booms and crashes the real market does. Human beings display even greater overconfidence, since in the end they are in the market not to win a logic game, but to beat the system and make money. We end up with winners and losers because, as John Maynard Keynes said, and as Silver quotes, “The market can stay irrational longer than you can stay solvent.”

Arrogance in the stock market is a familiar tale, but Silver shows that a parallel hedgehog problem affects predictions in many other fields. It’s called “overfitting,” which he dubs “the most important scientific problem you’ve never heard of.” 

Overfitting occurs when a person looking at a set of data develops a model that perfectly explains the previous data points but has poor predictive power. For example, Silver points out that if you flip a coin a few times, then develop an equation to predict the next outcome based on previous results, you are unlikely to end up with a model that says the results are 50–50. The problem can be corrected by more data, but it can be fixed more quickly through real-world observations — observing the two-sided nature of a coin gives you a solid “theory” you can use to evaluate the information before you conduct a single test.
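The coin example can be made concrete with a rough sketch. It assumes the simplest possible "model," one that takes the observed share of heads as its prediction, and asks how often that share lands near 50–50 for a fair coin. The function name and the tolerance of 0.1 are illustrative choices, not anything from the book.

```python
from fractions import Fraction
from math import comb

def prob_estimate_near_fair(n_flips: int, tolerance: Fraction = Fraction(1, 10)) -> float:
    """Chance that a fair coin's observed heads frequency lands within
    `tolerance` of 0.5 after n_flips tosses (exact binomial sum)."""
    favorable = sum(
        comb(n_flips, heads)
        for heads in range(n_flips + 1)
        if abs(Fraction(heads, n_flips) - Fraction(1, 2)) <= tolerance
    )
    return favorable / 2 ** n_flips

for n in (4, 10, 100):
    print(n, round(prob_estimate_near_fair(n), 3))
```

With four flips the observed share of heads lands near one half only 37.5 percent of the time; with ten flips it does so about 66 percent of the time, and the odds keep improving as flips accumulate. That is the slow, data-only route; knowing the coin has two sides gets you there before the first toss.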

One advantage of the Bayesian way, Silver argues, is that it helps us incorporate such initial observations and guesses into our predictive models. The method calls for us to declare our “priors”: estimated probabilities of all potential outcomes. This, too, can protect us from the demon of overconfidence: just think how much better off we would all be if a few more economic whiz kids had accounted for the possibility (however remote) that housing prices might not go up again next year.
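A minimal sketch of what declaring a prior can look like, using a coin and a standard Beta prior. The prior strength (ten imagined heads and ten imagined tails) and the three observed flips are hypothetical numbers chosen for illustration; none of this is taken from Silver's own models.

```python
def posterior_mean_heads(prior_heads: float, prior_tails: float,
                         heads: int, tails: int) -> float:
    """Posterior mean of P(heads) under a Beta(prior_heads, prior_tails) prior,
    after observing the given numbers of heads and tails."""
    return (prior_heads + heads) / (prior_heads + prior_tails + heads + tails)

# Hypothetical data: three real flips, all heads.
raw_frequency = 3 / 3  # what the data alone would suggest

# Hypothetical prior: belief in a roughly fair coin, worth 10 heads and 10 tails.
updated = posterior_mean_heads(10, 10, heads=3, tails=0)

print(raw_frequency)      # 1.0
print(round(updated, 3))  # 0.565
```

The data alone say the coin always lands heads. The prior pulls that estimate back to about 57 percent, and each further flip would shift it again.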

But The Signal and the Noise is not merely a guide to fine-tuning one’s forecasting finesse. Silver also steps outside the arena, observing how the game of prediction relates to our ultimate decisions and goals. The most effective examples of this type are drawn from medicine. As Silver observes, public health officials have very few chances to run “pure” epidemiological experiments. In fact, when it comes to diseases that are highly contagious but easily preventable, the most effective models fail by definition — word gets out, no one gets sick.

Silver writes that as a consequence of this distortion effect, as well as the significant potential consequences of their decisions, forecasters in public health are more conservative than most. And this he admires: “[B]ecause of medicine’s intimate connection with life and death, doctors tend to be appropriately cautious. In their field, stupid models kill people. It has a sobering effect.” In response, many medical model-makers have chosen to use their data to inform careful policy decisions rather than perpetually publishing in ways that could spark a panic.

The distorting effect of our predictions’ purposes does not always result in such high-minded compromises, though. For example, Silver tells a compelling story of how the science of meteorology has achieved more accurate results through just the sort of careful calculating he counsels. But there’s a twist: while forecasts issued by the National Weather Service and outlets like The Weather Channel have become more accurate over time, local forecasts show a surprising bias. The guy on channel 9 is more likely to tell you that it’s going to rain. Why? Because the public is more likely to hold him accountable if he fails to predict precipitation than if the forecast calls for rain and it turns out to be an unexpectedly sunny day.

Are these weathermen, like the pundits Silver impugned before the election, entertaining rather than informing the public? It seems like an unfair judgment given the fact that their jobs depend on a public willing to judge them so unfairly. The will of the people, however misinformed, has become a part of the model. All of the individual actors are behaving rationally — a change would require a coordinated effort by many different participants in the system.

And if we believe the same about America, we must view FiveThirtyEight more critically.

¤

Silver’s FiveThirtyEight rightly earned the loyalty of thousands of readers whose first thought when they woke up each day was, “Who is going to win the election?” But now that 2012 is in the history books, it seems worthwhile to ask what purpose the model and those like it ultimately served. 

Consider one of the much-vaunted features of Silver’s model and accompanying blog: the ranking of “tipping point states.” In addition to constructing a model that could predict the winner of each state, Silver’s model also suggested which states were more likely to put the incumbent or the challenger over the top. By the time Election Day came around, the clear favorite was Ohio. Much of Silver’s analysis consisted of defending his case that President Obama’s significant lead in this one state would outweigh any other variations in the polls elsewhere. And of course, he turned out to be right. 

But just as medical model-makers ask if their data will “first do no harm,” we must also consider the consequences of building our democratic discourse around such predictions. While Silver’s forecasting was incredible, it was not exactly a secret that Ohio would be a decisive state in the election. Significantly more time was spent discussing the minutiae of the auto bailout (a meaningful policy move for Ohio and other swing states) than the financial bailout, a much more expensive and consequential piece of legislation. Issues like poverty and gun violence — significant to large urban centers in the “blue” states — were all but ignored. In fact, all discussion of where we were headed as a nation was weighed down by the assumption that a handful of states must decide our fate. And that assumption itself depends on the continued existence of the Electoral College, a puzzle Silver has to solve to seem impressive — after all, there would not be much point to “tipping point states” if the president were elected by a simple popular vote.

Is any of this Nate Silver’s fault? Of course not. But by foregrounding players’ competing plans to game the system, the story Silver tells obscures the idea that a different kind of politics might be — ought to be — possible. If the big story we are telling is, “The American people will make a significant choice about their future,” the drama of The McLaughlin Group may be more effective. Indeed, even if a model like Silver’s is predicting that your guy will win, there is something in the democratic spirit that rebels against it. As you follow quantitative analysts like Silver down the rabbit hole into comparing previous polls from outlying counties in the Buckeye State, something in you wants to shout, “No! We all have free will! Maybe none of this will matter! A million Texans could just spontaneously change their minds … couldn’t they?”

This disorientation continues after the election. Indeed, when we stop thinking about winners and losers and instead think about the greater project of building a better society, FiveThirtyEight seems like something of a false god. Let’s say that you want to reduce poverty in America or create a truly meritocratic education system or end the death penalty. Does it really help to know a few months in advance who will win the presidential election and how? Wouldn’t it better serve your cause to operate in a world in which no one knew what would happen until November and we could continue to indulge the illusion that debates over the common good actually matter? On Election Night, I, like many Americans, saw Ohio turn blue and thought, “Wow, Nate did it.” But weren’t the people who actually did it — the thousands of volunteers on the ground — the ones who had invested their entire selves in a cause regardless of whether the data said their work was necessary? 

But of course those believers had been placed by campaign managers looking at models just like FiveThirtyEight — and this is why Silver and his methods will in fact remain relevant. Campaigns were building strategies around data long before 2012; blogs like FiveThirtyEight have just made that decision-making process visible and accessible to the public in a way that it never was before. It’s not a conspiracy — with Silver’s perspective, it’s easy to see how, just like the local weathermen, politicians and those who report on them are behaving in their rational self-interest. But if we want our politics to be about more than demographic conniving, about more than tilting the swing states, about more than overfitting for Ohio, we also need a place to push back.

To truly serve the Republic, models like Silver’s need a way to envision not just the math of the past but also the possibilities of the future — and I don’t just mean the demographic shift toward Latinos, but also the possibilities opened by organizing and persuading, the true spirit of “¡Sí, se puede!” Such models should help not just politicians but all of us imagine paths to meaningful change. The intellectual tools to build this more purposeful model of modeling are already present in The Signal and the Noise — in fact, they are its premise.

Silver writes:

[The] exponential growth in information is sometimes seen as a cure-all, as computers were in the 1970s. Chris Anderson, an editor of Wired magazine, wrote in 2008 that the sheer volume of data would obviate the need for theory, and even the scientific method. This is an emphatically pro-science and pro-technology book, and I think of it as a very optimistic one. But it argues that these views are badly mistaken [...] It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves.

Silver’s model demanded more of political journalism, and we should demand more of it in turn. FiveThirtyEight has already disrupted electoral reporting, but to disrupt our democracy’s present rut, the political class as a whole should take a page from Silver’s book and ask to what end this data is ultimately gathered. 

Fortunately, there are signs that this is already occurring. The New York Times has continued to run FiveThirtyEight entries after the end of the election, some of them focusing on the ways in which data might inform the decisions presently being made by elected representatives in regard to issues like gun control and immigration. It seems to be an attempt to adapt a method of discerning the “signal” not just during electoral silly season, but in times of allegedly normal governance as well. I can’t be sure that that’s what Silver is up to next — but I’m willing to bet on it.

¤
