Shortly after the 2016 presidential election, I told my students that I had a model that predicted the popular vote for the last ten presidential elections (1980–2016) perfectly. For example, my model’s prediction of Hillary Clinton’s share of the 2016 two-party vote was exactly equal to the actual 51.11% share she received. I didn’t look at the economy, the candidates’ personalities, or any of the other factors that you and my students might think are important. Instead, I used the high temperatures on election day in these nine cities: Bozeman, Montana; Broken Bow, Nebraska; Burlington, Vermont; Caribou, Maine; Cody, Wyoming; Dover, Delaware; Elkins, West Virginia; Fargo, North Dakota; and Pocatello, Idaho. How did I select these nine cities? They were in small states and I liked their names.
Many students were puzzled. Some were suspicious. Did I make up the results? How could the temperature in Bozeman or Broken Bow have any meaningful effect on presidential elections? Others were so smitten by the perfect forecasting record that they started making up theories about how the weather might affect elections.
Anticipating this temptation, I showed my students an even more preposterous model. I had used a computer software program to generate values for nine random variables that have nothing at all to do with the real world, let alone what is going on in the United States during presidential election years. And again the model predicted the popular vote perfectly in all ten elections.
Despite some students’ suspicions, I did not make any of this up, but I did have a secret. Any ten observations can always be predicted perfectly by a model with nine imperfectly correlated explanatory variables. Period. There is nothing special about the nine explanatory variables. Any nine will do. The important thing is that I used nine variables to predict ten elections.
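The arithmetic behind this claim is ordinary linear algebra: nine explanatory variables plus an intercept give ten coefficients, which is enough to solve ten equations in ten unknowns exactly. A minimal sketch in Python, using made-up numbers rather than the actual temperature and election data, shows the trick at work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten "election results" to explain -- any ten numbers will do.
y = rng.uniform(40, 60, size=10)

# Nine random explanatory variables with no connection to y.
X = rng.normal(size=(10, 9))

# An intercept plus nine slopes: ten coefficients for ten observations.
A = np.column_stack([np.ones(10), X])

# Solve the resulting 10x10 linear system exactly.
coefs = np.linalg.solve(A, y)

predictions = A @ coefs
max_error = np.max(np.abs(predictions - y))
print(max_error)  # essentially zero: a "perfect" fit
```

Because the nine random variables are almost surely not perfectly correlated with each other, the ten-by-ten system has an exact solution, and the "model" fits all ten observations with zero error every time.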
Despite the perfect fit, the model is perfectly useless — which can be demonstrated by looking at the ten presidential elections before 1980. The figure shows that, yes, the model fits the data that were used to estimate the model, but performs terribly when predicting election outcomes in other years. For example, the model predicts that in the 1940 election, Franklin Roosevelt gets negative 11 percent of the vote instead of the 55 percent he actually received.
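The same toy setup illustrates why the perfect fit is worthless. If we fit ten coefficients to ten observations of pure noise and then ask the model to predict ten fresh observations it never saw, the out-of-sample errors are enormous, just as the weather model's 1940 prediction of negative 11 percent was. A sketch, again with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Ten "elections" used to estimate the model.
y_fit = rng.uniform(40, 60, size=10)
X_fit = rng.normal(size=(10, 9))
A_fit = np.column_stack([np.ones(10), X_fit])
coefs = np.linalg.solve(A_fit, y_fit)  # perfect in-sample fit

# Ten fresh "elections" the model never saw.
y_new = rng.uniform(40, 60, size=10)
X_new = rng.normal(size=(10, 9))
A_new = np.column_stack([np.ones(10), X_new])

in_sample_err = np.max(np.abs(A_fit @ coefs - y_fit))
out_sample_err = np.mean(np.abs(A_new @ coefs - y_new))
print(in_sample_err)   # essentially zero
print(out_sample_err)  # large: the fit does not generalize
```

The coefficients were chosen to soak up the noise in the fitting data, so they are useless for anything else.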
Figure 7. Using 1980–2016 Weather to Predict Presidential Elections
My nine-variable model is an example of overfitting, the relentless addition of complexity intended solely to make a model fit the data better.
Disparaging overfitting, the great mathematician John von Neumann (1903–1957) once said, “With four parameters I can fit an elephant and with five I can make him wiggle his trunk.” The figure shows a four-parameter model that does indeed look like an elephant, though his trunk is not wiggling.
Overfitting is not the only pitfall in constructing presidential election models. Another is data mining — sifting through a very large number of possible explanatory variables in search of high correlations. Here, the correlation between the weather model’s predictions and the actual presidential vote in the 1980–2016 elections is 0.94 even with only five explanatory variables (the weather in Burlington, Cody, Dover, Elkins, and Pocatello). How did I choose these particular cities? I had daily data on the high and low temperatures in 25 cities, and I used data-mining software to consider all 2,118,760 possible five-variable combinations of these 50 variables and identify the combination that fit the presidential election results most closely.
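This kind of search is easy to automate. The sketch below, not the software I actually used, runs a scaled-down version for speed: 20 random "temperature" series instead of 50, and every 3-variable combination instead of every 5-variable one. Even pure noise yields an impressively high in-sample correlation once you try enough combinations:

```python
import itertools
import math

import numpy as np

# The full search described in the text really is this big.
assert math.comb(50, 5) == 2_118_760

rng = np.random.default_rng(2)

y = rng.uniform(40, 60, size=10)   # ten "election results"
X = rng.normal(size=(10, 20))      # 20 random "temperature" series

best_r, best_combo = 0.0, None
# Try every 3-variable combination: C(20, 3) = 1140 candidate models.
for combo in itertools.combinations(range(20), 3):
    A = np.column_stack([np.ones(10), X[:, combo]])
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = np.corrcoef(A @ coefs, y)[0, 1]
    if r > best_r:
        best_r, best_combo = r, combo

print(best_combo, round(best_r, 3))  # a high correlation from pure noise
```

The winning combination means nothing; it is simply the noise that happened to line up best with these ten numbers.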
The problem with ransacking data for explanatory variables that fit one set of data is that if there is no logical basis for the data-mined model, it is likely to do a terrible job with fresh data. As expected, the figure below shows that my five-variable weather model did a great job fitting the 1980–2016 data and an awful job with the 1940–1976 data.
These two pitfalls — overfitting and data mining — make many, if not most, presidential election models entertaining but unreliable. They are very fragile in that they predict past elections astonishingly well and then do poorly with new elections and must be tweaked, after the fact, to “correct” for these mispredictions.
For example, on the eve of the 2016 presidential election, Alan Lichtman, a history professor at American University, boldly predicted a Trump victory. Lichtman’s prediction was based on a model that used 13 true/false questions. It was said to have predicted the winner of the popular vote in every presidential election since 1984.
After Trump’s surprising victory (even Trump seemed surprised, as he had not prepared a victory speech and was planning to fly to Scotland to play golf), Lichtman’s “correct” forecast was widely reported, even though a final vote count showed that it was wrong — Hillary Clinton won the popular vote.
Lichtman’s model is surely overfit and fragile, using 13 variables (“keys”) to predict 8 elections. The model has considerable wiggle-room in that some of the true/false questions are extremely subjective (“The incumbent administration effects major changes in national policy”). In addition, Lichtman’s original model, reported in 1981, had 12 keys. The current model dropped one key (“Has the incumbent party been in office more than a single term?”), added two foreign policy/military keys, and changed one key (from “Did the incumbent party gain more than 50% of the vote cast in the previous election?” to “After the midterm elections, the incumbent party holds more seats in the U.S. House of Representatives than after the previous midterm elections”). The large number of changes after a handful of election results is a sure sign that the model is better at predicting the past than predicting the future. Lichtman has not made a prediction for the 2020 election even though the answers to the 13 keys should be known by now.
Another model that has been in the news recently says that Trump has a 91–95 percent chance of winning re-election and will most likely win 362 electoral votes to Biden’s 176 votes. This model, constructed by Helmut Norpoth, a political science professor at Stony Brook University, is based on three flexible variables: a presidential voting cycle, long-term trends in partisanship, and performance in the primaries. For elections prior to 1952, the primaries variable incorporates all presidential primaries; for the 1952–2004 elections, only the New Hampshire primary is used; and for the 2008, 2012, and 2016 elections, the New Hampshire and South Carolina primaries are used. Norpoth seems blissfully unaware of the dangers of tweaking models to fit past data. We can anticipate more tweaking after the 2020 election.
Not all presidential election models are overfit and data mined. For example, The Economist has constructed a model with the help of Andrew Gelman, a professor of statistics and political science and director of the Applied Statistics Center at Columbia University, and Merlin Heidemanns, a Columbia PhD student. Gelman is a prominent critic of sloppy statistics, including what he calls “the garden of forking paths.”
If you wander through a garden making random choices every time you come to a fork in the road, your final destination will seem almost magical. What are the chances that you would come to this very spot? Yet you had to end up somewhere. If a model that correctly predicted your path had been specified before you started your walk, that would have been amazing. However, identifying your path after you finished your walk is distinctly not amazing.
The Economist model is based on sound theory, specifies the variables used, shows the source code used to make the calculations, and updates the predictions daily. It currently gives Biden a 99% chance of winning the popular vote and a 91% chance of winning the electoral vote.
True, The Economist model is based on assumptions, which may well be wrong, but models based on plausible assumptions are generally more reliable than models based solely on past statistical patterns.
Models constructed to fit the past are not reliable predictors of the future. You can seldom see where you are going by looking in a rear-view mirror.