Gov 1347: Election Analytics - Final Prediction

Posted on Nov 1, 2020

This blog is part of a series related to Gov 1347: Election Analytics, a course at Harvard University taught by Professor Ryan D. Enos.


With election night just around the corner, it is time for me to make one final attempt to forecast the 2020 presidential election. For my final forecast, I used two different types of models: a multiple linear regression model for my two-party popular vote forecast and a binomial logistic regression model for my electoral college forecast. These models are very similar to the ones in the Harvard Political Review’s forecast that I co-authored. However, I extended the electoral college model to use a binomial logistic regression to do a better job of accounting for fundamental uncertainty.


Electoral College Forecast

For my state electoral college models, I used the same set of factors that I used for my national models. However, instead of using the two-party popular vote from the previous election, I used each state's voting-eligible population and the number of votes in each state in past elections to build binomial logistic regression models for each state and the District of Columbia. One thing to note is that I treated Nebraska and Maine, which split their electoral votes by congressional district, as winner-take-all states in my models because there was not enough historical data on each district.
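To make the idea of a binomial logistic regression concrete, here is a minimal sketch in Python. It is not the code behind this forecast: it uses a single predictor (a poll average), fits the model with Newton's method, and all of the numbers and the names `fit_binomial_logit` and `predict_share` are made up for illustration. The assumed setup is that each past election contributes a count of votes for a candidate out of the state's voting-eligible population.

```python
import math

def predict_share(b0, b1, x):
    """Inverse logit: predicted share of eligible voters backing the candidate."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def fit_binomial_logit(x, votes, eligible, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton's method on grouped binomial data.

    Each observation is votes[i] successes out of eligible[i] trials.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, votes, eligible):
            p = predict_share(b0, b1, xi)
            resid = yi - ni * p          # observed minus expected votes
            w = ni * p * (1.0 - p)       # binomial variance weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 Newton system directly.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical inputs for one state: poll average (as a fraction), votes for
# the candidate, and voting-eligible population in four past elections.
polls = [0.45, 0.48, 0.50, 0.52]
votes = [290_000, 330_000, 300_000, 380_000]
eligible = [1_000_000, 1_050_000, 1_100_000, 1_150_000]

b0, b1 = fit_binomial_logit(polls, votes, eligible)
print(predict_share(b0, b1, 0.51))  # predicted share at a 51% poll average
```

In practice a statistics package would handle the fitting and support several predictors at once. The hand-written fit here is only meant to show what the model estimates.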

I compared this model (model 3) with two other models that used fewer predictors: just average polling for model 1, and average polling plus job approval for model 2. In my in-sample testing, model 3 had the lowest RMSE in nearly every state and DC, beating the other two models by at most 0.05 (for the North Carolina models). Other states, like Hawaii, saw nearly identical scores across the three models. In my out-of-sample testing, I conducted a leave-one-out validation on just 2016, since it was difficult to repeat this for every election year given that I was creating a separate model for each state. The 2016 test prediction incorrectly forecasted several key battleground states, including ones that tipped the election in favor of Trump: Florida, Michigan, Nevada, Pennsylvania, and Wisconsin. However, pollsters are aware of the likely cause of this error in 2016: under-counting white, non-college-educated voters. Many of them have adjusted their polling to account for this in 2020. If they adjusted correctly, this shouldn’t be an issue for the 2020 forecast.
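For readers unfamiliar with these two checks, the sketch below shows how RMSE and a single leave-one-out test work. To keep it short it swaps in a one-predictor linear fit instead of the binomial logistic models described above, and the data and function names (`rmse`, `fit_line`, `leave_one_out`) are hypothetical.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predictions and observed values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def leave_one_out(years, x, y, held_out_year):
    """Fit on every year except held_out_year, then predict that year."""
    train_x = [xi for yr, xi in zip(years, x) if yr != held_out_year]
    train_y = [yi for yr, yi in zip(years, y) if yr != held_out_year]
    intercept, slope = fit_line(train_x, train_y)
    idx = years.index(held_out_year)
    return intercept + slope * x[idx], y[idx]

# Hypothetical state data: poll average and actual two-party vote share (%).
years = [2000, 2004, 2008, 2012, 2016]
polls = [47.0, 48.5, 52.0, 51.0, 50.5]
share = [48.0, 48.9, 53.1, 51.6, 49.2]

# In-sample RMSE: fit and evaluate on the same elections.
intercept, slope = fit_line(polls, share)
fitted = [intercept + slope * p for p in polls]
print(rmse(fitted, share))

# Out-of-sample check: hold out 2016 only.
predicted_2016, actual_2016 = leave_one_out(years, polls, share, 2016)
print(predicted_2016, actual_2016)
```

The in-sample score rewards models that fit past elections closely, while the held-out prediction shows how a model would have done on an election it never saw.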


My 2020 electoral college forecast map is shown above. I define strong as a prediction that gives either candidate a margin of victory greater than 5%, lean as a margin between 2% and 5%, and toss-up as any margin within 2%. Of the states that are toss-ups in my forecast, Texas and Iowa tilt slightly in favor of Trump while Georgia tilts slightly towards Biden. Texas has historically been a safe Republican state. However, it has already seen record-breaking early voting turnout, with early votes alone surpassing the state's total number of voters in 2016. So, it will be interesting to see if Texas flips blue for the first time since 1976. The other two toss-up states, Georgia and Iowa, are just as interesting. Georgia is also a historically red state, so it will be interesting to see if it flips blue for the first time since 1992. Iowa flipped red in 2016 but went to Obama in 2008 and 2012. However, the latest poll from the Des Moines Register favors Trump.
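The category rule can be written as a small function. This is a sketch: the function name `classify` is hypothetical, and the handling of margins that fall exactly on 2% or 5% is my assumption, since the definitions in the text do not pin those cases down.

```python
def classify(margin):
    """Label a state from Biden's predicted margin in percentage points.

    Positive margins favor Biden, negative margins favor Trump.
    Assumption: exactly 2 counts as a toss-up and exactly 5 counts as lean.
    """
    size = abs(margin)
    if size <= 2:
        return "Toss-up"
    leader = "Biden" if margin > 0 else "Trump"
    strength = "Strong" if size > 5 else "Lean"
    return f"{strength} {leader}"

# Hypothetical margins, not values from the forecast.
for margin in (7.5, 3.0, 1.2, -0.8, -4.0, -9.0):
    print(margin, classify(margin))
```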



In my 1,000 simulations of the data, which take into account the fundamental uncertainty and the standard error, Biden wins an average of 343 electoral votes and Trump wins an average of 195. Trump won a greater share of electoral college votes in 7 of these simulated elections, and there was an electoral college tie once. However, there are still many factors that this model does not take into account, such as the COVID-19 pandemic and the increase in mail-in and early voting. These are uncertainties that no model can foresee, so we must be cautious not to take these forecasts as truths. That being said, based only on these models, I expect Joe Biden to win the electoral college.
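A simulation of this kind can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure used for the numbers above: it draws each state's two-party vote share from a normal distribution around a predicted value, treats states as independent, and uses three invented states. The function name `simulate_elections` is hypothetical.

```python
import random

def simulate_elections(states, n_sims=1000, seed=2020):
    """Return Biden's electoral vote total in each simulated election.

    states maps a name to (predicted two-party share for Biden,
    standard error of that prediction, electoral votes).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    totals = []
    for _ in range(n_sims):
        biden_ev = 0
        for share, std_error, ev in states.values():
            draw = rng.gauss(share, std_error)
            if draw > 0.5:  # winner-take-all within each state
                biden_ev += ev
        totals.append(biden_ev)
    return totals

# Hypothetical three-state example; values are invented for illustration.
toy_states = {
    "State A": (0.55, 0.03, 20),
    "State B": (0.49, 0.03, 10),
    "State C": (0.44, 0.03, 8),
}
totals = simulate_elections(toy_states)
total_ev = sum(ev for _, _, ev in toy_states.values())

print(sum(totals) / len(totals))                # average electoral votes for Biden
print(sum(t * 2 < total_ev for t in totals))    # simulations Trump wins
print(sum(t * 2 == total_ev for t in totals))   # simulated ties
```

Treating states as independent is the biggest simplification here, since polling errors in real elections tend to move together across similar states.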


The Data

All the code, data, and graphics for this blog post are available on GitHub.
