Using Machine Learning Techniques to Predict United States House of Representatives Elections

Authors

  • Sanatan Mishra, Homestead High School
  • John Lee, University of Chicago

DOI:

https://doi.org/10.47611/jsrhs.v12i3.4697

Keywords:

Election prediction, House of Representatives, Artificial Intelligence, Two-tier ML model, LSTM, Ridge Regression

Abstract

We predict the results of United States House of Representatives elections using machine learning techniques. We began by collecting and preprocessing data on the partisan lean of each district, the state of the economy, the national political environment, candidate political stances, news headlines about each candidate in each race, and past election results. We then selected, designed, and trained the models used to predict those election results. We used single-tier models, which took either news-headline text or numerical data alone as input, and two-tier models, which used both news headlines and numerical data. Our best-performing model was a two-tier model with a GRU as the first tier and a Ridge regressor as the second, achieving a root mean squared error of under 2 percentage points; the vote shares it predicted were within 2 percentage points of the actual observed vote shares.
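To make the two-tier architecture concrete, the sketch below shows one plausible way such a model could be wired together. It is a minimal, hypothetical example rather than the authors' actual pipeline: it assumes a TensorFlow/Keras GRU over tokenized headlines as the first tier and a scikit-learn Ridge regressor over the GRU output plus numerical features as the second tier, scored with root mean squared error. All variable names and toy data are illustrative assumptions.

```python
# Hypothetical sketch of a two-tier model: tier 1 = GRU over news headlines,
# tier 2 = Ridge regression over the GRU's output plus numerical features.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Toy stand-ins for the real data: one headline string, one numeric feature
# vector (e.g., partisan lean, GDP growth, generic-ballot margin), and one
# observed vote share (in percentage points) per race.
headlines = [
    "incumbent touts strong local economy",
    "challenger surges after debate performance",
    "scandal clouds incumbent campaign",
    "open seat race tightens in final week",
]
numeric = np.array([[8.0, 2.1, 7.5],
                    [-3.0, 2.1, 7.5],
                    [1.5, 2.1, 7.5],
                    [0.0, 2.1, 7.5]])
vote_share = np.array([57.0, 46.5, 50.2, 51.0])

# Tier 1: tokenize headlines and train a small GRU to map text to vote share.
tok = Tokenizer(num_words=5000)
tok.fit_on_texts(headlines)
seqs = pad_sequences(tok.texts_to_sequences(headlines), maxlen=16)

gru_model = Sequential([
    Embedding(input_dim=5000, output_dim=32),
    GRU(16),
    Dense(1),
])
gru_model.compile(optimizer="adam", loss="mse")
gru_model.fit(seqs, vote_share, epochs=5, verbose=0)

# Tier 2: feed the GRU's text-based score, alongside the numerical features,
# into a Ridge regressor that produces the final vote-share estimate.
text_score = gru_model.predict(seqs, verbose=0)      # shape (n_races, 1)
tier2_inputs = np.hstack([text_score, numeric])
ridge = Ridge(alpha=1.0)
ridge.fit(tier2_inputs, vote_share)

# Evaluate with root mean squared error, the metric reported in the abstract.
preds = ridge.predict(tier2_inputs)
rmse = np.sqrt(mean_squared_error(vote_share, preds))
print(f"In-sample RMSE: {rmse:.2f} percentage points")
```

In practice the model would be evaluated on held-out races rather than with the in-sample RMSE computed here; the sketch is only meant to show how headline text and numerical data can be combined across the two tiers.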



Published

08-31-2023

How to Cite

Mishra, S., & Lee, Y. K. J. (2023). Using Machine Learning Techniques to Predict United States House of Representatives Elections. Journal of Student Research, 12(3). https://doi.org/10.47611/jsrhs.v12i3.4697

Issue

Vol. 12 No. 3 (2023)

Section

HS Research Articles