Evaluating Baseball Statistics by Predicting Playoff Teams

Authors

  • Rohan Nakra, Huntington Beach High School
  • Ryan Kimes, Huntington Beach High School

DOI:

https://doi.org/10.47611/jsrhs.v12i4.5763

Keywords:

baseball analytics, xgboost, logistic regression, sabermetrics, moneyball

Abstract

In this paper, we explore how different baseball statistics correlate with a team's entry into the playoffs. We use logistic regression and XGBoost to evaluate whether a given baseball statistic is strongly associated with a team making the playoffs. We set up three models: the Moneyball Model (which uses moneyball statistics), the All Stats Model (which uses moneyball statistics plus additional common statistics), and the XGBoost Model (which uses the same dataset as the All Stats Model but a different model structure). We compare these models by evaluating their accuracy, variable coefficients, and confusion matrices. From these tests, we find that the Moneyball Model achieves accuracy similar to that of the All Stats Model, indicating that moneyball statistics remain a relevant and accurate way to predict whether a team makes the playoffs. The variable coefficients test highlights that moneyball statistics carry the highest importance in the model's ability to predict whether a team makes the playoffs. While these tests provide a foundation for evaluating moneyball and common baseball statistics, there remain future opportunities to use different models and a larger dataset.
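The three evaluation measures named above (accuracy, variable coefficients, and the confusion matrix) can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library, fits a logistic regression by plain gradient descent instead of calling scikit-learn or XGBoost, and runs on synthetic data. The feature names `informative_stat` and `uninformative_stat`, the 0.5 playoff threshold, and the 200/100 train/test split are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, w, b):
    """Predicted playoff probability for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = score(xi, w, b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Label a team-season 1 (playoffs) when the probability is >= 0.5."""
    return [1 if score(x, w, b) >= 0.5 else 0 for x in X]


def confusion_matrix(y_true, y_pred):
    """Return the counts (tn, fp, fn, tp)."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true, y_pred):
    """Fraction of team-seasons classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic, standardized data (an assumption, not the paper's dataset):
# one feature drives the label, the other is pure noise.
rng = random.Random(0)
FEATURES = ["informative_stat", "uninformative_stat"]  # hypothetical names
X, y = [], []
for _ in range(300):
    informative = rng.gauss(0, 1)
    uninformative = rng.gauss(0, 1)
    made_playoffs = 1 if informative + 0.3 * rng.gauss(0, 1) > 0.5 else 0
    X.append([informative, uninformative])
    y.append(made_playoffs)

X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

w, b = fit_logistic(X_train, y_train)
y_pred = predict(X_test, w, b)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred)
test_accuracy = accuracy(y_test, y_pred)

print("accuracy:", round(test_accuracy, 3))
print("confusion matrix (tn, fp, fn, tp):", (tn, fp, fn, tp))
for name, coef in zip(FEATURES, w):
    print(name, "coefficient:", round(coef, 3))
```

Comparing coefficient magnitudes as a measure of importance is only meaningful when the features share a common scale, which is why the synthetic features here are drawn as standardized values. In practice, library implementations such as scikit-learn's `LogisticRegression` and `sklearn.metrics.confusion_matrix` would likely replace the hand-written functions.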

Published

11-30-2023

How to Cite

Nakra, R., & Kimes, R. (2023). Evaluating Baseball Statistics by Predicting Playoff Teams. Journal of Student Research, 12(4). https://doi.org/10.47611/jsrhs.v12i4.5763

Section

HS Research Projects