Machine Learning for Policy Guidance

Authors

DOI:

https://doi.org/10.47611/jsrhs.v11i3.3597

Keywords:

Machine Learning, Artificial Intelligence, Preprocessing, Feature Selection, Model Selection, Model Interpretation, Linear Regression, Ridge Regression, Bayesian Ridge, Decision Tree, Random Forest, Supply-side Policy, Government Policy, Economics

Abstract

This paper uses machine learning algorithms and techniques to build models that can help guide a country's policy. It discusses the machine learning process used in the research, including preprocessing, feature selection, model selection, and model interpretation. Specifically, using datasets from the CIA's World Factbook and the United Nations' Human Development Index (HDI), machine learning models are built on selected features of several countries (e.g., real gross domestic product (GDP), population, and area). The models then predict the countries' HDI scores. Model interpretation methods are used to find the features that matter most in predicting a country's score. This paper argues that important features can be identified through machine learning and can guide government policy relevant to human development. Supply-side policies are discussed based on the results from the machine learning models. The use of machine learning with other indexes is also explored.
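The workflow described in the abstract (preprocessing, model selection, model interpretation) can be illustrated with a minimal scikit-learn sketch. This is not the authors' code: the data are synthetic stand-ins rather than World Factbook or HDI values, the column names `real_gdp`, `population`, and `area` are hypothetical, and the toy target is assumed to depend mainly on the first feature. The feature selection step is omitted for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import BayesianRidge, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the World Factbook or HDI values).
rng = np.random.default_rng(0)
feature_names = ["real_gdp", "population", "area"]  # hypothetical column names
X = rng.lognormal(mean=0.0, sigma=1.0, size=(150, 3))
# Assumed toy relationship: the target depends mostly on the first feature.
y = 0.5 + 0.1 * np.log(X[:, 0]) + rng.normal(scale=0.02, size=150)

# The five model families named in the keywords.
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "bayesian_ridge": BayesianRidge(),
    "decision_tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Preprocessing + model selection: scale with RobustScaler (less sensitive to
# outliers), then compare models by mean 5-fold cross-validated R^2.
cv_scores = {}
for name, model in candidates.items():
    pipeline = make_pipeline(RobustScaler(), model)
    cv_scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="r2").mean()

best_name = max(cv_scores, key=cv_scores.get)
best_model = make_pipeline(RobustScaler(), candidates[best_name]).fit(X, y)

# Model interpretation: permutation importance measures how much the score
# drops when one feature's values are shuffled (computed here on the
# training data for simplicity).
result = permutation_importance(best_model, X, y, n_repeats=10, random_state=0)
importances = dict(zip(feature_names, result.importances_mean))
top_feature = max(importances, key=importances.get)
print(best_name, top_feature)
```

In a real analysis the importances would more appropriately be computed on held-out data, and a high importance indicates predictive usefulness rather than a causal effect, which matters when results are read as policy guidance.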

Author Biography

Timothy Raines, Inglemoor High School

Social Studies and Language Arts Teacher, Department Head

Published

08-31-2022

How to Cite

Chae, J., & Raines, T. (2022). Machine Learning for Policy Guidance. Journal of Student Research, 11(3). https://doi.org/10.47611/jsrhs.v11i3.3597

Section

HS Research Articles