A Meta-Analysis Evaluating the Performance of Machine Learning Models on Probability of Loan Default


  • Ely Hahami The Lawrenceville School
  • Mr. Piper The Lawrenceville School




Machine Learning, Statistical Analysis, Mortgage Lending


Machine learning algorithms are increasingly used to predict the credit risk of prospective loan applicants. This meta-analysis aims to contribute to the small but growing body of research on the effects of algorithmic lending. Specifically, we compare the performance of the Logistic Regression (LR) model and the Random Forest (RF) model in predicting the probability of loan default (PD). Using the area under the receiver operating characteristic curve (AUC) as an aggregate measure of model performance, we find evidence that the RF model is more accurate than the LR model in predicting PD (p-value = 0.029, α = 0.01). These results have major implications for banks and financial firms as mortgage lending transitions into the FinTech era.
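To make the performance measure concrete, the following is a minimal sketch of how the area under the ROC curve can be computed for two sets of default-risk scores. It is not the authors' code: the function name `roc_auc`, the labels, and both score lists are hypothetical and chosen only for illustration. The sketch uses the rank interpretation of AUC, namely the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a loan that defaulted, 0 for one that did not.
    scores: predicted default risk (higher means riskier).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defaulter and one non-defaulter")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # defaulter ranked above non-defaulter
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and scores from two hypothetical models.
labels = [1, 0, 1, 0, 0, 1, 0, 1]
model_a_scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.8, 0.3, 0.7]
model_b_scores = [0.6, 0.5, 0.4, 0.7, 0.2, 0.9, 0.3, 0.5]

auc_a = roc_auc(labels, model_a_scores)
auc_b = roc_auc(labels, model_b_scores)
print(f"Model A AUC: {auc_a:.4f}")
print(f"Model B AUC: {auc_b:.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of defaulters from non-defaulters, so a higher value indicates the better-ranking model on that data set. The pairwise loop is quadratic in the number of loans; it is written this way for readability rather than speed.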








How to Cite

Hahami, E., & Piper, D. (2022). A Meta-Analysis Evaluating the Performance of Machine Learning Models on Probability of Loan Default. Journal of Student Research, 11(2). https://doi.org/10.47611/jsrhs.v11i2.2726


