How Can Machine Learning Determine Whether a Women's Tennis Player Will Make it to Top 100?


  • Saina Deshpande, Vikhe Patil Memorial School
  • Vanessa Klotzman, University of California, Irvine





There is considerable speculation, both within and outside the tennis community, about whether factors such as height, age, and nationality play a role in a tennis player’s success. For this study, ‘success’ is defined as reaching the Top 100 in the rankings. Past studies have associated a player’s height with success, but primarily for men’s tennis. In this study, we not only establish the relationship between height and success for women’s tennis players but also consider two additional factors: age and nationality. We use Pearson’s correlation coefficient to test whether these factors are statistically correlated with success. Having established the relationship, we develop an AI model to predict future successful players from historical tennis data. Since earlier studies have already considered height as a success factor, our machine learning model uses Naïve Bayes to estimate the probability of success from all three factors together, achieving an accuracy of 0.67 on the dataset used. The Pearson correlation coefficients with success are 0.23 for height and 0.19 for age, which indicates that these factors are applicable to identifying a player’s potential for success. Further research using more factors or a larger dataset could foster greater understanding of female success in tennis.

Keywords: Age, AI, Bayes’ Theorem, Factors, Height, Machine learning, Naïve Bayes, Nationality, Pearson’s Correlation Coefficient, Ranking, Tennis, Top 100
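
The abstract names two techniques, Pearson's correlation coefficient and a Naïve Bayes classifier, without giving implementation details. The following is a minimal sketch of how the two could fit together, not the authors' code. It uses only the Python standard library. The function names (pearson_r, fit_naive_bayes, predict_proba) are hypothetical, the eight player rows are invented toy data, and modelling height and age as Gaussian and nationality as a categorical feature with add-one smoothing is an assumption.

```python
import math
from collections import Counter


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_naive_bayes(rows, labels):
    """Fit per-class statistics. rows: (height_cm, age_years, nationality)."""
    countries = {r[2] for r in rows}
    model = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        gauss = []
        for j in (0, 1):  # height, then age
            vals = [r[j] for r in members]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
            gauss.append((mean, var))
        counts = Counter(r[2] for r in members)
        denom = len(members) + len(countries)  # add-one (Laplace) smoothing
        model[c] = {
            "prior": len(members) / len(rows),
            "gauss": gauss,
            "nat": {k: (counts[k] + 1) / denom for k in countries},
            "nat_unseen": 1 / denom,
        }
    return model


def predict_proba(model, row):
    """Posterior P(class | row) under the feature-independence assumption."""
    scores = {}
    for c, m in model.items():
        p = m["prior"]
        for x, (mean, var) in zip(row[:2], m["gauss"]):
            p *= gaussian_pdf(x, mean, var)
        p *= m["nat"].get(row[2], m["nat_unseen"])
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Invented toy data (NOT from the study): 1 = reached the Top 100, 0 = did not.
rows = [
    (180, 24, "USA"), (175, 27, "CZE"), (183, 22, "USA"), (178, 26, "RUS"),
    (165, 19, "IND"), (170, 31, "CZE"), (163, 18, "IND"), (168, 33, "RUS"),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

r_height = pearson_r([r[0] for r in rows], labels)
r_age = pearson_r([r[1] for r in rows], labels)
model = fit_naive_bayes(rows, labels)
posterior = predict_proba(model, (182, 24, "USA"))
print(f"r(height, success) = {r_height:.2f}, r(age, success) = {r_age:.2f}")
print(f"P(Top 100 | 182 cm, 24 y, USA) = {posterior[1]:.2f}")
```

Correlating a numeric factor with a 0/1 success label, as done here, is the point-biserial special case of Pearson's coefficient. Nationality has no natural numeric ordering, which is why this sketch treats it as a category inside the classifier and does not compute a Pearson coefficient for it.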








How to Cite

Deshpande, S., & Klotzman, V. (2022). How Can Machine Learning Determine Whether a Women’s Tennis Player Will Make it to Top 100? Journal of Student Research, 11(2).



HS Research Projects