Using Machine Learning to determine the most important features in exoplanet verification


  • Ved Srivathsa, The International School, Bangalore
  • Rida Assaf, University of Chicago



Exoplanet, Machine Learning, Random Forests, Feature Importance, Kepler


Over a decade ago, NASA launched the Kepler Space Telescope to search for Earth-like planets orbiting Sun-like stars, in the hope of finding habitable exoplanets. The Kepler pipeline collected data for over 9000 astronomical bodies, of which 52% were determined to be false positives, while the remaining 48% were candidates to be classified as exoplanets. The data collected from this mission can be used to assess and automatically classify Kepler Objects of Interest (KOIs) as exoplanets or false positives. Our goal in this work is to determine whether some data features are more important than others in classifying an object as an exoplanet. To this end, we built five machine learning classification models (logistic regression, support vector classifier, gradient boosting classifier, random forest classifier, and multilayer perceptron) and used 15 features to train and test them. We included explainable machine learning models to help attain this goal. Our best predictor (the random forest classifier) achieved a prediction accuracy of 99% when evaluated with k-fold cross-validation. We evaluated the feature importances of this model and found that 5 of the 15 selected features (Not Transit-Like Flag, Centroid Offset Flag, Stellar Eclipse Flag, Ephemeris Match Indicates Contamination Flag, and Planetary Radius) account for roughly 75% of the overall feature importance. We hope that our findings can guide the selection of appropriate data to accurately predict exoplanet candidacy in future missions.
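
The "share of overall feature importance" reported above can be illustrated with a minimal sketch. It assumes a mapping from feature name to importance score is already available. The helper name `top_k_share`, the feature names, and the numeric values are hypothetical placeholders chosen for illustration; they are not the paper's code or results.

```python
def top_k_share(importances, k):
    """Return the k highest-importance feature names and their share of the total.

    `importances` maps feature name -> non-negative importance score.
    """
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must sum to a positive value")
    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    share = sum(value for _, value in top) / total
    return [name for name, _ in top], share


# Placeholder values for illustration only; NOT the paper's results.
example_importances = {
    "feature_a": 0.30,
    "feature_b": 0.20,
    "feature_c": 0.15,
    "feature_d": 0.10,
    "feature_e": 0.10,
    "feature_f": 0.05,
    "feature_g": 0.05,
    "feature_h": 0.05,
}

top_names, top_share = top_k_share(example_importances, k=3)
print(top_names, round(top_share, 2))
```

In practice the scores would likely come from a fitted model; in scikit-learn, for instance, a trained `RandomForestClassifier` exposes them through its `feature_importances_` attribute, which could be paired with the training column names to build such a mapping.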



Author Biography

Rida Assaf, University of Chicago

I am mentoring a number of students on research projects that involve machine learning applications in a variety of domains (finance, medicine, law, education, and multi-agent reinforcement learning).

Throughout the program, I am working to provide the students with the skills necessary for their machine learning applications, backed by enough theory to deepen their understanding and guide their practice. I am also guiding them through a research project experience and the process of writing academic research papers.

References or Bibliography

- Goldilocks Zone. (2021, March 4). Exoplanet Exploration: Planets Beyond Our Solar System.

- Koch, D. G., Borucki, W., Dunham, E., Geary, J., Gilliland, R., Jenkins, J., ... & Weiss, M. (2004, October). Overview and status of the Kepler Mission. In Optical, Infrared, and Millimeter Space Telescopes (Vol. 5487, pp. 1491-1500). International Society for Optics and Photonics.

- Kepler Exoplanet Search Results. (2017, October 10). [Dataset].

- Kepler Objects of Interest. (2017–2018). [Dataset].

- Armstrong, D. J., Gamper, J., & Damoulas, T. (2021). Exoplanet validation with machine learning: 50 new validated Kepler planets. Monthly Notices of the Royal Astronomical Society, 504(4), 5327-5344.

- Q1-Q17 DR25 TCE. (2017–2018). [Dataset].

- Shallue, C. J., & Vanderburg, A. (2018). Identifying exoplanets with deep learning: A five-planet resonant chain around Kepler-80 and an eighth planet around Kepler-90. The Astronomical Journal, 155(2), 94.

- Home. (n.d.). MAST.

- Thompson, S. E., Mullally, F., Coughlin, J., Christiansen, J. L., Henze, C. E., Haas, M. R., & Burke, C. J. (2015). A machine learning technique to identify transit shaped signals. The Astrophysical Journal, 812(1), 46.

- Kepler Objects of Interest (KOI) Activity Tables. (n.d.). NASA.

- Malik, A., Moster, B. P., & Obermeier, C. (2020). Exoplanet Detection using Machine Learning. arXiv preprint arXiv:2011.14135.

- McCauliff, S. D., Jenkins, J. M., Catanzarite, J., Burke, C. J., Coughlin, J. L., Twicken, J. D., ... & Cote, M. (2015). Automatic classification of Kepler planetary transit candidates. The Astrophysical Journal, 806(1), 6.

- Koch, D. G., Borucki, W. J., Basri, G., Batalha, N. M., Brown, T. M., Caldwell, D., ... & Wu, H. (2010). Kepler mission design, realized photometric performance, and early science. The Astrophysical Journal Letters, 713(2), L79.

- scikit-learn: machine learning in Python — scikit-learn 1.0 documentation. (n.d.). Scikit-Learn.

- Z. (2021, September 26). Logistic Regression Explained - Towards Data Science. Medium.

- Support Vector Machines: A Simple Explanation. (n.d.). KDnuggets.

- Hoare, J. (2020, December 8). Gradient Boosting Explained - The Coolest Kid on The Machine Learning Block. Displayr.

- Donges, N. (2021, September 17). A Complete Guide to the Random Forest Algorithm. Built In.

- Multilayer Perceptron - an overview | ScienceDirect Topics. (n.d.). Multilayer Perceptron.

- Brownlee, J. (2020, August 2). A Gentle Introduction to k-fold Cross-Validation. Machine Learning Mastery.

- Schmelzer, R. (2019, July 24). Understanding Explainable AI. Forbes.

- Data columns in Kepler Objects of Interest Table. (n.d.). NASA Exoplanet Archive.

- NASA. (n.d.). TESS - Transiting Exoplanet Survey Satellite.



How to Cite

Srivathsa, V., & Assaf, R. (2023). Using Machine Learning to determine the most important features in exoplanet verification. Journal of Student Research, 11(3).



Honors Research Articles