Evaluating Machine Learning Models on Predicting Change in Enzyme Thermostability

Authors

  • Avnith Vijayram, Thomas Jefferson High School for Science and Technology
  • Jacklyn Luu, Inspirit AI

DOI:

https://doi.org/10.47611/jsrhs.v12i2.4364

Keywords:

enzyme, thermostability, artificial intelligence, machine learning

Abstract

Enzymes are efficient catalysts for biological reactions and could potentially be designed to speed up non-biological reactions, such as those in industrial processes. However, testing new protein designs experimentally is time-consuming, so an efficient method for predicting protein stability is needed. Our research problem is to find the best machine learning model for predicting the change in enzyme thermostability caused by a single point mutation in the amino acid sequence. We trained several machine learning models and found that the XGBoost model performed best, with an R² score of 0.593. (The R² score, or coefficient of determination, is a metric for which higher is better; a perfect model would score 1.)
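The R² score reported above is the coefficient of determination, R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared differences between measured and predicted values and SS_tot is the sum of squared differences between measured values and their mean. A model that always predicts the mean scores 0, a perfect model scores 1, and a model worse than the mean baseline scores below 0. The minimal Python sketch below computes it from scratch; the function name `r2_score` and the sample values are illustrative assumptions, not the authors' code or data. Libraries such as scikit-learn provide an equivalent `sklearn.metrics.r2_score`.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot (illustrative sketch)."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of a baseline that always predicts the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical stability-change values (made up for illustration, not from the study).
measured = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]    # matches every measurement
mean_only = [2.5, 2.5, 2.5, 2.5]  # always predicts the mean of `measured`
noisy = [1.5, 2.0, 2.5, 4.0]      # close but imperfect
reversed_ = [4.0, 3.0, 2.0, 1.0]  # worse than the mean baseline

print(r2_score(measured, perfect))    # 1.0
print(r2_score(measured, mean_only))  # 0.0
print(r2_score(measured, noisy))      # 0.9
print(r2_score(measured, reversed_))  # -3.0
```

On this scale, the abstract's score of 0.593 means the XGBoost model's squared prediction error was roughly 41% of that of a baseline that always predicts the mean change in stability.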



Published

05-31-2023

How to Cite

Vijayram, A., & Luu, J. (2023). Evaluating Machine Learning Models on Predicting Change in Enzyme Thermostability. Journal of Student Research, 12(2). https://doi.org/10.47611/jsrhs.v12i2.4364

Issue

Vol. 12 No. 2 (2023)

Section

HS Research Projects