Predicting Enzyme Commission Numbers Using Recurrent Neural Networks with Amino Acid Sequence Shift and Consistency Loss

Authors

  • Danny Paik, Bergen County Academies
  • Giselle Gomes, Bergen County Academies

DOI:

https://doi.org/10.47611/jsrhs.v14i1.8473

Keywords:

Enzyme Commission Number Prediction, Amino Acid Sequence, Classification

Abstract

Countless plant species have been used to develop medications for a wide range of conditions, from viral infections and cancer to diarrhea and various skin diseases. However, the share of plant-based new molecular entity products among all types of medications has dropped by more than 50% since 1950, largely because scientists and industry lack information about most plant species. Most botanical medicines have been derived from plants whose medicinal properties have been known for centuries and that specific cultures have long used to treat illnesses. Developing a plant-based medicine without much information about the plant can therefore be incredibly expensive and time-consuming, which leads many companies to avoid it. To address this, I propose a method to predict the Enzyme Commission (EC) numbers for genes in plant genomes, allowing scientists to understand the function of genes in various plant species. Each amino acid sequence is represented as a 2D matrix, and I propose shifting this matrix multiple times and stacking the shifted matrices so that the convolutional neural network can analyze a larger part of the amino acid sequence. Because the model predicts a sequence of numbers, I use a recurrent neural network to preserve context across the sequence. Moreover, I propose adding a classifier that predicts the first digit of the EC number to reduce sequential error.
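The shift-and-stack step described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify how the 2D matrix is built or how the shifts are applied, so everything here is an assumption made for illustration: the one-hot encoding over the 20 standard amino acids, the zero padding, the shift offsets, and the helper names `one_hot_matrix` and `shift_and_stack` are hypothetical and are not taken from the paper.

```python
import numpy as np

# Assumption: the 2D matrix is a one-hot encoding over the 20 standard residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_matrix(sequence, length):
    """Encode a sequence as a (length, 20) matrix, truncating or zero-padding to `length`."""
    matrix = np.zeros((length, len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(sequence[:length]):
        idx = AA_INDEX.get(aa)
        if idx is not None:  # unknown residues are left as all-zero rows
            matrix[pos, idx] = 1.0
    return matrix


def shift_and_stack(matrix, shifts):
    """Shift the matrix along the sequence axis by each non-negative offset
    and stack the results as channels, giving shape (len(shifts), length, 20)."""
    length = matrix.shape[0]
    channels = []
    for s in shifts:
        shifted = np.zeros_like(matrix)
        if s < length:
            # Row i of the shifted copy holds residue i + s; the tail is zero-padded.
            shifted[: length - s] = matrix[s:]
        channels.append(shifted)
    return np.stack(channels, axis=0)


# Hypothetical example sequence and offsets.
stacked = shift_and_stack(one_hot_matrix("MKTAYIAKQR", length=8), shifts=(0, 1, 2))
print(stacked.shape)
```

Under these assumptions, each position in the stacked tensor exposes several neighboring residues across its channels, which is one way a small convolution kernel could cover a wider stretch of the sequence.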



Published

02-28-2025

How to Cite

Paik, D., & Gomes, G. (2025). Predicting Enzyme Commission Numbers Using Recurrent Neural Networks with Amino Acid Sequence Shift and Consistency Loss. Journal of Student Research, 14(1). https://doi.org/10.47611/jsrhs.v14i1.8473

Section

HS Research Articles