Deep Neural Network Classifier for Alzheimer’s Disease

Omics biomarker prediction for early and quantitative Alzheimer's Disease diagnosis

Authors

J. Lin, H. Lee, M. Snyder

DOI:

https://doi.org/10.47611/jsrhs.v11i3.3553

Keywords:

Alzheimer's Disease, Deep Neural Network, Machine Learning, Omics Datasets, Alzheimer's Disease Diagnosis, Gene Expression

Abstract

Alzheimer's disease (AD) is a neurodegenerative disease characterized by dementia and, eventually, a loss of cognitive abilities. Two histopathological features are associated with AD: neurofibrillary tangles and amyloid-beta plaques. Both contribute to neuronal death, neuronal dysfunction, and AD pathogenesis. Current diagnosis of AD still relies on symptom-based assessment through interviews, which can be time-consuming, costly, and inaccurate. Alternatives such as brain imaging are expensive and require extensive laboratory infrastructure to produce accurate results. Thus, quantitative molecular-level approaches are needed. Advances in omics datasets and machine learning have opened new avenues for diagnosing AD. This paper uses dimensionality reduction (principal component analysis and t-distributed stochastic neighbor embedding) and feature selection (the Kolmogorov-Smirnov test with Benjamini-Hochberg correction) to isolate significant features associated with AD. We then developed machine learning models based on logistic regression, a random forest classifier, and a deep neural network (DNN) classifier to predict AD diagnosis. Eight unique genes, including TGM2, NKIRAS1, SYK, GABARAPL2, ABCC12, NDEL1, and TEP1, were identified as significant biomarkers of AD, confirming previous work that identified their roles in AD. After extensive hyperparameter tuning, the DNN showed the best predictive performance of the three algorithms. On the preprocessed dataset, the DNN achieved a 5-fold cross-validation accuracy of 0.823 and an AUC-ROC of 0.940. The code is publicly available at https://www.kaggle.com/neobrando/ml-dnn.
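The feature-selection step described in the abstract, a per-gene two-sample Kolmogorov-Smirnov test followed by Benjamini-Hochberg correction, can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' code (which is on Kaggle at the URL above). Assumptions: p-values come from the asymptotic Kolmogorov distribution, whereas `scipy.stats.ks_2samp` may use an exact computation for small samples; the toy data, the 0.05 false discovery rate, and all helper names (`ks_statistic`, `ks_pvalue_asymptotic`, `benjamini_hochberg`, `select_features`, `make_synthetic`) are hypothetical.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # The supremum of the ECDF difference is attained at an observed value,
    # so it suffices to evaluate both ECDFs after each distinct value.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:  # consume ties in sample a
            i += 1
        while j < m and b[j] == x:  # consume ties in sample b
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue_asymptotic(d, n, m, terms=100):
    """Two-sided p-value from the limiting Kolmogorov distribution (assumption:
    asymptotic approximation, not an exact small-sample p-value)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The survival function is essentially 1 here, and the alternating
        # series converges too slowly to be evaluated reliably.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return max(0.0, min(1.0, 2.0 * total))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return BH-adjusted p-values and a reject mask at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    reject = [q <= alpha for q in adjusted]
    return adjusted, reject


def select_features(x_ad, x_ctrl, alpha=0.05):
    """KS test per gene (column), then BH; returns indices of retained genes."""
    pvalues = []
    for j in range(len(x_ad[0])):
        a = [row[j] for row in x_ad]
        b = [row[j] for row in x_ctrl]
        pvalues.append(ks_pvalue_asymptotic(ks_statistic(a, b), len(a), len(b)))
    _, reject = benjamini_hochberg(pvalues, alpha)
    return [j for j, keep in enumerate(reject) if keep]


def make_synthetic(n_per_group=60, n_genes=50, n_signal=5, shift=2.0, seed=0):
    """Hypothetical toy expression matrix: only the first n_signal genes differ."""
    rng = random.Random(seed)
    x_ctrl = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)]
              for _ in range(n_per_group)]
    x_ad = [[rng.gauss(shift if j < n_signal else 0.0, 1.0) for j in range(n_genes)]
            for _ in range(n_per_group)]
    return x_ad, x_ctrl


if __name__ == "__main__":
    x_ad, x_ctrl = make_synthetic()
    print("retained gene indices:", select_features(x_ad, x_ctrl))
```

With real data one would more likely call `scipy.stats.ks_2samp` per gene and pass the resulting p-values to `statsmodels.stats.multitest.fdrcorrection`; the hand-written versions above only make each step explicit. Correcting with Benjamini-Hochberg controls the expected proportion of false positives among retained genes, which is less conservative than a Bonferroni cutoff when thousands of genes are tested.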

Author Biographies

Dr. Hayan Lee, Fox Chase Cancer Center

Advisor

Computational postdoctoral scholar, Snyder Lab, Department of Genetics, Stanford University, CA

Assistant Professor, Epigenetics Institute, Fox Chase Cancer Center, PA

Dr. Michael Snyder, Stanford University

Advisor

Chair of the Stanford Department of Genetics and Director of the Stanford Center for Genomics and Personalized Medicine

Published

08-31-2022

How to Cite

Lin, J., Lee, H., & Snyder, M. (2022). Deep Neural Network Classifier for Alzheimer’s Disease: Omics biomarker prediction for early and quantitative Alzheimer’s Disease diagnosis. Journal of Student Research, 11(3). https://doi.org/10.47611/jsrhs.v11i3.3553

Section

HS Research Articles