Deep Neural Network Classifier for Alzheimer’s Disease

Omics biomarker prediction for early and quantitative Alzheimer's Disease diagnosis




Keywords: Alzheimer's Disease, Deep Neural Network, Machine Learning, Omics Datasets, Alzheimer's Disease Diagnosis, Gene Expression


Alzheimer's disease (AD) is a neurodegenerative disease characterized by dementia and, eventually, a loss of cognitive abilities. Two histopathological features are associated with AD: neurofibrillary tangles and amyloid-beta plaques. Both contribute to neuron cell death, neuron dysfunction, and AD pathogenesis. Current methods of diagnosing AD still rely on symptom-based assessment through interviews, which can be time-consuming, costly, and inaccurate. Alternative methods such as brain imaging are expensive and require extensive laboratory setup for accurate results. Thus, molecular-level quantitative approaches are necessary. Advances in omics datasets and machine learning technology have opened new avenues to diagnose AD. This paper proposes feature selection and dimensionality reduction using principal component analysis, t-distributed stochastic neighbor embedding, and the Kolmogorov-Smirnov test with Benjamini-Hochberg correction to isolate significant features associated with AD. Furthermore, we developed machine learning models based on logistic regression, a random forest classifier, and a deep neural network (DNN) classifier to predict AD diagnosis. Eight unique genes (TGM2, NKIRAS1, SYK, GABARAPL2, ABCC12, NDEL1, TEP1) were identified as significant biomarkers of AD, corroborating previous work on their roles in AD. After extensive hyperparameter tuning, the DNN model showed the best prediction performance for AD diagnosis among the three machine learning algorithms. On the preprocessed dataset, the DNN model achieved a 5-fold cross-validation accuracy of 0.823 and an AUC-ROC of 0.940. Its code is publicly available at
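The feature-selection and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the helper names (`benjamini_hochberg`, `select_features`, `cross_validate`), the 0.05 threshold, and the sample sizes are assumptions made for illustration, and logistic regression stands in for the DNN to keep the example short. Selecting features inside each training fold is also a choice of this sketch, made to avoid information leaking from the held-out fold.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    # Scale the i-th smallest p-value by n / i.
    scaled = pvals[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def select_features(X, y, alpha=0.05):
    """Indices of features whose case and control distributions differ.

    Runs a two-sample Kolmogorov-Smirnov test per feature, then applies
    the Benjamini-Hochberg correction. alpha=0.05 is an assumed threshold.
    """
    pvals = np.array(
        [ks_2samp(X[y == 1, j], X[y == 0, j]).pvalue for j in range(X.shape[1])]
    )
    return np.flatnonzero(benjamini_hochberg(pvals) < alpha)


def cross_validate(X, y, n_splits=5, seed=0):
    """Mean accuracy and AUC-ROC over stratified folds."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accs, aucs = [], []
    for train, test in folds.split(X, y):
        # Select features on the training fold only.
        keep = select_features(X[train], y[train])
        if keep.size == 0:  # nothing significant: fall back to all features
            keep = np.arange(X.shape[1])
        # Logistic regression is a stand-in for the DNN classifier.
        model = LogisticRegression(max_iter=1000)
        model.fit(X[train][:, keep], y[train])
        prob = model.predict_proba(X[test][:, keep])[:, 1]
        accs.append(accuracy_score(y[test], (prob >= 0.5).astype(int)))
        aucs.append(roc_auc_score(y[test], prob))
    return float(np.mean(accs)), float(np.mean(aucs))


# Synthetic stand-in for an expression matrix: 200 samples, 50 features,
# of which the first 5 are shifted in the "case" group.
rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 200, 50, 5
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_features))
X[y == 1, :n_informative] += 1.5

selected = select_features(X, y)
mean_acc, mean_auc = cross_validate(X, y)
print("selected feature indices:", selected.tolist())
print(f"5-fold accuracy: {mean_acc:.3f}, AUC-ROC: {mean_auc:.3f}")
```

The numbers printed by this sketch describe only the synthetic data and are not comparable to the accuracy and AUC-ROC reported in the abstract.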



Author Biographies

Dr. Hayan Lee, Fox Chase Cancer Center


Computational postdoctoral scholar at Snyder Lab, Genetics Department, Stanford University, CA

Assistant Professor, Epigenetics Institute, Fox Chase Cancer Center, PA

Dr. Michael Snyder, Stanford University


Chair at the Stanford Department of Genetics and Director of Stanford Center for Genomics and Personalized Medicine




How to Cite

Lin, J., Lee, H., & Snyder, M. (2022). Deep Neural Network Classifier for Alzheimer’s Disease: Omics biomarker prediction for early and quantitative Alzheimer’s Disease diagnosis. Journal of Student Research, 11(3).



HS Research Articles