An Intelligent System for Early Prediction of Cardiovascular Disease using Machine Learning

Authors

  • Aarush Kachhawa, Saint Francis High School, Mountain View, CA, USA
  • Jeremy Hitt, Graduate Assistant, University of Pennsylvania, PhD candidate (Chemistry)

DOI:

https://doi.org/10.47611/jsrhs.v11i3.2989

Keywords:

Cardiovascular Disease, Machine Learning, Classification Models

Abstract

Cardiovascular disease (CVD) remains the leading cause of death, responsible for 18.6 million deaths globally in 2019. Because several effective therapeutic options are widely available, early diagnosis of CVD is critical for timely intervention and for slowing the progression of the disease. CVD is associated with many risk markers that interact non-linearly, which makes accurate diagnosis challenging, especially for non-specialist clinicians and under-resourced facilities in developing countries. In recent years, machine-learning-based computational techniques have shown great promise as diagnostic tools. The goal of this research is to apply multiple machine learning methods, such as random forest, gradient boosting, logistic regression, and artificial neural networks, and to evaluate their predictive efficacy. This study also evaluates the feasibility of combining multiple UCI datasets to improve the prediction accuracy of the models. On a merged dataset of over 700 patients from the UCI Machine Learning Repository, the most accurate model was the random forest classifier, with an accuracy and F1 score of 94% and an AUC of 0.98. Ensemble learning methods, combined with data optimization and hyperparameter tuning, achieved higher accuracy than prior published studies on these datasets. Finally, this study proposes how these machine learning workloads can be incorporated into a distributed, cloud-connected healthcare system, making them widely accessible to practicing doctors and enabling them to assess the CVD risk of their patients.
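The abstract reports model performance as accuracy, F1 score, and AUC. As a minimal, dependency-free sketch (not the authors' code), these three metrics can be computed from true labels, predicted labels, and predicted probabilities as shown below. The function names, the toy labels and scores, and the 0.5 decision threshold are hypothetical and chosen only for illustration; they are not data or settings from the study.

```python
"""Sketch of the three evaluation metrics reported in the abstract.

Hypothetical illustration only: the toy labels and scores below are not data
from the study, and the 0.5 decision threshold is an assumed default.
"""
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of patients whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive (CVD) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are each 0 or undefined here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case receives a higher score than a randomly chosen
    negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = CVD present, 0 = absent; scores are predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"accuracy={accuracy(y_true, y_pred):.3f}")  # accuracy=0.750
print(f"f1={f1(y_true, y_pred):.3f}")              # f1=0.750
print(f"auc={roc_auc(y_true, scores):.4f}")        # auc=0.9375
```

Because AUC depends only on how the predicted probabilities rank positive cases above negative ones, it can be high (0.9375 in this toy case) even when accuracy at a fixed threshold is lower. In practice, scikit-learn provides `accuracy_score`, `f1_score`, and `roc_auc_score` in `sklearn.metrics`; the abstract does not state which implementation the study used.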

Author Biography

Jeremy Hitt, Graduate Assistant, University of Pennsylvania, PhD candidate (Chemistry)

I am a graduate assistant at the University of Pennsylvania pursuing a PhD in chemistry. My research focuses on using electroanalytical techniques, high-throughput screening, digital image processing, machine learning, and materials characterization to study alloy electrocatalysts. The aim of my research is to discover new catalysts for producing carbon-neutral fuels from CO2 in gas-phase electrolyzers, as well as new alloy catalysts for alkaline fuel cells. I also work on developing novel catalyst supports for alkaline hydrogen fuel cells as part of a Department of Energy EFRC (Energy Frontier Research Center).

In addition to my research, I work with a group of scientists from the National Science Foundation's JUAMI program to develop hardware and software for an open-source potentiostat and to share knowledge about energy science with African researchers.

References or Bibliography

2021 heart disease and stroke statistics update fact sheet at-a-glance. American Heart Association. (n.d.). Retrieved June 1, 2022, from https://www.heart.org/-/media/phd-files-2/science-news/2/2021-heart-and-stroke-stat-update/2021_heart_disease_and_stroke_statistics_update_fact_sheet_at_a_glance.pdf?la=en

Machine learning: What it is and why it matters. SAS. (n.d.). Retrieved May 31, 2022, from https://www.sas.com/en_us/insights/analytics/machine-learning.html

Nasteski, V. (2017). An overview of the supervised machine learning methods. HORIZONS.B, 4, 51-62. https://doi.org/10.20544/horizons.b.04.1.17.p05

Diabetes prediction using support vector machines. Sisense. (2022, March 18). Retrieved May 31, 2022, from https://www.sisense.com/blog/diabetes-prediction-using-support-vector-machines/

What is logistic regression? Master's in Data Science. (n.d.). Retrieved May 31, 2022, from https://www.mastersindatascience.org/learning/introduction-to-machine-learning-algorithms/logistic-regression/

Yıldırım, S. (2020, February 17). Gradient boosted decision trees-explained. Medium. Retrieved May 31, 2022, from https://towardsdatascience.com/gradient-boosted-decision-trees-explained-9259bd8205af

Brownlee, J. (2020, December 2). Bagging and Random Forest Ensemble algorithms for Machine Learning. Machine Learning Mastery. Retrieved May 31, 2022, from https://machinelearningmastery.com/bagging-and-random-forest-ensemble-algorithms-for-machine-learning/

Bhoyar, S., Wagholikar, N., Bakshi, K., & Chaudhari, S. (2021). Real-time heart disease prediction system using Multilayer Perceptron. 2021 2nd International Conference for Emerging Technology (INCET). https://doi.org/10.1109/incet51464.2021.9456389

Whisker plot - an overview. ScienceDirect Topics. (n.d.). Retrieved May 31, 2022, from https://www.sciencedirect.com/topics/mathematics/whisker-plot

Pal, M., & Parija, S. (2021). Prediction of heart diseases using Random Forest. Journal of Physics: Conference Series, 1817(1), 012009. https://doi.org/10.1088/1742-6596/1817/1/012009

UCI Machine Learning Repository: Heart disease data set. (n.d.). Retrieved May 31, 2022, from https://archive.ics.uci.edu/ml/datasets/heart+disease

Singh, A., & Kumar, R. (2020). Heart disease prediction using machine learning algorithms. 2020 International Conference on Electrical and Electronics Engineering (ICE3). https://doi.org/10.1109/ice348803.2020.9122958

Mishra, A. (2020, May 28). Metrics to evaluate your machine learning algorithm. Medium. Retrieved May 31, 2022, from https://towardsdatascience.com/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234

UCI Machine Learning Repository: Statlog (heart) data set. (n.d.). Retrieved May 31, 2022, from https://archive.ics.uci.edu/ml/datasets/statlog+(heart)

Published

08-31-2022

How to Cite

Kachhawa, A., & Hitt, J. (2022). An Intelligent System for Early Prediction of Cardiovascular Disease using Machine Learning. Journal of Student Research, 11(3). https://doi.org/10.47611/jsrhs.v11i3.2989

Issue

Vol. 11 No. 3 (2022)

Section

HS Research Articles