Prediction of Chronic Graft vs. Host Disease Using Machine Learning

Authors

  • Sanay Bordia, Archbishop Mitty High School
  • Professor Ramezani, UCLA

DOI:

https://doi.org/10.47611/jsrhs.v11i3.2910

Keywords:

Machine Learning, GVHD, chronic

Abstract

This paper attempts to use machine learning models to predict the onset of chronic Graft vs. Host Disease (GVHD) in children with blood cancers who have received a bone marrow or stem cell transplant. It compares how accurately three models predict chronic GVHD: Logistic Regression, the J48 decision tree algorithm, and a Multilayer Perceptron. The models are built from a dataset containing 36 attributes, not counting chronic GVHD itself. Through data preprocessing and analysis in Weka, these 36 attributes are narrowed down separately for each model to find the combination that yields the best predictive performance. Each model is evaluated with 10-fold cross-validation, and the area under the Receiver Operating Characteristic (ROC) curve, reported by Weka as "ROC Area", serves as the measure of predictive performance. The study found that the Multilayer Perceptron was the best predictor of chronic GVHD, while Logistic Regression was the worst. The J48 algorithm used the fewest attributes to make its prediction.
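The evaluation protocol described in the abstract (10-fold cross-validation, scored by the area under the ROC curve) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's Weka workflow: it assumes a binary 0/1 chronic GVHD label and numeric attributes, pools the out-of-fold scores before computing a single AUC (averaging per-fold AUCs is another common choice), and plugs in a toy gradient-descent logistic regression as the model. The helper names (`roc_auc`, `stratified_folds`, `cross_validated_auc`, `logistic_fit_predict`) are hypothetical, and the data in the usage block is synthetic.

```python
import math
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
# Assumption: a "model" is any function that trains on (X_train, y_train) and
# returns one score per row of X_test (higher = more likely chronic GVHD).
FitPredict = Callable[[List[Row], List[int], List[Row]], List[float]]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Deal the shuffled row indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(X: List[Row], y: List[int], fit_predict: FitPredict,
                        k: int = 10, seed: int = 0) -> float:
    """Collect an out-of-fold score for every row, then compute one pooled AUC."""
    scores = [0.0] * len(y)
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            scores[i] = p
    return roc_auc(y, scores)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_fit_predict(X_train: List[Row], y_train: List[int], X_test: List[Row],
                         lr: float = 0.5, epochs: int = 300) -> List[float]:
    """Toy logistic regression fit by batch gradient descent (illustration only)."""
    d, n = len(X_train[0]), len(X_train)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X_train, y_train):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - t
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for x in X_test]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in data (NOT the study's dataset): two numeric attributes,
    # with the label driven mostly by the first one.
    X = [[rng.random(), rng.random()] for _ in range(200)]
    y = [1 if x[0] + 0.2 * rng.random() > 0.6 else 0 for x in X]
    print(f"10-fold CV ROC AUC: {cross_validated_auc(X, y, logistic_fit_predict):.3f}")
```

Because the protocol only needs each model to return a score for every held-out row, a decision tree or a multilayer perceptron could be passed in through `fit_predict` in place of the toy logistic regression. The rank-based AUC used here equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one, which is why it reads as a measure of discrimination rather than of raw classification accuracy.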

Published

08-31-2022

How to Cite

Bordia, S., & Ramezani, R. (2022). Prediction of Chronic Graft vs. Host Disease Using Machine Learning. Journal of Student Research, 11(3). https://doi.org/10.47611/jsrhs.v11i3.2910

Issue

Vol. 11 No. 3 (2022)

Section

HS Research Articles