Diagnosing Breast Cancer Using a Novel Dual-Layered Random Forest with Null Handling

Authors

  • Aarav Sharma, Archbishop Mitty High School
  • Thuy-Anh Nguyen, Archbishop Mitty High School

DOI:

https://doi.org/10.47611/jsrhs.v11i3.3273

Keywords:

Breast cancer, Random Forest, Artificial Intelligence, Null Handling, Machine Learning

Abstract

The purpose of this project was to determine whether I could develop an early and accurate breast cancer detection model, one that could help decrease the mortality rate of women, by using a novel dual-layered Random Forest with Null Handling. Mammograms have an accuracy of about 86.9% and are susceptible to false negatives and false positives. To train and test the model, the Breast Cancer Wisconsin (Diagnostic) data was obtained and duplicated, and random values were deleted from the duplicated copy. The first random forest was then trained on x% of the processed data. The second random forest was trained on the output of the first random forest together with the processed data, and it acted to fine-tune the results of the first model. Lastly, the majority vote of the individual trees in the random forests determined the cancer prediction. I found that dual-layered random forests with null values in their training data had an accuracy of 94.4%, which is about 7.5 percentage points higher than the 86.9% accuracy of human mammogram reading. This model also overcame overfitting. Every dual-layered model, and every model trained with appended null data, performed better than human detection and could be built and tested in under 7 seconds with an easy-to-use interface, allowing for results in the same visit to the hospital. The best model had a first layer of 200 trees and a second layer of 800 trees and reached an accuracy of over 94%, compared with 86.9% for humans. This model is fast and accurate, and it can save people’s lives.
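
The pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's bundled copy of the Wisconsin diagnostic data, splits before duplicating so that no row appears in both training and test sets, and fills deleted values with column medians because the abstract does not say how nulls are handled. The names `inject_nulls`, `fill_nulls`, `fit_dual_layer`, `predict_dual_layer`, and `run_sketch`, along with the 10% null fraction and the 80% training share, are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def inject_nulls(X, fraction, rng):
    """Return a copy of X with roughly `fraction` of its entries set to NaN."""
    X_null = X.astype(float).copy()
    X_null[rng.random(X.shape) < fraction] = np.nan
    return X_null


def fill_nulls(X, medians):
    """Replace NaNs with per-column medians (an assumed null-handling rule)."""
    return np.where(np.isnan(X), medians, X)


def fit_dual_layer(X, y, n_trees_1=200, n_trees_2=800, seed=0):
    """Fit layer 1 on X, then layer 2 on X plus layer 1's output."""
    layer1 = RandomForestClassifier(n_estimators=n_trees_1, random_state=seed)
    layer1.fit(X, y)
    # Layer 1's output: predicted probability of class 1
    # (class 1 is "benign" in scikit-learn's encoding of this dataset).
    p1 = layer1.predict_proba(X)[:, [1]]
    layer2 = RandomForestClassifier(n_estimators=n_trees_2, random_state=seed + 1)
    layer2.fit(np.hstack([X, p1]), y)
    return layer1, layer2


def predict_dual_layer(layer1, layer2, X):
    """Predict with layer 2, feeding it layer 1's output as an extra column."""
    p1 = layer1.predict_proba(X)[:, [1]]
    # scikit-learn averages per-tree probabilities rather than counting hard votes.
    return layer2.predict(np.hstack([X, p1]))


def run_sketch(null_fraction=0.1, train_share=0.8,
               n_trees_1=200, n_trees_2=800, seed=0):
    """Train on original rows plus a null-injected duplicate; return test accuracy."""
    rng = np.random.default_rng(seed)
    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_share, stratify=y, random_state=seed)
    # Duplicate the training rows and delete random values in the duplicate only.
    X_aug = np.vstack([X_tr, inject_nulls(X_tr, null_fraction, rng)])
    y_aug = np.concatenate([y_tr, y_tr])
    medians = np.nanmedian(X_aug, axis=0)
    layer1, layer2 = fit_dual_layer(
        fill_nulls(X_aug, medians), y_aug, n_trees_1, n_trees_2, seed)
    preds = predict_dual_layer(layer1, layer2, fill_nulls(X_te, medians))
    return float(np.mean(preds == y_te))


if __name__ == "__main__":
    print(f"test accuracy: {run_sketch():.3f}")
```

One design caveat: the second layer here sees in-sample predictions from the first layer during training, which are more confident than the predictions it will see on new data. Substituting out-of-bag predictions from the first layer is one possible way to reduce that mismatch.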

Author Biography

Thuy-Anh Nguyen, Archbishop Mitty High School

Advisor, Science Research

Published

08-31-2022

How to Cite

Sharma, A., & Nguyen, T.-A. (2022). Diagnosing Breast Cancer Using a Novel Dual-Layered Random Forest with Null Handling. Journal of Student Research, 11(3). https://doi.org/10.47611/jsrhs.v11i3.3273

Section

HS Research Articles