Identification of a Panel of Biomarkers for the Early Detection of Ovarian Cancer


  • Riya Davar, Texas Academy of Mathematics & Science
  • Madhuri Yalamanchili, MD, Oncologist in Binghamton, New York



Keywords: Ovarian Cancer, Random Forest, mRMR, Heroku, ML Classifiers


According to the CDC, ovarian cancer is the second most common gynecologic cancer in the United States, and it ranks fifth among causes of cancer death in women. The main screening method for this cancer is transvaginal sonography (TVS), which is both invasive and costly. The goal of this project was to use the mRMR (Maximum Relevance Minimum Redundancy) feature selection algorithm to select a panel of biomarkers from an ovarian cancer dataset. The project also aimed to build a non-invasive, inexpensive software tool that could help validate the panel and assist with the early detection of ovarian cancer at a reasonable level of sensitivity.

This project uses an ovarian cancer dataset with 49 features. The mRMR filter method of feature selection [9, 10, 12] eliminates redundant features while keeping the relevant features that influence the target class. The project achieved its final goal: a working web application that asks a clinician for a few basic blood test results and returns a prediction. The application uses a Random Forest machine learning model [7] trained on the K best features selected by the mRMR algorithm. The model is used to predict the disease and treatment targets, with the aim of helping to reduce mortality from ovarian cancer.
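
The greedy selection loop behind mRMR can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's code. It uses a simplified variant that measures both relevance (to the class label) and redundancy (with features already chosen) as absolute Pearson correlation. mRMR implementations more commonly use mutual information or an F-statistic for relevance. The functions `pearson` and `mrmr_select` and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def mrmr_select(features, target, k):
    """Hypothetical greedy mRMR (difference form) using |Pearson r| as the measure.

    features: dict of feature name -> list of values, one per patient
    target:   list of class labels (e.g. 0 = no cancer, 1 = cancer)
    k:        number of features to keep
    Returns the selected feature names in the order they were picked.
    """
    # Relevance: how strongly each feature tracks the class label.
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    selected = []
    candidates = sorted(features)  # sorted so that exact ties break deterministically

    def score(name):
        # First pick: relevance only. Later picks: relevance minus the mean
        # |r| between this feature and the features already selected.
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(features[name], features[s])) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy data: 8 patients, 3 features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy = {
    "feature_a": [4, 4, 2, 2, 8, 8, 6, 6],              # strongly related to the label
    "feature_a_x10": [40, 40, 20, 20, 80, 80, 60, 60],  # feature_a in other units: redundant
    "feature_b": [8, 6, 12, 10, 10, 8, 14, 12],         # weaker, but uncorrelated with feature_a
}
chosen = mrmr_select(toy, labels, k=2)
print(chosen)
```

On this toy data, ranking by relevance alone would pick `feature_a` and its rescaled copy `feature_a_x10`, which carry the same information. The mRMR loop keeps only one of the two. It then picks `feature_b`, whose relevance is lower but which is uncorrelated with the first choice. Both copies of feature A have the same relevance, so which one is picked first can depend on floating-point rounding.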

The project used a Random Forest classifier, a model that has been shown to work well with smaller datasets such as this project’s. The model achieved a sensitivity of 0.96.
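
Sensitivity, also called recall or the true positive rate, is the fraction of actual cancer cases that the model flags as positive: TP / (TP + FN). A sensitivity of 0.96 therefore means that about 96 of every 100 true cancer cases in the evaluation set were detected. The minimal sketch below computes this value from true and predicted labels. The labels are made-up illustrative values, not the project's data. In scikit-learn, the same quantity is returned by `sklearn.metrics.recall_score` with its default `pos_label=1`.

```python
def sensitivity(y_true, y_pred, positive=1):
    """True positive rate: TP / (TP + FN), computed over the actual positive cases."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive cases")
    return tp / (tp + fn)


# Illustrative labels only: 25 cancer cases (1) followed by 10 non-cancer cases (0).
y_true = [1] * 25 + [0] * 10
# 24 of the 25 cancer cases are caught, 1 is missed, and 2 healthy cases are false alarms.
y_pred = [1] * 24 + [0] + [0] * 8 + [1] * 2

print(sensitivity(y_true, y_pred))  # -> 0.96
```

The two false positives among the healthy cases do not change the score. Sensitivity says nothing about false alarms; that is what specificity measures.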



Author Biography

Madhuri Yalamanchili, MD, Oncologist in Binghamton, New York

Advisor to the author Riya Davar

Dr. Madhuri Yalamanchili is an oncologist in Binghamton, New York, and is affiliated with multiple hospitals in the area, including Our Lady of Lourdes Memorial Hospital and United Health Services Hospitals-Binghamton. She received her medical degree from Siddhartha Medical College NTR and has been in practice for 11 to 20 years.

References or Bibliography

"1.17. Neural Network Models (supervised)." Scikit-learn. Accessed 17 Jan. 2022.

"12 Types of Neural Networks Activation Functions: How to Choose?" V7 - AI Data Platform for ML Teams, 17 Jan. 2022.

"Advantages of Tree-Based Modeling." Summit | Quantitative Consulting and Data Analytics.

"Understanding the AUC-ROC Curve in Machine Learning Classification." Analytics India Magazine, 7 Oct. 2021.

"Complete Guide on Model Deployment with Flask and Heroku." Medium, 1 Jan. 2022.

"How to Use StandardScaler and MinMaxScaler Transforms in Python." Machine Learning Mastery, 27 Aug. 2020.

"Machine Learning: What It Is and Why It Matters."

Malik, Farhad. "What Are Hidden Layers?" Medium, 20 May 2019.

Mazzanti, S. "'MRMR' Explained Exactly How You Wished Someone Explained to You." Medium, 15 Feb. 2022.

Song, H., Yang, E., Kim, J., Park, C., Kyung, M., and Kim, Y. "Best Serum Biomarker Combination for Ovarian Cancer Classification." BioMedical Engineering OnLine, vol. 17, suppl. 2, 2018.

"Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform." E-Print Archive.

Mendeley Data.

Narkhede, Sarang. "Understanding AUC - ROC Curve." Medium, 15 June 2021.

"NN - Multi-layer Perceptron Classifier (MLPClassifier)." Michael Fuchs Python, 3 Feb. 2021.

"Ovarian Cancer - Symptoms and Causes." Mayo Clinic, 25 July 2019.

"Types and Stages." 18 June 2021.



How to Cite

Davar, R., & Yalamanchili, M. (2022). Identification of a Panel of Biomarkers for the Early Detection of Ovarian Cancer. Journal of Student Research, 11(2).



HS Research Projects