Paper Draft Mtech
Abstract— Ovarian cancer (OC) is one of the most common types of cancer in women. It is a cancer that starts in the ovary or at the end of the fallopian tube next to the ovary. Most patients with ovarian cancer have advanced disease at the time of diagnosis because early-stage tumors are often asymptomatic, which leads to poorer long-term survival. To tackle this problem, machine learning can support better and lower-cost health care. In this research, we investigate the feasibility of employing machine learning algorithms, namely SVM, Random Forest, Logistic Regression, KNN and Decision Tree, for identifying whether ovarian cancer found in women is of the malignant or the benign type.
In the dataset, out of 349 records, 10% (35 records) were used as testing data and the remaining 314 samples were used as training data. First, we ran the five machine learning algorithms on the pre-processed data without applying any feature selection method and obtained the results. In this run, all 49 features in the dataset were considered while training the models.
Four commonly used evaluation metrics, namely accuracy score, precision score, recall score and F1 score, were considered in order to evaluate and compare the models.
A 10-fold cross-validation was performed for each of the machine learning algorithms, and the mean score over these 10 folds was taken for each evaluation metric. The evaluation metrics are calculated as follows, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.
Accuracy Score = (TP + TN) / (TP + TN + FP + FN)
Precision Score = TP / (TP + FP)
Recall Score = TP / (TP + FN)
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
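The four formulas and the 10-fold averaging can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the confusion-matrix counts are hypothetical and do not come from the experiments. Splitting 349 records into 10 near-equal folds gives test folds of 35 or 34 records, which is consistent with the 35/314 split described above.

```python
from statistics import mean


def classification_scores(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def kfold_indices(n_samples, n_folds=10):
    """Split range(n_samples) into n_folds contiguous, near-equal test folds."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mean_scores(per_fold_counts):
    """Average each metric over the folds (one (TP, TN, FP, FN) tuple per fold)."""
    per_fold = [classification_scores(*counts) for counts in per_fold_counts]
    return {name: mean(s[name] for s in per_fold) for name in per_fold[0]}


# Hypothetical counts for a single 35-record test fold (not real results).
example = classification_scores(tp=12, tn=18, fp=2, fn=3)

# Fold sizes for the 349-record dataset with 10 folds.
fold_sizes = [len(fold) for fold in kfold_indices(349, 10)]
```

In practice the folds would normally be shuffled (and often stratified by class) before splitting; the contiguous split here is kept only to make the fold sizes easy to see.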
Next, PCA was applied to the processed dataset as a feature selection method. The 10 best PCA features were selected, and all five algorithms were then tested on the dataset using only these 10 features; the respective values of the evaluation metrics were obtained.
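The PCA step might look as follows, sketched with NumPy only. PCA builds new features as linear combinations of the original 49 rather than picking a subset of them, so "10 best PCA features" is read here as the 10 leading principal components. This reading is an assumption, as are the function names `pca_fit` and `pca_transform`, the synthetic data, and the choice to fit the projection on the training records only.

```python
import numpy as np


def pca_fit(X_train, n_components=10):
    """Learn the feature means and leading principal axes from training data."""
    mean = X_train.mean(axis=0)
    # SVD of the centered data; rows of vt are principal axes, sorted by variance.
    _, singular_values, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained_variance = singular_values**2 / (X_train.shape[0] - 1)
    return mean, vt[:n_components], explained_variance[:n_components]


def pca_transform(X, mean, components):
    """Project samples onto the retained principal axes."""
    return (X - mean) @ components.T


# Synthetic stand-in for the real data: 349 records with 49 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(349, 49))
X_train, X_test = X[:314], X[314:]

# Fit on the 314 training records, then apply the same projection to the test set.
mean, components, explained_variance = pca_fit(X_train, n_components=10)
Z_train = pca_transform(X_train, mean, components)
Z_test = pca_transform(X_test, mean, components)
```

Fitting the projection on the training records alone keeps information from the test records out of the learned components. Features on very different scales would usually be standardized first, which this sketch omits.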
Finally, t-SNE (TSNE) was applied to the processed dataset as a feature selection method. The 3 best t-SNE features were selected, and all five algorithms were then tested on the dataset using only these 3 features; the respective values of the evaluation metrics were obtained.
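A comparable sketch for the t-SNE step is given below. It assumes scikit-learn's `TSNE` class, which may not be the implementation used in this work, and it runs on synthetic data. Like PCA, t-SNE produces new coordinates rather than a subset of the original features, so the "3 best TSNE features" are taken here to be a 3-dimensional embedding.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for the real data (the actual set has 349 records, 49 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 49))

# scikit-learn's default Barnes-Hut solver supports at most 3 output dimensions.
# perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=3, perplexity=30.0, init="pca", random_state=0)

# The embedding is computed for all samples in a single call.
Z = tsne.fit_transform(X)
```

One design point is worth noting: scikit-learn's `TSNE` offers `fit_transform` but no separate `transform` for unseen samples, so the embedding has to be computed on all records at once, before any train/test or cross-validation split. The held-out records therefore influence the embedded features, which should be kept in mind when comparing these scores with the PCA and full-feature results.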