2022 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) | 978-1-6654-8584-5/22/$31.00 ©2022 IEEE | DOI: 10.1109/ECTI-CON54298.2022.9795429

Early risk prediction of cervical cancer: A machine learning approach
Ishrak Jahan Ratul, Abdullah Al-Monsur, Bushra Tabassum, Abrar Mohammad Ar-Rafi,
Mirza Muntasir Nishat and Fahim Faisal
Department of Electrical and Electronic Engineering
Islamic University of Technology
Dhaka, Bangladesh
Email: { ishrakjahan, al-monsur, bushratabassum, abrarmohammad, mirzamuntasir, faisaleee } @iut-dhaka.edu

Abstract— Cervical cancer is a major public health issue that affects women worldwide. As it is a fatal disease, early risk prediction of cervical cancer can play an important role in prevention by raising public awareness of this disease. Early prediction using a Machine Learning (ML) model can be a beneficial solution for both healthcare professionals and people at risk. In this study, eleven supervised ML algorithms are used to predict the early risk of this disease using a dataset from the UCI ML repository. The ML models are trained to predict the early risk, and performance parameters such as accuracy, precision, F1-score, recall, and ROC-AUC are estimated. Finally, an analysis of the results is performed, revealing that this study achieved 93.33% prediction accuracy with the Multi-Layer Perceptron (MLP) algorithm with default hyperparameters. However, after hyperparameter tuning with Grid Search Cross Validation (GSCV), K-Nearest Neighbors (KNN), Decision Tree Classifier (DTC), Support Vector Machine (SVM), Random Forest Classifier (RFC), and MLP all achieved an accuracy of 93.33%.

Keywords—Cervical Cancer, UCI repository, Analytical Analysis, ML Techniques

I. INTRODUCTION

Cervical cancer is a malignancy that affects a woman's cervix and is considered one of the most important health issues, affecting millions of women worldwide, particularly in developing countries. According to the WHO, over 89% of deaths from this disease occur in less developed nations [1]. In 2012, about 445,000 cases were discovered, and almost 83% of all were new cases [2]. Symptoms of cervical cancer include irregular periods, unexpected bleeding, and atypical menstruation. A Pap smear test can diagnose cervical cancer and has been shown to reduce death risk by almost 90% and cervical cancer risk by 60% to 90% [3]. However, the main drawbacks of this examination are the absence of medical equipment, inadequate training, limited diagnostic reproducibility, careless maintenance, and tedium on the part of the specialists performing the exam owing to its monotonous nature [4]. Moreover, Human Papilloma Virus (HPV) is a cancer-causing virus that can be spread by bad lifestyle choices. High-risk HPV infection manifests on the surface as condylomas that shed contagious virions [5]. Although Pap smear screening and HPV vaccination have reduced the number of cases, the death toll remains high [6]. According to statistics, half of all cervical cancer cases in America occur in women who were never screened, and another 10% in women who had not been tested in the preceding years [7].

In this regard, early identification of cervical cancer using lifestyle information can help save many lives [8-10]. Collecting sufficient data, analyzing them, and finding the hidden patterns can make a significant contribution [11-14]. With the advancement of data science, machine learning techniques have proven useful for such tasks, so that healthcare professionals can ensure prompt detection and timely treatment [15-19].

Many researchers have contributed to developing such automated, computer-aided diagnosis systems [20-23], which will eventually reduce patients' screening time [24-27] and ease the overall diagnosis process [28-31]. Sobar et al. employed classifiers to predict the risk based on behavior. They used two standard techniques and found a maximum accuracy of 91.67% [32]. Kashyap et al. proposed a method that classifies Pap smear images using the SVM algorithm and reported an accuracy of 95% [33]. However, Njoroge et al. employed a Pap smear test and a Fourier-Transform Infrared (FTIR) spectroscopy-based classifier to reach an overall accuracy of 72% [34]. Ijaz et al. proposed a model that used DBSCAN and isolation forest for outlier removal and a random forest (RF) classifier to categorize the data, obtaining a maximum accuracy of 99.5% [35]. Furthermore, Wu et al. used three SVM-based techniques to diagnose four targets and concluded that SVM-PCA performed better than the other models [36]. Using a convolutional neural network and multiple machine learning classifiers, Hyeon et al. developed and trained a model to classify the status of cervical cells from microscopic images. The best accuracy was found to be 89.7% [37]. In this study, eleven supervised machine learning models have been employed on a dataset: DTC, MLP, RFC, KNN, SVM, CatBoost (CatB), Gaussian Naïve Bayes (GNB), Gradient Boosting Classifier (GradB), AdaBoost (AdaB), XGBoost (XGB), and XGBoost with Random Forest (XGBRF). The findings could have a noteworthy impact on computer-assisted diagnosis or the development of an e-healthcare system.

II. METHODOLOGY

The dataset was obtained from the UCI ML repository and contains information about cervical cancer risk behavior [38]. The dataset consists of 72 instances and 19 attributes plus one target column (ca_cervix), as listed in Table I. All of the attributes are numerical, with no missing values.

At first, exploratory data analysis was performed on the dataset, and a correlation heatmap showing the correlation between attributes was created (Fig. 1). The dataset's statistical information is shown in Table I.

TABLE I. DIFFERENT ATTRIBUTES

No.  Numeric Attribute             Max-Min  Mean   Standard Deviation
1    behavior_sexualRisk           10-2     9.67   1.19
2    behavior_eating               15-3     12.79  2.36
3    behavior_personalHygine       15-3     11.08  3.03
4    intention_aggregation         10-2     7.90   2.74
5    attitude_consistency          15-6     13.35  2.37
6    intention_commitment          10-2     7.18   1.52
7    attitude_spontaneity          10-4     8.61   1.52
8    norm_significantPerson        5-1      3.13   1.85
9    norm_fulfillment              15-3     8.49   4.91
10   perception_vulnerability      15-3     8.51   4.28
11   perception_severity           10-2     5.39   3.40
12   motivation_strength           15-3     12.65  3.21
13   motivation_willingness        15-3     9.69   4.13
14   socialSupport_emotionality    15-3     8.09   4.24
15   SocialSupport_appreciation    10-2     6.16   2.90
16   socialSupport_instrumental    15-3     10.38  4.32
17   empowerment_knowledge         15-3     10.54  4.37
18   empowerment_abilities         15-3     9.32   4.18
19   empowerment_desires           15-3     10.28  4.48
20   ca_cervix                     1-0      0.29   0.46
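
As a small consistency check on Table I, the ca_cervix row can be reproduced from class counts alone. The sketch below assumes 21 positive labels among the 72 instances, which is inferred from the reported mean of 0.29 and is not stated in the text. It also compares the sample and population standard deviations, since the table does not say which one is reported; the variable names are illustrative.

```python
import statistics

# Assumed class counts: 21 positives out of 72 rows, inferred from the
# reported mean of 0.29 (21 / 72 = 0.2917). Not a figure given in the text.
ca_cervix = [1] * 21 + [0] * 51

mean = statistics.mean(ca_cervix)
std_sample = statistics.stdev(ca_cervix)       # divides by n - 1
std_population = statistics.pstdev(ca_cervix)  # divides by n

print(f"range: {max(ca_cervix)}-{min(ca_cervix)}")   # range: 1-0
print(f"mean: {mean:.2f}")                            # mean: 0.29
print(f"sample std: {std_sample:.2f}")                # sample std: 0.46
print(f"population std: {std_population:.2f}")        # population std: 0.45
```

Under this assumption, only the sample standard deviation rounds to the 0.46 shown in Table I, which suggests (but does not prove) that the table uses the n - 1 form.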

After exploratory data analysis, the dataset was divided in an 80:20 ratio for training and testing the ML algorithms. This study used two approaches: one used the default hyperparameters of the ML algorithms, while the other used Grid Search Cross-Validation with 10 folds to tune the hyperparameters. Eleven supervised machine learning algorithms (DTC, GNB, RFC, KNN, SVM, CatB, MLP, GradB, AdaB, XGB, and XGBRF) were trained using the training dataset. The performance metrics were then evaluated on the test dataset for both the default and the tuned hyperparameter approaches. Fig. 2 depicts the complete algorithm of the work.

Fig. 2 Algorithm of the work

III. RESULTS & DISCUSSION

In the first approach, the ML classifiers are trained and tested using the algorithms' default hyperparameters, and the performance metrics are presented in Table III. The confusion matrices are presented in Table II and the ROC curves in Fig. 3. MLP beats all other algorithms in most performance metrics, as shown in Table III: it achieves the best accuracy (0.9333), precision (0.9091), F1-score (0.9524), and recall (1.000) of any model. RFC, SVM, and XGBRF match the maximum precision, while GNB, CatB, and GradB match the maximum recall. Furthermore, the maximum ROC-AUC of 0.9545 was obtained using the GradB algorithm.
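
The accuracy, precision, recall, and F1-score quoted above follow directly from the confusion-matrix counts in Table II. The following minimal sketch recomputes them for MLP and GradB without tuning; the function name `metrics_from_counts` is illustrative and not taken from the paper. ROC-AUC is left out because it requires predicted scores, not just the four counts.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators so degenerate inputs return 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Counts copied from Table II, "Without Tuning" columns.
mlp = metrics_from_counts(tp=10, tn=4, fp=1, fn=0)
gradb = metrics_from_counts(tp=8, tn=4, fp=3, fn=0)

for name, m in (("MLP", mlp), ("GradB", gradb)):
    print(name, {key: round(value, 4) for key, value in m.items()})
# MLP {'accuracy': 0.9333, 'precision': 0.9091, 'recall': 1.0, 'f1': 0.9524}
# GradB {'accuracy': 0.8, 'precision': 0.7273, 'recall': 1.0, 'f1': 0.8421}
```

Each row of Table II sums to 15 test samples, so a single misclassification shifts accuracy by about 6.7 percentage points; this is worth keeping in mind when comparing models that differ by one or two predictions.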
Fig. 1 Correlation heatmap of attributes

In the second approach, the performance was evaluated after hyperparameter tuning with the 10-fold GSCV method; the metrics are shown in Table IV and the ROC curves in Fig. 4. DTC, RFC, KNN, and SVM achieved the best overall results, with an accuracy of 0.9333 and a precision of 0.9091 (Table IV); MLP matched these two values. In addition, XGBRF matches the maximum precision, while GNB and GradB match the maximum recall. The outcomes of the experiments are fairly satisfactory. With the default hyperparameters, only one algorithm (MLP) reached the best accuracy. However, when hyperparameter tuning was performed using the GSCV technique, several algorithms (DTC, RFC, KNN, SVM, and MLP) reached it. Furthermore, the overall performance of the ML algorithms increased in the second approach.
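
The idea behind 10-fold grid search cross-validation can be sketched in a few lines. The example below is a from-scratch illustration using only the Python standard library and a tiny k-nearest-neighbours classifier; it does not reproduce the paper's implementation, whose library and search grids are not given. The data are synthetic (two Gaussian clusters, 57 rows to mirror the 72 instances minus the 15 test samples implied by Table II), and every function name and the candidate grid `[1, 3, 5, 7]` are assumptions made for illustration.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Return the majority label among the k training points nearest to x."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cv_accuracy(X, y, k, n_folds=10, seed=0):
    """Mean held-out accuracy of k-NN across the cross-validation folds."""
    scores = []
    for fold in k_fold_indices(len(X), n_folds, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / len(scores)


def grid_search(X, y, grid, n_folds=10, seed=0):
    """Score every candidate k; return (best_k, best_score, all_scores)."""
    scores = {k: cv_accuracy(X, y, k, n_folds, seed) for k in grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k], scores


# Synthetic stand-in for the training split; NOT the UCI dataset.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(27)]
y = [0] * 30 + [1] * 27

best_k, best_score, all_scores = grid_search(X, y, grid=[1, 3, 5, 7])
print("best k:", best_k, "mean CV accuracy:", round(best_score, 3))
```

Every candidate is scored on the same ten folds, and only the training split takes part in the search, so the held-out test samples remain untouched until the final evaluation.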

This can be considered a noteworthy finding of the study, and it may help to achieve a more accurate, efficient, and trustworthy cervical cancer risk prediction model. Thus, this study presents an analytical methodology to demonstrate the performance of diverse machine learning classifiers in risk prediction so that healthcare professionals can have a prior idea and make a prompt diagnosis. Hence, a smart and intelligent support system can be developed and an effective healthcare management system can be launched so that people from all walks of life can receive proper treatment of cancer.

Fig. 3 ROC (without hyperparameter tuning)

Fig. 3 Grid Search algorithm

Fig. 4 ROC (with hyperparameter tuning)

TABLE II. CONFUSION MATRICES OF ML ALGORITHMS

            Without Tuning       With Tuning
Algorithm   TP  TN  FP  FN       TP  TN  FP  FN
DTC          9   3   2   1       10   4   1   0
GNB          9   4   2   0        9   4   2   0
RFC         10   3   1   1       10   4   1   0
KNN          9   3   2   1       10   4   1   0
SVM         10   3   1   1       10   4   1   0
CatB         9   4   2   0        9   3   2   1
MLP         10   4   1   0       10   4   1   0
GradB        8   4   3   0        8   4   3   0
AdaB         9   3   2   1        9   3   2   1
XGB          9   3   2   1        9   3   2   1
XGBRF       10   3   1   1       10   3   1   1

TABLE III. ANALYSIS OF ML ALGORITHMS (WITHOUT TUNING)

Algorithm   Accuracy  Precision  F1-score  Recall  ROC-AUC
DTC         0.800     0.818      0.857     0.900   0.784
GNB         0.867     0.818      0.900     1.000   0.909
RFC         0.867     0.909      0.909     0.909   0.829
KNN         0.800     0.818      0.857     0.900   0.784
SVM         0.867     0.909      0.909     0.909   0.829
CatB        0.867     0.818      0.900     1.000   0.909
MLP         0.933     0.909      0.952     1.000   0.864
GradB       0.800     0.727      0.842     1.000   0.954
AdaB        0.800     0.818      0.857     0.900   0.784
XGB         0.800     0.818      0.857     0.900   0.784
XGBRF       0.867     0.909      0.909     0.909   0.829

TABLE IV. ANALYSIS OF ML ALGORITHMS (WITH TUNING)

Algorithm   Accuracy  Precision  F1-score  Recall  ROC-AUC
DTC         0.9333    0.9091     0.952     1.0     0.9545
GNB         0.8667    0.8182     0.900     1.0     0.9091
RFC         0.9333    0.9091     0.952     1.0     0.9545
KNN         0.9333    0.9091     0.952     1.0     0.9545
SVM         0.9333    0.9091     0.952     1.0     0.9545
CatB        0.8000    0.8182     0.857     0.9     0.7841
MLP         0.9333    0.9091     0.952     1.0     0.8636
GradB       0.8000    0.7273     0.842     1.0     0.9545
AdaB        0.8000    0.8182     0.857     0.9     0.7841
XGB         0.8000    0.8182     0.857     0.9     0.7841
XGBRF       0.8667    0.9091     0.909     0.9     0.8295

IV. CONCLUSION

Cervical cancer is one of the most perilous risks for females worldwide. Early detection or risk prediction can help reduce the number of deaths caused by this disease. Data are collected and evaluated to build a reliable prediction model using ML algorithms. This research compares the performance of eleven supervised ML algorithms in predicting the risk of cervical cancer. To improve the performance of the classifiers, hyperparameter tuning was performed using GSCV, and the maximum accuracy found in this study is 93.33%, achieved by the DTC, RFC, KNN, SVM, and MLP algorithms. The foremost finding of this work is the better prediction accuracy and consistency, which may aid in the development and implementation of computer-aided diagnosis and serve as an efficient tool for healthcare practitioners. However, this should be thoroughly tested before being used in a clinical setting. Many future developments in this area can be accomplished by gathering more data, which can contribute significantly to the establishment of an e-healthcare system.


REFERENCES
[1] “Cervical cancer.” https://www.who.int/news-room/fact-sheets/detail/cervical-cancer (accessed Feb. 19, 2022).
[2] WHO, “Comprehensive Cervical Cancer Control,” Geneva, pp. 366–378, 2014.
[3] M. Safaeian et al., “Cervical Cancer Prevention-Cervical Screening: Science in Evolution,” Obstet. Gynecol. Clin. North Am., 34(4), pp. 739–760, 2007, doi: 10.1016/j.ogc.2007.09.004.
[4] N. Colombo et al., “Cervical cancer: ESMO clinical practice guidelines for diagnosis, treatment and follow-up,” Ann. Oncol., vol. 23, no. SUPPL. 7, 2012, doi: 10.1093/annonc/mds268.
[5] J. Doorbar, “Molecular biology of human papillomavirus infection and cervical cancer,” Clin. Sci., vol. 110, no. 5, pp. 525–541, 2006.
[6] I. C. Scarinci et al., “Cervical cancer prevention: New tools and old barriers,” Cancer, vol. 116, no. 11, pp. 2531–2542, 2010.
[7] D. Saslow et al., “American Cancer Society, American Society for Colposcopy and Cervical Pathology, and American Society for Clinical Pathology screening guidelines for the prevention and early detection of cervical cancer,” CA. Cancer J. Clin., vol. 62, no. 3, pp. 147–172, 2012, doi: 10.3322/caac.21139.
[8] J. Lu et al., “Machine learning for assisting cervical cancer diagnosis: An ensemble approach,” Futur. Gener. Comput. Syst., vol. 106, pp. 199–205, 2020, doi: 10.1016/j.future.2019.12.033.
[9] B. Nithya and V. Ilango, “Evaluation of machine learning based optimized feature selection approaches and classification methods for cervical cancer prediction,” SN Appl. Sci., vol. 1, no. 6, 2019.
[10] R. Weegar and K. Sundström, “Using machine learning for predicting cervical cancer from Swedish electronic health records by mining hierarchical representations,” PLoS One, 15(8), pp. 1–19, 2020.
[11] F. Faisal and M. M. Nishat, “An Investigation for Enhancing Registration Performance with Brain Atlas by Novel Image Inpainting Technique using Dice and Jaccard Score on Multiple Sclerosis (MS) Tissue,” Biomed. and Pharm. J., vol. 12, no. 3, pp. 1249-1262, 2019.
[12] M. M. Nishat, F. Faisal, T. Hasan, M. F. B. Karim, Z. Islam and M. R. Kaysar, “An Investigative Approach to Employ Support Vector Classifier as a Potential Detector of Brain Cancer from MRI Dataset,” 2021 Int. Conf. on Electronics, Comm. and Inf. Tech. (ICECIT), 2021, pp. 1-4, doi: 10.1109/ICECIT54077.2021.9641168.
[13] M. M. Nishat et al., “Performance Investigation of Different Boosting Algorithms in Predicting Chronic Kidney Disease,” 2020 2nd Int. Conf. on STI 4.0, pp. 1-5, 2020, doi: 10.1109/STI50764.2020.9350440.
[14] M. A. A. R. Asif et al., “Performance Evaluation and Comparative Analysis of Different Machine Learning Algorithms in Predicting Cardiovascular Disease,” Engineering Letters, vol. 29, no. 2, pp. 731-741, 2021.
[15] M. M. Nishat et al., “A Comprehensive Analysis on Detecting Chronic Kidney Disease by Employing Machine Learning Algorithms,” EAI Endorsed Transactions on Pervasive Health and Technology, vol. 18, no. e6, 2021, doi: 10.4108/eai.13-8-2021.170671.
[16] M. R. Farazi, F. Faisal, Z. Zaman and S. Farhan, “Inpainting multiple sclerosis lesions for improving registration performance with brain atlas,” 2016 Int. Conf. on Medical Engg., Health Info. and Tech. (MediTec), 2016, pp. 1-6, doi: 10.1109/MEDITEC.2016.7835363.
[17] M. A. A. R. Asif et al., “Computer Aided Diagnosis of Thyroid Disease Using Machine Learning Algorithms,” 2020 11th Int. Conf. on Electrical and Computer Engineering (ICECE), 2020, pp. 222-225, doi: 10.1109/ICECE51571.2020.9393054.
[18] M. M. Nishat and F. Faisal, “An Investigation of Spectroscopic Characterization on Biological Tissue,” 2018 4th Int. Conf. on Electrical Engg. and Information & Communication Tech. (iCEEiCT), 2018, pp. 290-295, doi: 10.1109/CEEICT.2018.8628081.
[19] F. Faisal, M. M. Nishat, M. A. Mahbub, M. M. I. Shawon, and M. M. U. H. Alvi, “Covid-19 and its impact on school closures: a predictive analysis using machine learning algorithms,” 2021 Int. Conf. on Science & Contemporary Technologies (ICSCT), 2021, pp. 1-6, doi: 10.1109/ICSCT53883.2021.9642617.
[20] A. A. Rahman et al., “Detection of Epileptic Seizure from EEG Signal Data by Employing Machine Learning Algorithms with Hyperparameter Optimization,” 2021 4th Int. Conf. on Bio-Engineering for Smart Technologies (BioSMART), 2021, pp. 1-4, doi: 10.1109/BioSMART54244.2021.9677770.
[21] A. A. Rahman, L. I. Khalid, M. I. Siraji, M. M. Nishat, F. Faisal and A. Ahmed, “Enhancing the Performance of Machine Learning Classifiers by Hyperparameter Optimization in Detecting Anxiety Levels of Online Gamers,” 2021 24th Int. Conf. on Computer and Information Technology (ICCIT), 2021, pp. 1-5, doi: 10.1109/ICCIT54785.2021.9689911.
[22] T. Hasan et al., “Exploring the Performances of Stacking Classifier in Predicting Patients Having Stroke,” 2021 8th NAFOSTED Conference on Information and Computer Science (NICS), pp. 242-247, 2021, doi: 10.1109/NICS54270.2021.9701526.
[23] M. M. Nishat, T. Hasan, S. M. Nasrullah, F. Faisal, M. A. A. R. Asif, and M. A. Hoque, “Detection of Parkinson's Disease by Employing Boosting Algorithms,” 2021 Joint 10th Int. Conf. Informatics, Electronics & Vision (ICIEV) and 2021 5th Int. Conf. on Imaging, Vision & Pattern Recognition (icIVPR), pp. 1-7, 2021, doi: 10.1109/ICIEVicIVPR52578.2021.9564108.
[24] Nishat, M. M., et al., “Detection of Autism Spectrum Disorder by Discriminant Analysis Algorithm,” BIM, pp. 473-482, Springer, Singapore, 2022, doi: 10.1007/978-981-16-6636-0_36.
[25] A. A. Rahman, M. I. Siraji, L. I. Khalid, F. Faisal, M. M. Nishat and M. R. Islam, “Detection of Mental State from EEG Signal Data: An Investigation with Machine Learning Classifiers,” 2022 14th International Conference on Knowledge and Smart Technology (KST), pp. 152-156, 2022, doi: 10.1109/KST53302.2022.9729084.
[26] M. R. Rahman, S. Tabassum, E. Haque, M. M. Nishat, F. Faisal and E. Hossain, “CNN-based Deep Learning Approach for Micro-crack Detection of Solar Panels,” 2021 3rd Int. Conf. on STI 4.0, 2021, pp. 1-6, doi: 10.1109/STI53101.2021.9732592.
[27] Nishat, M. M., et al., “A Comprehensive Investigation of the Performances of Different Machine Learning Classifiers with SMOTE-ENN Oversampling Technique and Hyperparameter Optimization for Imbalanced Heart Failure Dataset,” Scientific Programming, https://doi.org/10.1155/2022/3649406
[28] Nishat, M. M, Faisal, F., Mahbub, M. A., Mahbub, M. H., et al., “Performance Assessment of Different Machine Learning Algorithms in Predicting Diabetes Mellitus,” Biosc.Biotech.Res.Comm., 14(1), 2021, doi: http://dx.doi.org/10.21786/bbrc/14.1/10.
[29] Islam, Mahdi, Musarrat Tabassum, Mirza Muntasir Nishat, Fahim Faisal and Muhammad Sayem Hasan, “Real-Time Clinical Gait Analysis and Foot Anomalies Detection Using Pressure Sensor and Convolutional Neural Network,” 2022 7th International Conference on Business and Industrial Research (ICBIR), IEEE, accepted, in press.
[30] Rahman, A. A., Siraji, M. I., Khalid, L. I., Faisal, F., Nishat, M. M., Ahmed, A., Mamun, M. A. A., “Perceived Stress Analysis of Undergraduate Students During COVID-19: A Machine Learning Approach,” 2022 IEEE 21st Mediterranean Electrotechnical Conference (MELECON), Palermo, Italy, accepted, in press.
[31] F. Faisal, M. M. Nishat and M. A. M. Oninda, “Spectroscopic Characterization of Biological Tissue using Quantitative Acoustics Technique,” 2018 4th Int. Conf. on Electrical Engineering and Information & Communication Technology (iCEEiCT), pp. 38-43, 2018, doi: 10.1109/CEEICT.2018.8628146.
[32] Sobar, R. Machmud, and A. Wijaya, “Behavior Determinant Based Cervical Cancer Early Detection with Machine Learning Algorithm,” Adv. Sci. Lett., 22(10), pp. 3120–3123, 2016.
[33] K. R. Debashree Kashyap, et al., “Cervical Cancer Detection And Classification Using Independent Level Sets And Multi SVMs,” Biomed. Pharmacol. J., 9(2), pp. 663–671, 2016.
[34] E. Njoroge et al., “Classification of cervical cancer cells using FTIR data,” Annu. Int. Conf. IEEE Eng. Med. Biol. - Proc., pp. 5338–5341, 2006, doi: 10.1109/IEMBS.2006.260024.
[35] Y. S. Muhammad Fazal Ijaz, Muhammad Attique, “Data-Driven Cervical Cancer Prediction Model with Outlier Detection and Over-Sampling Methods,” Sensors, vol. 20, no. 10, pp. 1424–8220, 2020.
[36] W. Wu and H. Zhou, “Data-driven diagnosis of cervical cancer with support vector machine-based approaches,” IEEE Access, vol. 5, pp. 25189–25195, 2017, doi: 10.1109/ACCESS.2017.2763984.
[37] J. Hyeon, H. J. Choi, K. N. Lee, and B. D. Lee, “Automating papanicolaou test using deep convolutional activation feature,” Proc. - 18th Int. Conf. Mob. Data Manag. MDM 2017, pp. 382–385, 2017.
[38] “UCI Machine Learning Repository: Cervical Cancer Behavior Risk Data Set.” https://archive.ics.uci.edu/ml/datasets/Cervical+Cancer+Behavior+Risk
