
IEEE Conference ID: 59847, 1st – 3rd Nov. 2023
Proceedings of the 2023 3rd International Conference on Technological Advancements in Computational Sciences (ICTACS)

Lung Cancer Detecting using Radiomics Features and Machine Learning Algorithm

979-8-3503-4233-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICTACS59847.2023.10389900

M.N. Rajesh1, Nikita Tanni2, Z. Tanveer Baig3
1 Ph.D., Electronics and Communication Engineering, Jain (Deemed-to-be University), Bengaluru, India
2 B.Tech in Computer Science and Engineering (Data Science), Christ (Deemed to be University), Bengaluru, India
3 Dept. of IT, Amity University, Tashkent, Uzbekistan
E-mail: 1rajeshmn.mn@gmail.com, 2nikita.tanni@btech.christuniversity.in, 3tbzaheer@amity.uz

Abstract—Lung cancer incidence across the globe is the second leading cancer type, tallying about 2,206,771 new cases during 2020 and estimated to rise to about 3,503,378 by 2040 for both sexes and all ages, accounting for 11.4% of cases as per Globocan 2020 [1]. It is the leading death-causing cancer. Lung cancer [2] in broad terms encompasses the trachea, bronchus, and lungs. Purpose: The study aims to understand a Radiomics-based approach to the identification and classification of CT images with lung cancer when machine learning (ML) algorithms are applied. Method: CT images from the LIDC-IDRI [4] dataset were chosen. The CT image dataset was balanced, and image features were collected with the PyRadiomics library. Various ML classification algorithms were used to create models, and metrics were adopted to judge their accuracies. The models' discriminative capacity was assessed by receiver operating characteristic (ROC) analysis. Result: The accuracy scores and ROC-AUC values obtained for the various classification models are as follows: for Ada Boosting, the accuracy score was 0.9993 and ROC-AUC was 0.9993; followed by GBM, with an accuracy score of 0.9993 and ROC-AUC of 0.9992. Conclusion: Extracting texture parameters from CT images and linking the Radiomics method with ML can categorize lung cancer commendably.

Keywords: Lung Cancer, Radiomics, Feature Extraction, Selection, Image Classification, Machine Learning

I. Introduction

Uncontrolled cell division in the lung [2] results in nodules and tumors and impairs normal lung function. This condition is called lung cancer. Small-cell lung cancer (SCLC) is the variant that spreads uncontrollably and is difficult to treat; the other, more prevalent variant, accounting for about 80% of cases, is non-small-cell lung cancer (NSCLC).

Lung cancer incidence, or the number of new cases across the globe, is the second highest among cancer types, tallying about 2,206,771 during 2020 and estimated to rise to about 3,503,378 by 2040 for both sexes and all ages, accounting for 11.4% of cases as per Globocan 2020 [1]. It is the leading death-causing cancer type, with about 1,796,144 deaths during 2020, estimated to rise to about 2,942,519 by 2040 for both sexes and all ages, accounting for 18% of all cancer deaths. Lung cancer in broad terms encompasses the trachea, bronchus, and lungs. Lung cancer incidence within Asia during 2020 was about 1,300,018 and is estimated to reach 2,133,230 by 2040 for both sexes and all ages; lung cancer mortality within Asia during 2020 was about 1,100,776 and is estimated to reach 1,902,323 by 2040. Lung cancer incidence within India during 2020 was about 72,510 and is estimated to reach 120,374 by 2040; lung cancer mortality within India during 2020 was about 66,279 and is estimated to reach 110,076 by 2040, for both sexes and all ages.

Lung cancer screening [3] is important in the diagnosis and management of lung nodules and cancer. Prevalent clinical imaging modalities include X-ray, computerized tomography (CT), single-photon emission computerized tomography (SPECT), magnetic resonance imaging (MRI), positron emission tomography (PET), and ultrasound (US). For clinical analysis, a combination of the above modalities leads to multimodal analysis. Notably, CT imaging is the most commonly adopted method for the lung [4], as it aids in diagnosing and managing lung cancer earlier and more effectively. Advantages of CT images include that they are accurate, non-invasive, easy to acquire, and painless; CT imaging can capture bone, delicate tissue, and vessels simultaneously.

Fig. 1: Lung Cancer Screening

For the detection and classification of clinical images, the images are first preprocessed, and the region of interest (ROI) is identified using segmentation. In the subsequent steps, Radiomics-based features are extracted.

Copyright © IEEE–2023 ISBN: 979-8-3503-4233-8 1479


Important features are then selected, and image classification is accomplished, as in Figure 2 below.

Fig. 2: The Workflow of a Basic Computer Aided Diagnosis System

A. Dataset
Identifying a lung cancer image dataset is a crucial initial step in the study. There are various imaging modalities used for lung cancer detection, but the most preferred is CT imaging, as it offers high sensitivity at low cost. For the study, LIDC-IDRI [5] has been chosen; it contains CT along with digital radiography (DX) and computed radiography (CR) scans of about 1010 participants, with a total of 244,527 images [6]. The CT images are considered for the study.

II. Literature Review

A. Image Preprocessing
Image preprocessing is adopted to increase the initial medical image's readability, quality, and usability. With CT images, preprocessing is adopted to eliminate noise and other inapt information from the raw images and to enhance quality, helping to detect the required information.

B. Image Segmentation
Researchers since the 1980s have attempted to develop and propose novel methods to detect, segment [7, 8, 9], and diagnose lung tumor growth using computed tomography scans. Due to their location, size, and internal structure, pulmonary tumors and masses are tough to detect and segment. Various AI-based techniques are adopted for the segmentation of tumors; prominent techniques include Watershed [10], U-Net [11], MV-CNN [12], and Faster R-CNN [13].

C. Feature Extraction and Selection
Extraction of image texture characteristics or features, once segmentation is completed and the region of interest is marked, is accomplished through the PyRadiomics [14] library, an open-source Python package. The package is well acknowledged for extracting characteristics that are unique and not directly visible in radiology scans. Radiomics [15, 16] centers on mining quantitative characteristics (Radiomics features) from radiology scans that could not be observed by radiologists in normal circumstances. Adopting refined mathematical methods, Radiomics features such as 2D or 3D shape information, image intensity values, and wavelet and texture information can be extracted. The raw feature values can be further normalized and cleansed for further processing. Based on the information obtained, this technique can be applied in personalized therapy and in medical decision-support systems. Identifying important features and isolating the most dependable, non-repetitive, and relevant feature set for model creation is known as feature selection. It is predominantly adopted to improve predictive model performance while dropping modeling overheads, which also improves classification accuracy. For feature reduction, the two most frequently adopted methods are correlation coefficients and the least absolute shrinkage and selection operator (LASSO) [17].

D. Image Classification
In [15], for lung cancer classification the authors adopted ML models such as the support vector machine (SVM) classifier, decision trees (DT), linear regression (LR), ridge classifier, kNN, random forest (RF), gradient-boosting machine classifier (GBM), light gradient-boosting machine (LGBM), and eXtreme gradient boosting (XGB) classifier. The discriminating capacity of each of these models is analysed with receiver operating characteristics (ROC); the area under the curve (AUC) is adopted to indicate the classification performance of the CT Radiomics-based model.

III. Proposed Methodology
In the current lung cancer research work, the CT image dataset [6] is considered for the study. The proposed Radiomics-based model includes five stages [16]: the first stage is CT image augmentation; the second, image segmentation; the third, feature extraction using PyRadiomics; the fourth, feature selection and/or reduction; and finally the fifth, in which the features are analyzed using ML classification models to predict the results, as in Figure 3 below.

Fig. 3: Architecture of Radiomics-Based ML Model

A. Image Augmentation
In the first step, the numbers of normal and tumor images are balanced using image augmentation. About 1197 CT scans are considered for the study out of the LIDC-IDRI image dataset [5], of which about 451 CT images were from normal patients and about 746 CT images were from patients who had cancer. The ImageDataGenerator class from Keras is adopted to generate a variety of augmented images.
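The paper balances the two classes with Keras' ImageDataGenerator; as a dependency-free illustration of the same idea (the transform ranges below are placeholders, not the paper's settings), random flip-and-shift augmentation of CT slices can be sketched in NumPy:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def augment(img):
    """Return a randomly flipped/shifted copy of a 2-D image array.

    A minimal stand-in for ImageDataGenerator options such as
    horizontal/vertical flip and width/height shift.
    """
    out = img.copy()
    if rng.random() < 0.5:          # horizontal flip
        out = np.fliplr(out)
    if rng.random() < 0.5:          # vertical flip
        out = np.flipud(out)
    # shift by up to ~10% of each dimension (np.roll wraps pixels around;
    # a real pipeline would pad the vacated region instead)
    dy = rng.integers(-(out.shape[0] // 10), out.shape[0] // 10 + 1)
    dx = rng.integers(-(out.shape[1] // 10), out.shape[1] // 10 + 1)
    return np.roll(out, (dy, dx), axis=(0, 1))

# Balance the classes by generating extra samples from the minority class
# (451 normal vs. 746 cancer images in the paper's raw dataset).
normal = [rng.random((64, 64)) for _ in range(451)]   # stand-in images
while len(normal) < 746:
    normal.append(augment(normal[rng.integers(len(normal))]))
print(len(normal))  # 746
```

In a real pipeline the loop would iterate until both classes reach the augmented counts reported below (2631 normal, 2868 cancer).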


Within the implementation, numerous augmentation options are utilized, including rotation, image skew with width and height shifts, shearing, zooming, and horizontal and vertical flipping. After image augmentation, the number of CT images of normal patients was 2631 and the number of CT images of patients with cancer was 2868, resulting in a total of 5499 CT images.

B. ROI Segmentation
In the second step, for each of the original images, the region of interest is developed by segmenting the lesion or tumor, which creates a mask image. These mask images are created by taking the input image and converting it into a binary image using threshold values. The Otsu threshold technique [18] is adopted for segmentation by the authors [19], and the same technique is adopted here to create the masks.

C. Feature Extraction
During the third stage, the original image and its mask are used to extract the Radiomics features through the PyRadiomics package [14]. A total of 124 Radiomics texture characteristics were pulled out by running the Python PyRadiomics library, which provides 8 feature classes: first order, shape-based (2D), shape-based (3D), gray-level run-length matrix, gray-level co-occurrence matrix, neighboring gray-tone difference matrix (NGTDM), gray-level size-zone matrix (GLSZM), and gray-level dependence matrix (GLDM). From the CT images considered, a Radiomics features dataset was created.

D. Feature Selection Analysis and Classification
During the final stage, feature selection is applied. Feature choice through reduction is a form of preprocessing which recognizes the vital characteristics of a given problem. In radiological usage, vital feature recognition is established to be effective, as it reduces the number of dimensions and hence delivers details regarding the aspects that contribute to the growth of a disease. Feature selection is accomplished with mutual information and analysis of variance. Mutual information is a measure that evaluates how significantly two random variables depend on each other. Analysis of variance is an approach that is easy to deploy and effective for judging whether there are variations in means between groups. The Radiomics feature dataset of 5499 CT images was divided 50% for training and 50% for testing, resulting in a training image dataset of 2749 images and a testing image dataset of 2750 images.

Once the features are selected, classification algorithms are used to develop models that can predict the results.

1) Ada Boosting
AdaBoost (Adaptive Boosting) [20] is a well-established and powerful ensemble technique that targets enhancing the performance of frail learners by combining them into a powerful and accurate model. AdaBoost's core principle is to iteratively train numerous frail learners, such as DTs or other simple models, by giving weights to each individual piece of input. A weak learner is trained using the initial weighted data, in which all data points are weighted equally. The incorrectly categorized data points are given greater weights after each iteration, while the appropriately classified points are given lower weights. Subsequent weak learners are then trained on this updated weighted data, focusing more on the previously misclassified instances.

2) Gradient Boosting
Gradient Boosting (GB) [21, 22] is a technique used to enhance regression as well as classification models, with the goal of optimizing the model's learning process. This method is particularly well suited to creating non-linear models, such as regression or decision trees. In the case of regression decision trees, the process gradually and sequentially adds new learners into the model mix. Gradient boosting belongs to the ensemble learning family, which focuses on combining multiple weak learners to create a strong and accurate model. One of its key advantages is its ability to handle complex, non-linear relationships within the data, making it a versatile and effective machine-learning approach.

3) LGBM
The Light Gradient-Boosting Machine (LGBM) [23] is an enhanced gradient-boosting machine based on decision trees (DTs), created with the goal of dramatically increasing model efficacy while using the least possible amount of memory. Because of this, LGBM is a very effective and potent algorithm for a variety of machine-learning applications. LGBM uses two important strategies, Exclusive Feature Bundling (EFB) and Gradient-based One-Side Sampling, to achieve its outstanding performance. EFB enables the algorithm to bundle mutually exclusive features, which lowers memory usage and speeds up processing. This method reduces memory consumption without sacrificing prediction accuracy, making it particularly useful when working with a large number of sparse features. Due to its exceptional performance and memory effectiveness, LGBM is becoming more and more popular in a variety of fields, including image categorization, web search ranking, and numerous other machine-learning applications.

4) XGB
Extreme Gradient Boosting, also known as XGBoost [24], is a highly regarded ensemble method that has transformed sequential decision trees and other tree-based machine learning techniques. As a member of the boosting family, XGBoost excels at enhancing productivity.
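The reweighting loop described under "1) Ada Boosting" can be sketched from scratch as follows. This is a minimal illustration with decision stumps as the weak learners and a toy 1-D dataset, not the implementation used in the paper:

```python
import numpy as np

def stump_predict(X, feat, thresh, polarity):
    # Weak learner: +1 where polarity * (x - thresh) > 0, else -1.
    return np.where(polarity * (X[:, feat] - thresh) > 0, 1, -1)

def adaboost_fit(X, y, n_rounds=5):
    """y must be in {-1, +1}. Returns a list of (alpha, stump_params)."""
    X, y = np.asarray(X, float), np.asarray(y)
    w = np.full(len(y), 1.0 / len(y))          # start with equal weights
    ensemble = []
    for _ in range(n_rounds):
        best, best_err = None, np.inf
        for feat in range(X.shape[1]):         # exhaustive stump search
            for thresh in np.unique(X[:, feat]):
                for polarity in (1, -1):
                    pred = stump_predict(X, feat, thresh, polarity)
                    err = w[pred != y].sum()   # weighted error
                    if err < best_err:
                        best_err, best = err, (feat, thresh, polarity)
        eps = np.clip(best_err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)  # learner's vote weight
        pred = stump_predict(X, *best)
        w *= np.exp(-alpha * y * pred)         # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, best))
    return ensemble

def adaboost_predict(ensemble, X):
    X = np.asarray(X, float)
    score = sum(a * stump_predict(X, *p) for a, p in ensemble)
    return np.where(score >= 0, 1, -1)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_fit(X, y)
print(adaboost_predict(model, X))  # [-1 -1  1  1]
```

Library implementations such as scikit-learn's AdaBoostClassifier follow the same principle with more refined weak learners and weight updates.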


XGBoost is effective across a range of predictive activities. By using the boosting technique, XGBoost is able to overcome the weaknesses of individual models and exploit their combined strengths to provide a highly effective and accurate predictive framework. Due to its ability to handle complicated data relationships, lessen overfitting, and provide faster and more effective training than conventional tree-based algorithms, XGBoost has emerged as a favored option in many real-world applications. In ensemble learning, several predictors are merged to produce a reliable and accurate model; XGBoost belongs to the boosting subsection of the ensemble technique category. It works by successively training a group of weak learners, typically decision trees, where each succeeding model fixes the mistakes of the one before it.

E. Performance Metrics and Analysis
In the study, the proposed models' performance was analyzed using various metrics derived from true positives (TP) and true negatives (TN). It is to be noted that the proper identification of lung cancer depends directly on the true positives (TP) and true negatives (TN), while false positives (FP) and false negatives (FN) indicate incorrect identification of lung cancer.

• Accuracy: The accuracy of the model is a measure of the fraction of correctly classified instances among all instances in the test set, as in equation (1) below:

Accuracy = (TP + TN) / (TP + TN + FP + FN)   (1)

• Confusion Matrix: The confusion matrix summarizes the performance of a model by providing a table that shows the true positive, true negative, false positive, and false negative predictions. Results were generated using a classification report with Precision, a metric which indicates the quality of positive predictions, as in equation (2), and Sensitivity or Recall, which indicates how accurately the model recognizes true positives, as in equation (3). Ideally, model outcomes are accurate when precision and recall both equal one. The F1 score is the harmonic mean of precision and recall, and hence a better metric than accuracy; it is given in equation (5). Specificity, as in equation (4), is provided for reference:

Precision = TP / (TP + FP)   (2)
Recall = TP / (TP + FN)   (3)
Specificity = TN / (TN + FP)   (4)
F1 = 2 × (Precision × Recall) / (Precision + Recall)   (5)

• ROC AUC: Classification model performance can be evaluated from the plot of the receiver operating characteristic (ROC) curve. The area under the curve (AUC) is a measure of the capacity of a classifying model to discriminate between classes and is adopted as a summary of the ROC curve. It measures the area under the ROC curve, a performance measure that accounts for both the true positive rate (sensitivity) and the false positive rate. AUC values lie between '0' and '1': if '1', the model distinguishes positives and negatives clearly; if '0', the model labels positives as negatives and negatives as positives.

IV. Experimental Results and Discussion

Experiments were conducted using the Ada Boosting, Gradient Boosting, Light GBM (LGBM), and Extreme Gradient Boosting (XGBoost) machine learning algorithms.

A. Ada Boosting
We considered Ada Boosting [20] to start with. To intensify the performance of the Ada Boosting model, we determined the ideal set of parameters that produced the most precise results by methodically examining a range of hyperparameter values and tuning the hyperparameters. The image dataset was split into training with 75% (4043 images) and testing with 25% (1348 images). To assess the model's performance, details were captured through the confusion matrix and the classification report, as in Table I below.

Confusion Matrix of the Ada Boosting Model:
[[1283    0]
 [   1 1466]]

Table I: Classification Report of Ada Boosting Model

              Precision  Recall  f1-score  Support
0                   1.0     1.0       1.0      629
1                   1.0     1.0       1.0      719
accuracy              -       -       1.0     1348
macro avg           1.0     1.0       1.0     1348
weighted avg        1.0     1.0       1.0     1348

Using methods like grid search, we assessed the model's performance with various parameter settings as part of the hyperparameter tuning process, resulting in better predictive abilities. The performance metrics of the ML model improved when we tuned the n_estimators and learning rate beyond the default parameter settings. After several iterations, the performance reported by the models is as in Table II below.

Table II: Classification Report of Ada Boosting Models

Model        Learning rate   Estimators   Accuracy   ROC AUC
ABModel01    0.1             3            0.9964     0.9963
ABModel02    0.7             5            0.9964     0.9963
ABModel03    0.7             5            0.9964     0.9963


Table II (continued):

Model        Learning rate   Estimators   Accuracy   ROC AUC
ABModel04    0.07            8            0.9964     0.9963
ABModel05    0.87            10           0.9993     0.9993
ABModel06    0.2             12           0.9993     0.9993

B. Gradient Boosting
As a second ML algorithm, we considered Gradient Boosting. The image dataset was split into training with 75% (4043 images) and testing with 25% (1348 images). To assess the model's performance, details were captured through the confusion matrix and the classification report, as in Table III below.

Confusion Matrix of the Gradient Boosting Model:
[[629   0]
 [  1 718]]

Table III: Classification Report of Gradient Boosting Model

              Precision  Recall  f1-score  Support
0                   1.0     1.0       1.0      629
1                   1.0     1.0       1.0      719
accuracy              -       -       1.0     1348
macro avg           1.0     1.0       1.0     1348
weighted avg        1.0     1.0       1.0     1348

During experimentation with the Gradient Boosting ML algorithm, the performance of the resulting models was better when we tuned the learning rate, max depth, and n_estimators parameters. We determined the ideal set of parameters that produced the most precise results by methodically examining a range of hyperparameter values. After several iterations, the performance reported by the models is as in Table IV below.

Table IV: Classification Report of Gradient Boosting Models

Model        Learning rate   Max Depth   Estimators   Accuracy   ROC AUC
GBModel01    0.01            4           4            0.9993     0.9993
GBModel02    0.1             5           4            0.9993     0.9992
GBModel03    0.05            10          1            0.9993     0.9993

C. LGBM
We experimented with the Light Gradient Boosting model after the Gradient Boosting algorithm to understand its classification performance. The image dataset was split into training with 75% (4043 images) and testing with 25% (1348 images). To assess the model's performance, details were captured through the confusion matrix and the classification report, as in Table V below.

Confusion Matrix of the LGBM Model:
[[629   0]
 [  1 718]]

Table V: Classification Report of Light Gradient Boosting Model

              Precision  Recall  f1-score  Support
0                   1.0     1.0       1.0      629
1                   1.0     1.0       1.0      719
accuracy              -       -       1.0     1348
macro avg           1.0     1.0       1.0     1348
weighted avg        1.0     1.0       1.0     1348

The performance of the resulting Light Gradient Boosting models was better when we tuned the learning rate and n_estimators parameters. We determined the ideal set of parameters that produced the most precise results by methodically examining a range of hyperparameter values. After several iterations, the performance reported by the models is as in Table VI below.

Table VI: Classification Report of Light Gradient Boosting Models

Model          Learning rate   Estimators   Accuracy   ROC AUC
LGBMModel01    0.1             1            0.9971     0.997
LGBMModel02    0.07            4            0.9971     0.997
LGBMModel03    0.07            5            0.9971     0.997
LGBMModel04    0.09            13           0.9978     0.9978
LGBMModel05    0.15            15           0.9993     0.9993
LGBMModel06    0.15            25           0.9985     0.9985
LGBMModel07    0.15            12           0.9978     0.9978

D. XGB
Extreme Gradient Boosting (XGBoost) was considered as the final ML algorithm for experimentation. For the XGBoost experimentation, we split the dataset 50% for training (2749 images) and 50% for testing (2750 images). To assess the model's performance, details were captured through the confusion matrix and the classification report, as in Table VII below.

Confusion Matrix of the XGBoost Model:
[[1280    3]
 [   0 1467]]

Table VII: Classification Report of XGBoost Model

              Precision  Recall  f1-score  Support
0                   1.0     1.0       1.0     1283
1                   1.0     1.0       1.0     1467
accuracy              -       -       1.0     2750
macro avg           1.0     1.0       1.0     2750
weighted avg        1.0     1.0       1.0     2750

To enhance the performance of the XGBoost model, we considered the learning rate and n_estimators as the key parameters and tuned them. We determined the ideal set of values for these parameters that produced the most precise results by methodically examining a range of hyperparameters.
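The tuning procedure described above, methodically examining learning-rate and estimator-count values, can be sketched with scikit-learn's GridSearchCV. Synthetic data stands in for the radiomics feature table here, and the grid values are illustrative, not the paper's exact settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the radiomics feature matrix
# (the paper's real input is 124 PyRadiomics features per image).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

param_grid = {
    "learning_rate": [0.01, 0.05, 0.1],   # illustrative grid values
    "n_estimators": [5, 15, 25],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="roc_auc",   # the metric reported alongside accuracy in the paper
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```

The same loop applies to AdaBoostClassifier or, with the respective packages installed, to the LightGBM and XGBoost estimators, since all expose the same fit/predict interface.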


After several iterations, the performance reported by the models is as in Table VIII below.

Table VIII: Classification Report of XGBoost Models

Model         Learning rate   Estimators   Accuracy   ROC AUC
XGBModel01    0.1             1            0.9978     0.9978
XGBModel02    0.9             3            0.9985     0.9985
XGBModel03    0.9             3            0.9985     0.9985
XGBModel04    0.01            2            0.9978     0.9978
XGBModel05    0.01            3            0.9978     0.9978
XGBModel06    1.2             4            0.9978     0.9978
XGBModel07    2.2             3            0.9978     0.9978
XGBModel08    0.1             1            0.9978     0.9978

V. Discussion

Based on the results obtained, the performance of the proposed ML models experimented with (Ada Boosting, Gradient Boosting, Light GBM, and XGBoost) in classifying the balanced CT image dataset using Radiomics features, in terms of their accuracy and ROC-AUC values, is as indicated in Table IX below.

Table IX: Performance Metrics Summary of the Proposed Machine Learning Models using Radiomics Features

Feature Classification Model   Accuracy   ROC-AUC
Ada Boosting                   0.9993     0.9993
GBM                            0.9993     0.9992
XGB                            0.9985     0.9985
LGBM                           0.9985     0.9985

Fig. 4: Performance Matrix

The bar chart in Fig. 4 visualizes the performance metrics of the four proposed machine learning models: Ada Boosting, GBM, XGB, and LGBM. For each model, both accuracy and ROC-AUC scores are presented, facilitating a clear comparison of their effectiveness.

VI. Conclusion

It is observed by experimentation that the Radiomics approach supports the analysis and classification of medical images commendably: extracting features, identifying the important features, and classifying them with machine learning algorithms. It is also to be noted that the experiment implemented image dataset balancing using image augmentation, considering both normal and cancer images, which also aided better performance outcomes. The accuracy scores and ROC-AUC values obtained for the various classification models are as follows: for Ada Boosting, the accuracy score was 0.9993 and ROC-AUC was 0.9993; followed by GBM, with accuracy 0.9993 and ROC-AUC 0.9992; followed by XGB, with accuracy 0.9985 and ROC-AUC 0.9985; and LGBM, with accuracy 0.9985 and ROC-AUC 0.9985. It is also observed that there are various studies which are informative and of high quality for using CT Radiomics-based models. These studies and approaches have to be channelized and harmonized, and exclusive validation needs to be considered, in order to adopt these models.

Acknowledgment

As authors, we acknowledge that the National Cancer Institute and the Foundation for the National Institutes of Health had a critical role in the creation of the free, publicly available LIDC/IDRI Database used as part of this study.

References
[1] Ferlay J, Ervik M, Lam F, et al., eds. Global Cancer Observatory: Cancer Today. International Agency for Research on Cancer; 2020. Accessed November 25, 2020. gco.iarc.fr/today
[2] Lemjabbar-Alaoui H, Hassan OU, Yang YW, Buchanan P. Lung cancer: Biology and treatment options. Biochim Biophys Acta. 2015 Dec;1856(2):189-210. doi: 10.1016/j.bbcan.2015.08.002
[3] Li P, Wang S, Li T, Lu J, HuangFu Y, Wang D. A Large-Scale CT and PET/CT Dataset for Lung Cancer Diagnosis (Lung-PET-CT-Dx) [Data set]. The Cancer Imaging Archive, 2020. doi: 10.7937/TCIA.2020.NNC2-0461
[4] Mridha MF, Prodeep AR, Hoque ASMM, Islam MR, Lima AA, Kabir MM, Hamid MA, Watanobe Y. A Comprehensive Survey on the Progress, Process, and Challenges of Lung Cancer Detection and Classification. J Healthc Eng. 2022;2022:5905230. doi: 10.1155/2022/5905230
[5] Armato III SG, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, et al. Data From LIDC-IDRI [Data set]. The Cancer Imaging Archive, 2015. doi: 10.7937/K9/TCIA.2015.LO9QL9SX
[6] Armato SG 3rd, McLennan G, Bidaut L, McNitt-Gray MF, Meyer CR, Reeves AP, et al.


The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical Physics. 2011;38:915-931. doi: 10.1118/1.3528204
[7] Ma Z, Tavares JMRS, Natal Jorge RM. A review on the current segmentation algorithms for medical images. Proceedings of the 1st International Conference on Imaging Theory and Applications (IMAGAPP); February 2009; Lisboa, Portugal.
[8] Ma Z, Tavares JMR, Jorge RN, Mascarenhas T. A review of algorithms for medical image segmentation and their applications to the female pelvic cavity. Computer Methods in Biomechanics and Biomedical Engineering. 2010;13(2):235-246. doi: 10.1080/10255840903131878
[9] Senthil Kumar K, Venkatalakshmi K, Karthikeyan K. Lung cancer detection using image segmentation by means of various evolutionary algorithms. Computational and Mathematical Methods in Medicine. 2019;2019:4909846. doi: 10.1155/2019/4909846
[10] Avinash S, Manjunath K, Kumar SS. An improved image processing analysis for the detection of lung cancer using Gabor filters and watershed segmentation technique. Proceedings of the 2016 International Conference on Inventive Computation Technologies (ICICT); August 2016; Coimbatore, India. pp. 1-6.
[11] Shaziya H, Shyamala K, Zaheer R. Automatic lung segmentation on thoracic CT scans using U-Net convolutional network. Proceedings of the 2018 International Conference on Communication and Signal Processing (ICCSP); April 2018; Tamilnadu, India. IEEE; pp. 643-647.
[12] Wang S, Zhou M, Gevaert O, et al. A multi-view deep convolutional neural networks for lung nodule segmentation. Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); July 2017; Jeju Island, Korea. IEEE; pp. 1752-1755.
[13] Su Y, Li D, Chen X. Lung nodule detection based on Faster R-CNN framework. Computer Methods and Programs in Biomedicine. 2021;200:105866. doi: 10.1016/j.cmpb.2020.105866
[14] van Griethuysen JJM, Fedorov A, Parmar C, Hosny A, Aucoin N, Narayan V, Beets-Tan RGH, Fillion-Robin JC, Pieper S, Aerts HJWL. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Research. 2017;77(21):e104-e107. doi: 10.1158/0008-5472.CAN-17-0339
[15] Felfli M, Liu Y, Zerka F, Voyton C, Thinnes A, Jacques S, Iannessi A. Systematic Review, Meta-Analysis and Radiomics Quality Score Assessment of CT Radiomics-Based Models Predicting Tumor EGFR Mutation Status in Patients with Non-Small-Cell Lung Cancer. International Journal of Molecular Sciences. 2023;24(14):11433. doi: 10.3390/ijms241411433
[16] Rajesh MN, Chandrasekar BS, Shivakumar Swamy S. Prostate Cancer Detection using Radiomics-based Feature Analysis with ML Algorithms and MR Images. International Journal of Engineering Trends and Technology. 2022;70(12):42-58. doi: 10.14445/22315381/IJETT-V70I12P206
[17] Lu L, Sun SH, Yang H, E L, Guo P, Schwartz LH, Zhao B. Radiomics Prediction of EGFR Status in Lung Cancer—Our Experience in Using Multiple Feature Extractors and The Cancer Imaging Archive Data. Tomography. 2020;6(2):223-230. doi: 10.18383/j.tom.2020.00017
[18] Otsu N. A threshold selection method from gray-level histograms. Automatica. 1975;11:285-296.
[19] Ning G. Two-dimensional Otsu multi-threshold image segmentation based on hybrid whale optimization algorithm. Multimedia Tools and Applications. 2023;82:15007-15026. doi: 10.1007/s11042-022-14041-1
[20] Chengsheng T, Huacheng L, Bing X. AdaBoost typical algorithm and its application research. MATEC Web of Conferences. 2017;139:00222. doi: 10.1051/matecconf/201713900222
[21] Friedman JH. Greedy function approximation: a gradient boosting machine. Annals of Statistics. 2001;29(5):1189-1232.
[22] Friedman JH. Stochastic gradient boosting. Computational Statistics & Data Analysis. 2002;38(4):367-378.
[23] Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, Liu TY. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems. 2017;30:3146-3154.
[24] Chen T, Guestrin C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. pp. 785-794.
