Nishajenipher 2020

Proceedings of the Third International Conference on Intelligent Sustainable Systems [ICISS 2020]
IEEE Xplore Part Number: CFP20M19-ART; ISBN: 978-1-7281-7089-3
A Study on Early Prediction of Lung Cancer Using Machine Learning Techniques
Ms. V. Nisha Jenipher¹
Research Scholar, CSE Department, Sathyabama Institute of Science and Technology, Chennai, India
Assistant Professor, St. Joseph’s Institute of Technology, Chennai, India
jennisjit@gmail.com

Dr. S. Radhika²
Professor, EEE Department, Sathyabama Institute of Science and Technology, Chennai, India
radhikachandru79@gmail.com

2020 3rd International Conference on Intelligent Sustainable Systems (ICISS) | 978-1-7281-7089-3/20/$31.00 ©2020 IEEE | DOI: 10.1109/ICISS49785.2020.9316064
Abstract - Machine learning techniques have been used in cancer research for more than a decade, and Machine Learning Algorithms (MLa) can now contribute significantly to Lung Cancer (LC) research. LC accounts for the highest cancer mortality rate across the globe, so early prediction and classification of cancer cells can increase the survival rate substantially. Although many algorithms are used in neurology, radiology, and oncology for LC prediction, MLa outperform them in accuracy and efficiency. This study first describes the workflow methodology used by MLa for the early prediction and classification of LC: selecting the input data, preparing the data, selecting and extracting features, training and testing, and selecting the best ML technique. Second, it presents a survey of the ML algorithms used in LC and their methodologies. Third, it analyzes performance metrics such as Accuracy, Sensitivity, Specificity, Precision, F1 Score, Root Mean Square Error (RMSE), Confusion Matrix, Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve, and the Precision-Recall (PR) curve for different MLa. Finally, it covers the parameters for constructing an efficient and accurate ML model for the early prediction of LC.

Keywords - Machine learning, Lung cancer, Prediction, Classification, Methodologies, Survey

I. INTRODUCTION
Around the world, lung cancer (LC) remains the leading cause of cancer incidence and mortality, with millions of new cases and deaths [1][37]. Estimated LC new cases and mortality worldwide from 1999 to 2020 [1, 39–43] are shown in Fig. 1. Although LC is one of the most life-threatening diseases, early diagnosis and prediction offer the best chance of survival. Smoking is the main cause of LC; in non-smokers, various other factors contribute to the risk of LC [2]. If half of the individuals at high risk of LC were screened, over 12,000 deaths could be prevented [3]. LC is detected by various tests such as chest X-ray, sputum cytology, and low-dose spiral (helical) CT scan, also called Low-Dose Computed Tomography (LDCT). Among these tests, LDCT screening can decrease LC mortality by 14 to 20 percent in high-risk populations [4][5], and LDCT also detects smaller lung tumors at an earlier stage [6]. Not all nodules found in the lungs are cancerous. Lung malignancies are broadly classified into three kinds: small cell LC (SCLC), non-small cell LC (NSCLC), and lung carcinoid tumor.

Fig. 1 Estimated LC new cases and mortality (1999–2020)

There are two types of staging systems for lung malignant growth: the number system (Stage I, Stage II, Stage III, Stage IV) and the TNM (Tumor, Nodes, Metastases) system. Stages I and II are defined by tumor size. Stage III shows lymph node involvement, and in stage IV the malignant growth has spread to other parts of the body such as the liver and brain [7]. The term Volume Doubling Time (VDT) is used in LC screening; VDT is the time required for a nodule to double its volume. According to most investigations, a nodule with a VDT of less than 400 days has a high probability of malignancy, while a nodule with a VDT over 500 days is likely to be normal or benign [8][9]. Notably, 13 of 48 CT screen-detected lung cancers had a VDT greater than 400 days [10].

The use of machine learning techniques in LC and other cancer research [36] has been emerging since the 1990s. According to PubMed statistics, more than 650 research articles have been published on detecting LC using machine learning techniques.

The objective of this paper is to present a proposed system for predicting the early stage of LC using MLa techniques. Section II describes the general workflow and reviews related work. Section III presents future work, and Section IV concludes the paper.
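The VDT criterion from the introduction can be made concrete with the standard exponential-growth (Schwartz) formula, VDT = Δt · ln 2 / ln(V2/V1), where V1 and V2 are nodule volumes measured Δt days apart. The sketch below is an illustration only: the function names are hypothetical, the 400- and 500-day cut-offs are taken from the text rather than from any clinical software, nodules falling between the two thresholds are labeled "indeterminate", and a nodule that does not grow is assigned an unbounded VDT.

```python
import math


def volume_doubling_time(v1: float, v2: float, days: float) -> float:
    """Volume doubling time in days under exponential growth (Schwartz formula).

    v1, v2: nodule volumes (same units, e.g. mm^3) at the first and second scan.
    days:   interval between the two scans.
    Returns math.inf when the nodule did not grow (v2 <= v1).
    """
    if v1 <= 0 or v2 <= 0 or days <= 0:
        raise ValueError("volumes and interval must be positive")
    if v2 <= v1:
        return math.inf  # no growth: doubling time is effectively unbounded
    return days * math.log(2) / math.log(v2 / v1)


def vdt_risk(vdt: float, high_risk_below: float = 400.0, benign_above: float = 500.0) -> str:
    """Hypothetical labeling that applies the 400/500-day thresholds quoted in the text."""
    if vdt < high_risk_below:
        return "high probability of malignancy"
    if vdt > benign_above:
        return "likely benign"
    return "indeterminate"


if __name__ == "__main__":
    # A nodule whose volume doubles (100 -> 200 mm^3) in 90 days has a VDT of 90 days.
    vdt = volume_doubling_time(100.0, 200.0, 90.0)
    print(round(vdt, 1), vdt_risk(vdt))  # 90.0 high probability of malignancy
```

Returning infinity for non-growing nodules is a design choice of this sketch; the formula itself would give a negative value for a shrinking nodule.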
II. RELATED WORK
The general workflow for predicting the early stage of LC using MLa consists of four phases: Data Collection, Data Preparation, Modeling the MLa, and Evaluation. Each phase has its own workflow process, and the proposed system is presented in Fig. 2.

A. Data Collection
Data collection is the process of gathering information. A dataset is a collection of data, typically a single database table or a single statistical data matrix, in which each column represents a particular variable and each row corresponds to a given member (for example, a patient) of the dataset.

Fig. 2 Proposed system for predicting early LC using MLa

MLa rely heavily on data; without data, it is difficult for MLa to learn. Before collecting data for LC prediction, one must answer the following questions:
• Where does the data for LC exist?
• How much data do the MLa need?
• What kind of dataset needs to be collected?
• What data patterns do the MLa need?

Need for data collection:
• Incorrect predictions occur when the collected data do not match the problem statement.
• Some collected data may have too many or too few samples, leading to insufficient or inadequate representation.
• Collected data should not be biased.

The input data fed into MLa for predicting LC can take any form, such as an image dataset [23] (for example, MRI or CT scans), big data (structured or unstructured), clinical data of patients (CDp), computer data, or a combination of clinical and computer data. Many health records [11],[15] can be used to obtain social and lifestyle data to predict the disease. In recent times, many CDp [12] and registries
[13] are used to track the growth of LC cells, diagnosis procedures, and deaths of patients. They provide variables [14] that are closely related to disease prediction. The different types of input datasets collected by researchers for MLa are listed in Table I.

Table I: Datasets used in previous studies
Paper | Year | No. of CDp / images used
Yu et al. [16] | 2019 | 145 CDp
Yuan et al. [17] | 2020 | 150 CDp
Wu et al. [19] | 2016 | 350 CDp
Hazra et al. [21] | 2017 | 422 CDp
Luna et al. [25] | 2020 | 202 CDp
Naresh et al. [23] | 2014 | 111 images
Ye et al. [27] | 2020 | 518 CDp
Gu et al. [30] | 2019 | 245 CDp
Luna et al. [31] | 2019 | 203 CDp
Sumathipala et al. [32] | 2018 | 158 CDp

B. Data Preparation
a) Data Cleaning: Data are collected from a large number of sources, and MLa are applied to produce the best result. The collected data are initially raw, and converting raw data into a usable dataset is the most important and time-consuming task. Pre-processing is done to clean, integrate, transform, and reduce the data to make them useful for further analysis. Certain MLa do not support null values, so it is necessary to remove or substitute such values. The various preparation methods used in previous studies of MLa are given in Table II.

Real-world data contain:
• Missing values
• Inconsistent values
• Redundant values
• Errors
• Duplicate values

Need for data cleaning:
• Data cleaning is done to provide quality data.
• An imbalanced or improper dataset will cause an ML algorithm to make incorrect or false decisions.
• Missing, inconsistent, and redundant values should be handled initially to avoid problems during prediction.

b) Data Normalization
Normalization converts all the features in the dataset to a similar scale; with min-max normalization, the transformed values always lie between 0 and 1. Normalization is appropriate when the distribution of the data does not follow a Gaussian distribution. It is helpful for algorithms that make no assumption about the distribution of the data, such as K-Nearest Neighbors and Neural Networks. It creates new values that maintain the general distribution and ratios of the source data, while keeping values within a scale applied across all numeric columns used in the model.

Need for normalization:
• It is needed only when the features in the dataset have different ranges.
• It improves the performance and training of MLa.
• It is required by MLa to model the data correctly.

Table II: Data preparation methods reported in previous studies
Paper | Year | Data preparation method
Hassett et al. [12] | 2015 | Missing data dropped
Yu et al. [16] | 2019 | Imbalanced dataset handled with the SMOTE algorithm
Hazra et al. [21] | 2017 | Missing values imputed
Naresh et al. [23] | 2014 | Tiny objects removed; morphological closing and filling; edge detection; non-local filter and lung Otsu thresholding used for noise removal
Luna et al. [25] | 2020 | Trimmed Scores Regression used for handling missing data; Synthetic Minority Oversampling Technique (SMOTE) and backward sequential feature selection used for minority oversampling
Ye et al. [27] | 2020 | Principal Component Analysis (PCA)
Akram et al. [29] | 2015 | Thresholding; background removal; hole filling of lung lobes; contour correction; dataset balancing by up-sampling
Dekker et al. [33] | 2009 | Records with important missing data dropped
Gunaydin et al. [35] | 2019 | PCA used to reduce dimensionality

C. Feature Selection and Extraction
Providing all the features as input to an ML algorithm reduces its performance, because it then learns the prediction from irrelevant features. Feature selection separates irrelevant or redundant features from the dataset. The main difference between selection and extraction is that feature selection keeps a subset of the original features, while feature extraction creates new ones. Feature selection (Fs) is done to select the features that have a linear relationship with the output variable. Some MLa, e.g., Random Forest, have built-in functions for selecting features. For a given set of features in the input dataset

$$F_i = \{f_1, f_2, \ldots, f_n\} \qquad (1)$$

Fs finds the subset of F_i that increases the ability to classify the patterns.

Need for feature selection:
• Avoids overfitting.
• Keeps the ML algorithm on the right path of prediction.
• Improves ML algorithm accuracy and performance.
• Feature extraction and selection are done to significantly improve ML algorithm performance.

Feature extraction is the process of representing two or more variables with a reduced number of variables without removing relevant information. It builds the needed information from the existing features by reformatting, transforming, and combining them into new ones; in linear methods such as PCA, the new attributes are linear combinations of the original ones. This process yields a smaller and richer set of attributes, and reducing the data allows MLa to speed up the learning process for future prediction. The extraction and selection processes used in MLa for predicting LC are described in Table III.
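The data-preparation steps of Section B (cleaning, then normalization) can be sketched for a small tabular clinical dataset. This is a minimal illustration under stated assumptions, not the procedure of any study in Table II: the column names are hypothetical, exact duplicate rows are dropped, missing numeric values are imputed with the column median (Table II also reports simply dropping incomplete records), and a constant column is mapped to 0 to avoid dividing by zero.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then fill missing numeric values with the column median."""
    out = df.drop_duplicates().reset_index(drop=True)
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale every numeric column to [0, 1] via (x - min) / (max - min)."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        lo, hi = out[col].min(), out[col].max()
        # A constant column has no range to rescale; map it to 0.0.
        out[col] = 0.0 if hi == lo else (out[col] - lo) / (hi - lo)
    return out


if __name__ == "__main__":
    # Hypothetical patient records: one row per patient, one column per variable.
    raw = pd.DataFrame({
        "age": [61, 54, None, 70, 61],
        "pack_years": [30.0, 0.0, 12.0, 45.0, 30.0],
        "nodule_mm": [8.0, 4.0, 6.0, None, 8.0],
    })
    prepared = min_max_normalize(clean(raw))
    print(prepared)
```

In a full pipeline, the medians, minima, and maxima would be computed on the training split only and then reused for the test split, so that no test information leaks into training.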
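The key difference between selection and extraction described in Section C can be shown side by side. The sketch below uses only NumPy and is an assumption-laden illustration, not the method of any study in Table III. Selection ranks the original features by absolute Pearson correlation with the output (a simple univariate filter, in the spirit of the linear-relationship criterion in the text) and keeps the top k, so the result is a subset of F_i. Extraction uses PCA, computed here via SVD, so each new feature is a linear combination of the originals. The function names are hypothetical.

```python
import numpy as np


def select_by_correlation(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Feature selection: return the indices of the k original columns of X whose
    absolute Pearson correlation with the output y is largest (a subset of F_i)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns (zero variance) get correlation 0 instead of dividing by zero.
    corr = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(corr))[:k]


def pca_extract(X: np.ndarray, n_components: int) -> np.ndarray:
    """Feature extraction: project the centred data onto its leading principal
    components, so each new feature is a linear combination of the original ones."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                # 200 samples, five features
    y = 3.0 * X[:, 2] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)
    print(select_by_correlation(X, y, k=2))      # expected: columns 2 and 4, which y depends on
    print(pca_extract(X, n_components=2).shape)  # (200, 2): two new features
```

A correlation filter only captures linear, one-feature-at-a-time relationships; the wrapper and embedded methods listed in Table III (for example, backward sequential selection or Random Forest importance) can account for interactions at a higher computational cost.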
Need for feature extraction:
• Reduces the complexity of data.
• Removes highly correlated features.
• Removes redundant data.
• Increases the speed of the ML algorithm.

Table III: Feature selection and extraction for LC prediction
Paper | Year | Feature selection and extraction
Yu et al. [16] | 2019 | Random Forest algorithm for selection; Pyradiomics for extraction
Yuan et al. [17] | 2020 | Monte-Carlo algorithm for selection; incremental feature selection for extraction
Wu et al. [19] | 2016 | Two-stage feature selection method (correlation-based feature elimination and univariate feature selection); 440 radiomic features extracted
Luna et al. [25] | 2020 | Backward sequential feature selection
Junior et al. [26] | 2018 | 2277 quantitative features extracted
Ye et al. [27] | 2020 | SVM and PCA for selection; Random Forest algorithm for extraction
Akram et al. [29] | 2015 | Lung region extraction; candidate nodule extraction; candidate nodule pruning; hybrid feature extraction; candidate nodule; up-sampling

D. Splitting the Dataset into Training (Tr) and Testing (Ts) Datasets
In machine learning, the dataset is usually split into Tr and Ts datasets to evaluate the performance of MLa. One of the main configuration choices is the size of Tr and Ts, and there is no ideal split percentage. Tr is used to train the ML model and adjust its parameters to identify the patterns in the dataset. Validation data (Vd) are used to tune the model for better precision and effectiveness of the ML algorithm. Ts is used to assess how well the ML technique predicts new answers based on its training.

The split percentage should meet the objective, taking into account considerations such as:
• The cost of training the model on Tr
• The cost of testing the model on Ts
• Selecting representative Tr samples
• Selecting representative Ts samples

Need for splitting the data:
1. To improve the accuracy and efficiency of the ML algorithm.
2. To identify the parameters that have a linear relationship with the output variable.
3. To compare different ML techniques on different datasets.

E. ML Algorithms Used to Predict LC
ML techniques process the data and find the right model. The main categories of ML methods are supervised learning (SL), unsupervised learning (USL), and reinforcement learning (RL).
• SL problems are forms of classification or regression.
• USL is valuable when the data are unlabeled and their structure needs to be explored.
• RL can be model-free or model-based.

Many ML techniques, such as prediction, classification, regression, association, and clustering, have been used in LC tumor diagnosis. The dataset splits and ML techniques used in various LC prediction and classification studies are given in Table IV.

F. Performance Metrics and Results
The performance of the different MLa used in predicting early LC is calculated using certain performance metrics. These include Accuracy [38], Sensitivity, Specificity, Precision, F1 Score, RMSE, Confusion Matrix, the AUC-ROC curve, and the PR curve. These metrics are described in Table V.
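The threshold-based metrics listed in Section F (accuracy, sensitivity, specificity, precision, and F1 score) all derive from the four cells of a binary confusion matrix. The following sketch computes them in plain Python, assuming binary labels with 1 denoting the positive class (e.g., cancer) and 0 the negative class; the function names are hypothetical, and a ratio with a zero denominator is reported as 0.0 purely by convention in this sketch.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels, where 1 = positive (e.g. cancer)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def _safe_div(num: float, den: float) -> float:
    # Convention used in this sketch: an undefined ratio (x / 0) is reported as 0.0.
    return num / den if den else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = _safe_div(tp, tp + fp)    # of predicted positives, fraction truly positive
    sensitivity = _safe_div(tp, tp + fn)  # recall / true positive rate
    specificity = _safe_div(tn, tn + fp)  # true negative rate
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
    print(confusion_counts(y_true, y_pred))  # (3, 2, 1, 2)
    print(classification_metrics(y_true, y_pred))
```

In this example accuracy is 0.625 while sensitivity is 0.75 and specificity only 0.5, which illustrates why the surveyed studies report several metrics rather than accuracy alone.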
Table IV: MLa models used in prediction of LC
Paper | Year | MLa used | Accuracy | Splitting of Tr and Ts
Yu et al. [16] | 2019 | Random Forest | 81% | Random split of Tr and Ts
Yuan et al. [17] | 2020 | SVM (Support Vector Machine) | 96.7% | Random split of Tr and Ts
Hart et al. [18] | 2018 | ANN (Artificial Neural Network) | NA | 70% Tr and 30% Vd
Wu et al. [19] | 2016 | Naive Bayes | 72% | 198 samples Tr and 152 samples Ts
Hazra et al. [21] | 2017 | Logistic Regression | 77.40% | 80% Tr and 20% Ts
Bartholomai et al. [22] | 2018 | Random Forest and Gradient Boosting Machine | NA | 75% Tr and 25% Ts
Naresh et al. [23] | 2014 | SVM | 95.12% | Four-fifths Tr and one-fifth Ts
Ye et al. [27] | 2020 | SVM | 99.4% | 70% Tr and 30% Vd
Radhika et al. [28] | 2018 | SVM | 99.2% | Tr and Ts split using k-fold cross-validation
Akram et al. [29] | 2015 | ANN (Artificial Neural Network) | 96.68% | 50% Tr and 25% Ts
Gunaydin et al. [35] | 2019 | ANN and Decision Tree (DT) | 82.43% (ANN), 93.24% (DT) | 70% Tr and 30% Ts
NA – Not Applicable
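Table IV shows that the surveyed studies use either a single hold-out split (for example 70/30 or 80/20) or k-fold cross-validation. Both schemes can be sketched with NumPy index arithmetic, assuming samples are independent and can be shuffled freely. The function names, the fixed seed, and the value k = 5 in the usage lines are illustrative assumptions, not settings taken from the cited papers.

```python
import numpy as np


def holdout_split(n_samples: int, test_fraction: float = 0.3, seed: int = 0):
    """Random hold-out split: returns (train_idx, test_idx), e.g. 70% Tr / 30% Ts."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """k-fold cross-validation: yields (train_idx, test_idx) k times, so that
    every sample appears in exactly one test fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]


if __name__ == "__main__":
    tr, ts = holdout_split(350, test_fraction=0.3)
    print(len(tr), len(ts))  # 245 105
    for fold, (tr, ts) in enumerate(k_fold_indices(350, k=5)):
        print(fold, len(tr), len(ts))
```

A validation set (Vd), as in the 70% Tr / 30% Vd rows, can be carved out by calling holdout_split(len(tr), ...) again and indexing tr with the returned positions. For small or imbalanced LC datasets, a stratified variant that preserves the class ratio in every fold is usually preferable; it is not shown here.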
Table V: Performance metrics used in previous studies
Metric | Reference articles | Description
Accuracy | [12, 17, 19, 21, 23, 27–29, 35] | Measures the overall performance of a model (the fraction of all predictions that are correct); provides no detailed information about the types of error; performs poorly on class-imbalanced datasets.
Sensitivity | [17–19, 21–23, 27, 29, 30] | Also termed recall or true positive rate; of all actual positives, the proportion correctly predicted as positive.
Specificity | [17–19, 22, 23, 29, 30] | Also termed true negative rate; of all actual negatives, the proportion correctly predicted as negative.
Precision | [16, 17, 21, 23, 24, 27] | Of all data predicted as positive by the classifier, the proportion that are actually positive.
F1 Score | [17, 21, 27] | The harmonic mean of precision and recall; balances the two, and a better F1 score indicates few false positives and few false negatives.
RMSE | [20, 22] | A standard way to measure the error of a model: the square root of the mean squared difference between predicted and actual values; lower values indicate more accurate predictions.
Confusion Matrix | [17, 22, 23] | Describes the performance of a classifier on test data for which the true values are known; provides a visual summary of an algorithm's performance.
AUC-ROC Curve | [12, 18, 19, 25–27, 29–33] | Measures classification performance across all decision thresholds (binary problems, or multi-class problems via one-vs-rest averaging); used to compare the performance of different models.
PR Curve | [16, 17, 21] | Evaluates the output quality of a classifier; precision-recall is especially useful when the classes are imbalanced.
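Unlike the threshold-based metrics, AUC-ROC summarizes how well a model ranks cases over all decision thresholds, while RMSE measures the error of continuous predictions. The sketch below computes AUC through its equivalence with the Mann-Whitney U statistic (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half) and RMSE directly from its definition. It assumes binary labels with 1 as the positive class and real-valued scores; the function names are hypothetical, and the O(P·N) pairwise loop is chosen for clarity rather than speed.

```python
import math
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as P(score of a random positive > score of a random negative),
    counting ties as 1/2 (Mann-Whitney formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]
    probs = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
    print(roc_auc(labels, probs))  # 8 of 9 positive-negative pairs ranked correctly
    print(rmse(labels, probs))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why AUC is convenient for comparing models independently of any single cut-off.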
III. FUTURE WORK
To construct an efficient and accurate ML model for early LC prediction, the model can be developed with the following parameters:
• Data should be collected from large, highly qualified, authorized centers, e.g., www.cancerimagingarchive.net.
• Collected data should be preprocessed with robust techniques so that no important data is lost.
• Features highly correlated with the output should be identified for the best results.
• Using a hybrid ML model for early prediction of LC can produce accurate results.
• Several ML tools and platforms can be made available to researchers to help them obtain good results.
• Many data analytics tools can also provide useful information for future data analysis.

IV. CONCLUSION
Currently, ML algorithms play a significant role in early LC prediction; with the help of these techniques, available data can be used to make predictions or decisions. This study presented a proposed system followed by MLa in predicting early LC, which gives researchers better knowledge of ML techniques for the early prediction of LC. Moreover, to make ML approaches easier to apply in the field of LC, the different types of datasets used, the various data preprocessing methods implemented, and the essential features selected and extracted have been explained in detail. The performance of different MLa was also evaluated. The parameters for constructing an efficient and accurate ML model for early prediction of LC are provided as additional information. This study will help researchers identify ML techniques that produce higher accuracy and efficiency in the field of LC.

REFERENCES
[1] Bray F, Ferlay J, Soerjomataram I, Siegel R L, Torre L A, Jemal A, “Global Cancer Statistics 2018”, doi: 10.3322/caac.21492.
[2] Noronha V, Dikshit R, Raut N, Joshi A, Pramesh C S, George K, Agarwal J P, Munshi A, Kumar P, “Epidemiology of lung cancer in India: Focus on the differences between non-smokers and smokers”, vol. 49, pp. 74-81, 2012.
[3] Cheung L C, Katki H A, Chaturvedi A K, Jemal A, Berg C D, “Preventing Lung Cancer Mortality by Computed Tomography Screening: The Effect of Risk-Based Versus U.S. Preventive Services Task Force Eligibility Criteria”, doi: 10.7326/M17-2067.
[4] Aberle D R, Adams A M, Berg C D, et al., “Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening”, doi: 10.1056/NEJMoa1102873.
[5] Koning H J D, Meza R, Plevritis S K, Haaf K T, Munshi V N, Jeon J, et al., “Benefits and Harms of Computed Tomography Lung Cancer Screening Strategies: A Comparative Modeling Study for the U.S. Preventive Services Task Force”, vol. 160, no. 5, pp. 311-320, 2014.
[6] Swensen J S, Jett J R, Hartman T E, Midthun D E, Mandrekar J S, Hillman S L, Sykes A M, Aughenbaugh G L, Bungum A O, Allen K L, “CT Screening for Lung Cancer: Five-year Prospective Experience”, vol. 235, pp. 259-265, 2005.
[7] www.cancerindia.org.in
[8] Kanashiki M, Tomizawa T, Yamaguchi I, Kurishima K, Nobuyuki H, Ishikawa H, Kagohashi K, Satoh H, “Volume doubling time of lung cancers detected in a chest radiograph mass screening program”, doi: 10.3892/ol.2012.780.
[9] Gagne H M, Nelson C J, Kinsey M, Garrison G, Kikut J, Gentchos G, Seward D, Sidiropoulos N, Folefac E, Leavitt B, Ashikaga T, Dragnev K, Lin S H, Anker C J, “Effect of Tumor Volume Doubling Time on Prognosis for Stage I Non-small Cell Lung Cancers”, vol. 99, no. 2, p. E487, October 1, 2017.
[10] Lindell R M, Hartman T E, Swensen S J, Jett J R, Midthun D E, Tazelaar H D, Mandrekar J N, “Five-year Lung Cancer Screening Experience: CT Appearance, Growth Rate, Location, and Histologic Features of 61 Lung Cancers”, vol. 242, no. 2, pp. 555-562, February 2007.
[11] Richter A N, Khoshgoftaar T M, “A review of statistical and machine learning methods for modeling cancer risk using structured clinical data”, doi: 10.1016/j.artmed.2018.06.002.
[12] Hassett M J, Uno H, Cronin A M, Carroll N K, Hornbrook M C, Ritzwoller D, “Detecting Lung and Colorectal Cancer Recurrence Using Structured Clinical/Administrative Data to Enable Outcomes Research and Population Health Management”, doi: 10.1097/MLR.0000000000000404.
[13] Wilson D O, Weissfeld J L, Fuhrman C R, Fisher S N, Balogh P B, Landreneau R J, Luketich J D, Siegfried J M, “The Pittsburgh Lung Screening Study (PLuSS) Outcomes within 3 Years of a First Computed Tomography Scan”, Am J Respir Crit Care Med, vol. 178, pp. 956-961, 2008.
[14] Williams A W, Tammemagi M C, Mayo J R, Roberts H, Liu G, et al., “Probability of Cancer in Pulmonary Nodules Detected on First Screening CT”, N Engl J Med, vol. 369, pp. 910-919, 2013, doi: 10.1056/NEJMoa1214726.
[15] Ferro J C, Olivera M D, Janela F, Martins H M G, “Preprocessing structure clinical data for predictive modeling and decision support: A roadmap to tackle the challenges”, Appl Clin Inform, vol. 8, no. 1, pp. 122-123, 2017, doi: 10.4338/ACI-2016-03-SOA-0035e.
[16] Yu L, Tao G, Zhu L, Wang G, Li Z, Ye J, Chen Q, “Prediction of pathologic stage in non-small cell lung cancer using machine learning algorithm based on CT image feature analysis”, doi: 10.1186/s12885-019-5646-9.
[17] Yuan F, Lu L, Zou Q, “Analysis of gene expression profiles of lung cancer subtypes with machine learning algorithms”, BBA - Molecular Basis of Disease, doi: 10.1016/j.bbadis.2020.165822.
[18] Hart G R, Roffman D A, Decker R, Deng J, “A multi-parameterized artificial neural network for lung cancer risk prediction”, doi: 10.1371/journal.pone.0205264.
[19] Wu W, Parmar C, Grossmann P, Quackenbush J, Lambin P, Bussink J, Mak R, Aerts H J W L, “Exploratory Study to Identify Radiomics Classifiers for Lung Cancer Histology”, doi: 10.3389/fonc.2016.00071.
[20] Lynch C M, Abdollahi B, Fuqua J D, de Carlo A R, Bartholomai J A, Balgemann R N, Van Berkel V H, Frieboes H B, “Prediction of lung cancer patient survival via supervised machine learning classification techniques”, doi: 10.1016/j.ijmedinf.2017.09.013.
[21] Hazra A, Bera N, Mandal A, “Predicting Lung Cancer Survivability using SVM and Logistic Regression Algorithms”, International Journal of Computer Applications (0975-8887), vol. 174, no. 2, September 2017.
[22] Bartholomai J A, Frieboes H B, “Lung Cancer Survival Prediction via Machine Learning Regression, Classification and Statistical Techniques”, ISSPIT 2018, doi: 10.1109/ISSPIT.2018.8642753.
[23] Naresh P, Shettar R, “Early Detection of Lung Cancer Using Neural Network Techniques”, ISSN: 2248-9622, vol. 4, issue 8 (Version 4), pp. 78-83, August 2014.
[24] Alam J, Alam S, Hossan A, “Multi-Stage Lung Cancer Detection and Prediction Using Multi-class SVM Classifier”, February 2018.
[25] Luna J M, Chao H H, Shinohara R T, Ungar L H, Cengel K A, Pryma D A, Chinniah C, Berman A T, Katz S I, Kontos D, Simone II C B, Diffenderfer E S, “Machine learning highlights the deficiency of conventional dosimetric constraints for prevention of high-grade radiation esophagitis in non-small cell lung cancer treated with chemoradiation”, vol. 22, pp. 69-75, May 2020, doi: 10.1016/j.ctro.2020.03.007.
[26] Junior J R F, Koenigkam-Santos M, Cipriano F E G, Fabro A T, de Azevedo-Marques P M, “Radiomics-based features for pattern recognition of lung cancer histopathology and metastases”, Computer Methods and Programs in Biomedicine, vol. 159, pp. 23-30, 2018.
[27] Ye Z, Sun B, Xiao Z, “Machine learning identifies 10 feature miRNAs for Lung squamous cell carcinoma”, Gene, vol. 749, 144669, 30 July 2020.
[28] Radhika P R, Nair R A S, “A Comparative Study of Lung Cancer Detection using Machine Learning Algorithms”, doi: 10.1109/ICECCT.2019.8869001.
[29] Akram S, Javed M Y, Qamar U, Khanum A, Hassan A, “Artificial Neural Network based Classification of Lungs Nodule using Hybrid Features from Computerized Tomographic Images”, Appl. Math. Inf. Sci., vol. 9, no. 1, pp. 183-195, 2015, doi: 10.12785/amis/090124.
[30] Gu Q, Feng Z, Lianga Q, Li M, Deng J, Ma M, Wang W, Liu J, Liu P, Rong P, “Machine learning-based radiomics strategy for prediction of cell proliferation in non-small cell lung cancer”, doi: 10.1016/j.ejrad.2019.06.025.
[31] Luna J M, Chao H H, Diffenderfer E S, Valdes G, Chinniah C, Ma G, Cengel K A, Solberg T D, Berman A T, Simone II C B, “Predicting radiation pneumonitis in locally advanced stage II–III non-small cell lung cancer using machine learning”, Radiotherapy and Oncology, vol. 133, pp. 106-112, April 2019.
[32] Sumathipala Y, Shafiq M, Bongen E, Brinton C, Paik D, “Machine Learning to Predict Lung Nodule Biopsy Method Using CT Image Features”, doi: 10.1016/j.compmedimag.2018.10.006.
[33] Dekker A, Oberije C D, Hope A, Komati K, Fung G, Yu S, Neve W D, Lievens Y, “Survival Prediction in Lung Cancer Treated with Radiotherapy”, doi: 10.1109/ICMLA.2009.92.
[34] Gliklich R E, Leavy M B, Dreyer N A, “Tools and Technologies for Registry Interoperability, Registries for Evaluating Patient Outcomes”, Rockville (MD): Agency for Healthcare Research and Quality (US), Report No. 19(20)-EHC017-EF, October 2019.
[35] Gunaydin O, Gunay, Sengel O, “Comparison of Lung Cancer Detection Algorithms”, doi: 10.1109/EBBT.2019.8741826.
[36] Shalini M, Radhika S, “Machine Learning techniques for Prediction from various Breast Cancer Datasets”, doi: 10.1109/ICBSII49132.2020.9167657.
[37] Vijayakumar T, “Neural network analysis for tumor investigation and cancer prediction”, Journal of Electronics, vol. 1, no. 2, pp. 89-98, 2019.
[38] Shakya S, “Analysis of Artificial Intelligence based Image Classification Techniques”, Journal of Innovative Image Processing (JIIP), vol. 2, no. 1, pp. 44-54, 2020.
[39] Parkin M, Pisani P, Ferlay J, “Global cancer statistics”, CA Cancer J Clin, vol. 49, pp. 33-64, 1999.
[40] Parkin D M, et al., “Global Cancer Statistics in the year 2000”, The Lancet Oncology, vol. 2, September 2001.
[41] Parkin D M, et al., “Global cancer statistics 2002”, CA: A Cancer Journal for Clinicians.
[42] Jemal A, et al., “Global Cancer Statistics”, CA Cancer J Clin, vol. 61, pp. 69-90, 2011.
[43] Torre L A, et al., “Global Cancer Statistics 2012”, CA Cancer J Clin, vol. 65, pp. 87-108, 2015.