13th ICCCNT 2022 Paper 173

Comparative Study of Cancer Classification by
Analysis of RNA-seq Gene Expression Levels

Amruth A1, Ramanan R1, Rhea Paul1, Sarada Jayan2, Amrita Thakur3*, Nidhin Prabhakar TV1
1Dept. of Computer Science and Engineering, Amrita School of Computing, 2Dept. of Mathematics, 3Dept. of Chemistry
Amrita School of Engineering, Bengaluru, Amrita Vishwa Vidyapeetham, India
*mail: t_amrita@blr.amrita.edu
Abstract: Cancer classification is in the spotlight of research in the medical domain; genetic inheritance plays a significant role in causing this condition. Certain similarities in DNA sequence can be prevalent in individuals who have cancer. The foremost intention of this paper is to put forth efficient cancer classification techniques that provide steady and substantial accuracy. An unconventional factor, RNA-seq gene expression levels, is considered here as opposed to familiar factors such as physical features or results of various imaging techniques. The classification is conducted by analyzing RNA-seq gene expression levels measured by a powerful sequencing system called the Illumina HiSeq platform. A comparative study of the application of various machine learning algorithms (Decision Tree, Random Forest, SVM, KNN, Naïve Bayes, Multinomial Regression) is conducted in this paper. This method significantly reduces the computational complexity compared with the deep neural network approach that is used with various imaging techniques. Experimental observations suggest that SVM provides higher accuracy, a higher cross-validation score, nearly ideal AUC-ROC curves, and better performance concerning the time required to fit the model and subsequently predict the cancer type.

Keywords: Cancer Classification, Machine learning, principal component analysis, Random Forest, Decision Tree, SVM, Naïve Bayes, Multinomial Regression, KNN

I. INTRODUCTION

Cancer is broadly identified as a group of related diseases that involve unusual cell growth, with an alarming potential to divide at a nearly exponential rate without stopping and to spread into surrounding tissues. In 2019, cancer was the second leading cause of death globally, after cardiovascular diseases. As it is one of the most complex health problems and is associated with high mortality and morbidity, detection and diagnosis of cancer at an early stage is of the utmost importance for its cure. Over the past few decades, there has been a tremendous evolution in the techniques used for cancer prediction and classification. With the rapid development of computer-aided methodologies in recent years, data mining and machine learning have played an important role in cancer classification, prediction, and diagnosis. This technology enables researchers to perform highly complex mathematical calculations and develop various prediction/classification algorithms which provide reliable and significant accuracy. Machine learning models show promising potential [1]; Naïve Bayes, Decision Tree, K-Nearest Neighbours, Multinomial Logistic Regression, Random Forest, and Support Vector Machine have shown positive outcomes in classifying higher-dimensional, real-time data. Another approach is to use deep learning methods; multiclass breast cancer classification using a Convolutional Neural Network has been achieved using images from the BreakHis dataset. The study classified 7909 images into 8 classes [2]. Genetic inheritance is one of the critical factors in causing the disease; hence certain similarities of DNA sequence are expected to be present in individuals who have cancer.

There have been numerous attempts to use gene expression profiles to increase the precision of tumor categorization. The Cancer Genome Atlas Research Network (TCGA) [3] has analyzed and profiled an enormous number of cancerous tumors in order to discover molecular abnormalities at the genetic level. The data generated as a result of this project provides an immense opportunity to understand the well-defined similarities, variances, and prevalent themes across tumor lineages. The dataset generated as a result of this project has been used for cancer classification in this paper. An unconventional factor, RNA-seq gene expression levels, is considered here as opposed to familiar factors such as physical features or results of various imaging techniques.

Analysis of the complete human DNA sequence is a highly time-consuming and resource-consuming task. Most of it would be redundant for the context; hence, mRNA derived from the DNA sequence (coding strand of DNA) using transcription is widely used for research purposes. The TCGA project has profiled DNA sequences of thousands of patients who are suffering from various types of cancer. The dataset used in this paper consists of five types of cancerous tumors: breast carcinoma (BRCA), kidney renal carcinoma (KIRC), lung adenocarcinoma (LUAD), prostate adenocarcinoma (PRAD) and colon adenocarcinoma (COAD). The foremost motive of this paper is to put forth a highly efficient cancer classification method by in-depth analysis of RNA-seq gene expression levels measured by a powerful sequencing system called the Illumina HiSeq platform.

II. RELATED WORK

There has been plenty of research on machine learning algorithms and techniques for applications in various medical domains; it is reviewed in this section. One related work [4] applies and compares ML algorithms to distinguish benign and malignant breast cancer tumours. The ML models used are k-Nearest Neighbours, binary SVM, decision tree, and AdaBoost. The features of the dataset are fed into NCA (Neighbourhood Component Analysis), a feature selection model. The primary purpose of NCA is to reduce the complexity of the model by decreasing the number of features used to train it. Another study applies and compares seven ML models on a breast cancer survival dataset having eight features and a sample of 900 patients. The machine learning models used in this study are Support Vector Machine (SVM), Trees Random Forest (TRF), Naive Bayes (NB), AdaBoost (AD), RBF Network (RBFN), 1-Nearest Neighbour (1NN), and Multilayer Perceptron (MLP). The authors concluded that Random Forest showed
promising results in successful prediction and declared that TRF was the best model for the chosen dataset [5]. A detailed study has been conducted by the authors of [6] covering the approaches followed in lung cancer prediction to date. They have compared and highlighted the pros and cons of the ML/AI algorithms used. It has been concluded that convolutional neural networks trained with deep learning methods provide highly accurate results. An efficient brain cancer classification technique using various machine learning algorithms has been proposed in [7]. The models used in this study are a Hybrid Classifier (SVM-KNN), K-Nearest Neighbour (KNN), and Support Vector Machine (SVM). The authors have inferred that the hybrid classifier SVM-KNN is the most suitable model for analysing the brain cancer MRI dataset. The prediction accuracy is around 98%. [8] is a comparative study of the effectiveness of 3 different classifiers for the prediction of colon cancer by analysis of gene expression levels. The authors have concluded that Support Vector Machine, along with t-statistic based feature selection, is one of the most effective methodologies and has a high prediction accuracy.

III. MOTIVATION

Deep learning can extract features from a dataset without needing to specifically build feature extractors; for example, images of breast cancer are categorized using a convolutional neural network. The accuracy achieved by this model is fairly acceptable (around 73%). The accuracy can be greatly improved upon, and time complexity can be significantly reduced, by using classical machine learning approaches. The machine learning approach reduces the computational complexity as opposed to the deep neural network approach, which is used when dealing with various imaging techniques. Exploring such machine learning based classification models helps analyze the depth and spread of valuable information over the dataset while keeping time complexity low. Use of RNA-seq gene expression levels provides quicker and more accurate results as compared to datasets involving physical features.

IV. METHODOLOGY

Several types of machine learning classification models are used for this comparative study. The data must be cleaned first [1] by replacing the 'NA' (Not Available) elements with the mean of the particular column, to avoid gaps in the data and increase its accuracy/reliability. In order to reduce the dimension, principal component analysis (PCA) has been utilised, which helps reduce time complexity for classification [9]. The data acquired from [10] contains 800 rows and 20531 columns. The data of 800 patients are arranged in the form of 20531 genes (features), and each column holds the expression level of a different gene. This multiclass dataset contains five output classes, giving us the different types of tumours. The tumours are labelled as 'PRAD': 0, 'LUAD': 1, 'BRCA': 2, 'KIRC': 3 and 'COAD': 4. After using PCA to reduce dimensionality, the data is scaled and normalized. Further, depending on the factors obtained, various machine learning models are selected to accurately classify the type of cancer. The scaled dataset is fitted onto these machine learning models and the metrics for each learning model are obtained. These metrics, such as Confusion Matrix, Accuracy, Sensitivity, Precision, Cross-Validation Scores, and AUC-ROC Curves, are visualized and compared to find the most accurate and suitable machine learning classifier model for the dataset. The time complexities of these machine learning models also help in determining the best machine learning algorithm.

Fig. 1. Flowchart of the process

A. Principal Component Analysis

PCA is a dimensionality reduction technique that is primarily used in applications like machine learning and data science [11]. It is extensively used on massive datasets. It uncovers the low-dimensional patterns of a dataset so that we are able to build models from them, thereby increasing interpretability while minimizing information loss [12]. Mathematically, PCA amounts to the Singular Value Decomposition of a column-centred data matrix [12]. The log and variance scales are plotted for the given data to find the optimal number of principal components required to reduce the complexity of the dataset and, at the same time, capture the maximum amount of information [9].

Fig. 2. Log and Variance scale for different components of PCA

From Fig. 2 it is found that 20 components are optimal for capturing the crucial data. The first two components are used to plot the data on a 2-dimensional plane while still capturing a good amount of information.
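The PCA step can be illustrated with a minimal sketch, assuming NumPy and a small synthetic matrix standing in for the 800 x 20531 expression table. The helper names pca_via_svd and components_for_variance are hypothetical, and the 90% variance threshold is only illustrative; the paper itself selects 20 components from the plot in Fig. 2.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """Project X (samples x features) onto its first n_components principal axes.

    Returns the projected scores and the explained-variance ratio of every component.
    """
    X = np.asarray(X, dtype=float)
    X_centred = X - X.mean(axis=0)  # column-centre the data matrix
    # Thin SVD: singular values come back sorted in descending order.
    _, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    scores = X_centred @ Vt[:n_components].T
    return scores, ratio


def components_for_variance(ratio, threshold=0.90):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratio)
    count = int(np.searchsorted(cumulative, threshold)) + 1
    return min(count, len(ratio))


# Synthetic stand-in (an assumption, not the paper's data):
# 60 samples x 200 features generated from 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
loadings = rng.normal(size=(3, 200))
X_demo = latent @ loadings + 0.05 * rng.normal(size=(60, 200))

scores, ratio = pca_via_svd(X_demo, n_components=2)
k = components_for_variance(ratio, threshold=0.90)
print("2-D scores shape:", scores.shape)
print("components needed for 90% variance:", k)
```

The two-column scores array corresponds to the kind of projection used for a 2-D scatter plot, while the cumulative ratio plays the role of the variance curve when choosing how many components to keep.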
The PCA-reduced data is then split into train and test data with a test data size of 25% (optimal test size for all the considered ML models) [12]. Finally, it is scaled to provide stable data and reduce generalization errors [13]. Before moving on to fitting the dataset to the different machine learning classifiers, the scatter plot showing the scaled data elements for the different cancer types is plotted, as in Fig. 3.

Fig. 3. Scatter Plot of 2 components of PCA for different Cancer types

B. Naive Bayes

Naïve Bayes is a classification technique based on Bayes' Theorem. Naïve Bayes assumes independence between predictors. It is said to perform well for multiclass prediction with a small training dataset of relatively few dimensions. Although less training data is required, Naïve Bayes assumes independent predictors, which is almost impossible in real-life scenarios. Therefore, it is a poor estimator and would not be ideal for representing the given dataset. Upon fitting the dataset, Naïve Bayes gives an accuracy of 98.50%.

C. K-Nearest Neighbours

KNN is a classification algorithm. Predictions for a new data instance are made by looking for the K most similar cases over the entire training set. Then, the output variables of these similar instances are summarized [14]. KNN follows the algorithm below.
1. The value of K (number of nearest neighbours) is chosen.
2. The Euclidean distance from the new data point to each training point is calculated.
3. The K nearest neighbours are found, and the number of neighbours in each category is counted.
4. The new data point is assigned to the category that has the largest number of neighbours among the K.
With the help of the Euclidean distance, we find which instances are most similar to a new input/test data point. There are no assumptions made on the functional form of the problem; however, KNN works better when the data has fewer dimensions and is noise-free, which is hard to come by in real-life scenarios [15]. By rescaling and cleaning the data, we reduce this problem. Upon fitting the dataset, the KNN classifier gives an accuracy of 92.53%.

D. Multinomial Logistic Regression

Logistic regression is the technique of modelling the probability of a discrete outcome when an input variable is provided. Although standard logistic regression models have a binary outcome, we consider multinomial logistic regression for the given dataset since we can have more than two possible outcomes [16]. Several aspects, such as easy handling of outliers and no assumptions related to independence among the dependent variables, make it a candidate for classifying and predicting with this dataset [15]. At the same time, it can produce erroneous results when there are many categories to separate, and it may overfit if the number of observations is smaller than the number of features, which is true in the case of the given dataset. Upon fitting the dataset, the Multinomial Logistic Regression classifier gives an accuracy of 99.50%.

E. Decision Tree

The Decision Tree classification algorithm is a visual classifier that is tree-structured. The internal nodes represent the dataset features, branches constitute decisions, and each leaf node constitutes an outcome. The algorithm for Decision Tree is as follows.
1. The root node of the tree, which holds the entire dataset, is the first node.
2. The dataset's best attribute is found using an Attribute Selection Measure.
3. The root node is divided into subsets containing the possible values of the best attribute.
4. A decision tree node is generated for the best attribute.
5. Using the subsets produced in step 3, new decision trees are built recursively. This continues until the nodes cannot be split any further; such a final node is known as a leaf node.
This algorithm is suitable for interpreting data visually and works best when the entire dataset can be featured. It can also automatically handle missing values and outliers and does not require scaling. However, this algorithm is highly unstable when new data points are added and is unsuitable for large datasets, thus making it impractical for the given dataset. Upon fitting the dataset, the Decision Tree classifier gives an accuracy of 96.51%.

F. Random Forest

Random Forest builds a group of decision trees trained with the bagging method (which employs a combination of learning models to increase the overall accuracy of the result) [11]. The algorithm for Random Forest is as follows.
1. K data points are selected at random from the training set.
2. A decision tree is built on the selected data points (subset).
3. The number (N) of decision trees to build is chosen, and steps 1 and 2 are repeated.
4. Each new data point is assigned to the category that receives the majority of the votes.
The Random Forest classifier supports diversity and higher-dimensional data, increasing the stability of the result as it is based on majority voting / averaging [16]. The model prevents overfitting by including additional randomness while growing the "trees". The algorithm finds the top feature from a random subset of features. This results in a better model [15]. Unlike decision trees, the Random Forest classifier is fit for use in multiclass problems. These algorithms increase time complexity, and although they are fast to train, they are quite slow at prediction. Upon fitting the dataset, the Random Forest classifier gives an accuracy of 99.50%.

G. Support Vector Machine

Support Vector Machine (SVM) is a supervised algorithm used for classification and regression [11]. It aims to identify a hyperplane in N-dimensional space that clearly separates the provided data points. The number of features in the dataset determines the dimension of the hyperplane. The objective is to find the points closest to the plane/line from each class. Support vectors, which are the points closest to the line, are separated from the line by a distance called the margin.
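As a hedged illustration of the margin, the standard textbook hard-margin formulation for a linear SVM is sketched below. The symbols w, b, x_i and y_i (weight vector, bias, i-th training sample and its label in {-1, +1}) are notation introduced here and do not appear in the original text; the paper itself reports results for a non-linear SVM, for which the same idea applies in the kernel-induced feature space.

```latex
\begin{aligned}
&\text{Separating hyperplane:} && w^{\top} x + b = 0 \\
&\text{Distance of a point } x_i \text{ to the hyperplane:} && d_i = \frac{\lvert w^{\top} x_i + b \rvert}{\lVert w \rVert} \\
&\text{Training problem:} && \min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^{2}
  \quad \text{subject to} \quad y_i \,(w^{\top} x_i + b) \ge 1 \ \ \forall i \\
&\text{Support vectors satisfy } y_i (w^{\top} x_i + b) = 1: && d_i = \frac{1}{\lVert w \rVert},
  \qquad \text{margin width} = \frac{2}{\lVert w \rVert}
\end{aligned}
```

Minimizing the norm of w is therefore equivalent to maximizing the margin, and the objective is convex.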
Since SVM is focused on maximizing the margin, the algorithm is robust to outliers. It is also very effective when a higher-dimensional dataset with many features is considered. Although the required training time may be higher, it is memory efficient. As a result, it can be regarded as a viable option for classifying our dataset because it performs best when there are more dimensions than samples (20531 > 800). SVM also ensures optimality, as training is a convex optimization problem. Upon fitting the training dataset and subsequently predicting the cancer type on the test dataset using non-linear SVM, the SVM classifier gives an accuracy of 100%. After fitting the data to these models, the results are analysed using several data visualization techniques for several parameters to check which model is most suitable to classify and predict the given dataset [15].

V. DATA VISUALIZATION

A. Heatmap of Confusion Matrix

Heatmaps are an excellent way to visualize numerical values. A heatmap shows the magnitude of a phenomenon as colour in two dimensions. Furthermore, it shows how the phenomenon varies over a range. A confusion matrix is a 2D array comparing predicted values to true values. It allows us to easily measure factors such as Accuracy, Precision, and Recall. In a multiclass distribution, it is an important indicator of the effectiveness of the predicted values. For the given dataset, there are five classes present. Hence, a 5x5 confusion matrix is obtained. Therefore, a heatmap is most suitable to plot the confusion matrix for each model and observe the variations.

Fig. 4. HeatMap of Confusion Matrix for Machine Learning Models

B. Line Graph of Precision

The precision values can easily be calculated for each class and each model by using the confusion matrix. Precision is the ratio between true positives and the total number of positive predictions [18]. A line graph shows how precision varies from class to class. Thus, we can compare the model's effectiveness on each class as a classifier and a predictor. Although we can calculate precision through the confusion matrix, the classification_report function of the scikit-learn (sklearn) package makes it easier to obtain the precision, recall, and f1-score of each class along with the overall accuracy [16].

Fig. 5. Line Graph of Precision values for each class fitted with different ML models
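The relationship between the confusion matrix and per-class precision can be sketched in plain Python. This is a hand-rolled illustration rather than the scikit-learn call used in the paper; the helper names and the ten toy labels are assumptions made for the example and are not the paper's data or results. Only the label encoding ('PRAD': 0 to 'COAD': 4) is taken from the text.

```python
# Label encoding taken from the paper; everything else below is illustrative.
LABELS = {0: "PRAD", 1: "LUAD", 2: "BRCA", 3: "KIRC", 4: "COAD"}


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def per_class_precision(matrix):
    """Precision of class c = true positives of c / all predictions of c."""
    n = len(matrix)
    precisions = []
    for c in range(n):
        predicted_as_c = sum(matrix[row][c] for row in range(n))  # column sum
        precisions.append(matrix[c][c] / predicted_as_c if predicted_as_c else 0.0)
    return precisions


# Toy labels (hypothetical): one LUAD sample is predicted as BRCA,
# and one KIRC sample is predicted as COAD.
y_true = [0, 0, 1, 1, 2, 2, 2, 3, 3, 4]
y_pred = [0, 0, 1, 2, 2, 2, 2, 3, 4, 4]

cm = confusion_matrix(y_true, y_pred, n_classes=5)
prec = per_class_precision(cm)
for class_id, value in enumerate(prec):
    print(f"{LABELS[class_id]}: precision = {value:.2f}")
```

Plotting the resulting list against the class names for each model would give the kind of line graph described above.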
C. Bar Graph for Accuracies and Sensitivities
Bar graphs represent categorical data, with the height of each bar proportional to the value it represents. Here the accuracy and sensitivity of each model's predictions are shown using bar graphs. Accuracy is the fraction of predictions that the model got right, but accuracy is not always a reliable indicator of a model's ability to make accurate predictions. Therefore, the model's sensitivity is also plotted as a bar graph. Sensitivity, or recall, is the proportion of actual positive cases that were predicted as positive [18].
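Why accuracy alone can mislead is easiest to see on an imbalanced toy example. The sketch below uses plain Python; the function names and the two-class labels are assumptions made for illustration and are unrelated to the paper's five-class results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_sensitivity(y_true, y_pred):
    """Sensitivity (recall) of class c = correctly predicted c / actual c."""
    result = {}
    for c in sorted(set(y_true)):
        predictions_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        hits = sum(1 for p in predictions_for_c if p == c)
        result[c] = hits / len(predictions_for_c)
    return result


# Hypothetical imbalanced case: 18 samples of class 0, 2 samples of class 1,
# and a classifier that always predicts class 0.
y_true = [0] * 18 + [1] * 2
y_pred = [0] * 20

acc = accuracy(y_true, y_pred)
sens = per_class_sensitivity(y_true, y_pred)
macro_sensitivity = sum(sens.values()) / len(sens)

print("accuracy:", acc)                            # 0.9
print("sensitivity per class:", sens)              # {0: 1.0, 1: 0.0}
print("macro-averaged sensitivity:", macro_sensitivity)  # 0.5
```

The classifier reaches 90% accuracy while never detecting class 1, which is the kind of gap that plotting sensitivity next to accuracy is meant to expose.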
Fig. 6. Bar Graph for Accuracy and Sensitivity for each ML model

D. AUC - ROC curve for the suitable classifiers

The AUC (Area Under the Curve) - ROC (Receiver Operating Characteristics) curve is used to visualize the performance of multiclass classification problems. ROC is a probability curve, and AUC represents the degree of separability between classes, that is, the degree to which a model can differentiate between classes. The higher the AUC value, the better the model distinguishes between the different classes. The curve is plotted with the True Positive Rate (Recall) against the False Positive Rate (1 - Specificity). If the AUC value comes to 0.5, the model is said to have no class separation capacity. For a multiclass model, each class is plotted against the other classes. Therefore, for a multiclass model, the number of ROC curves is equal to the number of classes.

Fig. 7. AUC-ROC Curves for Random Forest and SVM ML model

E. Bar Plot for Time Complexity

From the above-mentioned comparison parameters it can be observed that Random Forest and SVM provide relatively high accuracy and cross-validation scores. Another vital factor to consider when dealing with DNA sequence analysis is the time factor. The time factor is loosely defined here as the time taken by an algorithm to fit the model and predict the required outcomes. An efficient machine learning model should have a low time factor to carry out the process effectively. The time factors of the SVM and Random Forest models are experimentally obtained, tabulated, and visualized using box plots. The time taken to fit the model for different numbers of principal components is compared.

TABLE I. THE TIME COMPLEXITY OF RANDOM FOREST VS SVM
Components    Random Forest (sec)    SVM (sec)
10            0.126                  0.0095
20            0.1457                 0.0179
50            0.2088                 0.0344
80            0.2314                 0.0484
100           0.2685                 0.0535
200           0.3598                 0.0688
300           0.4604                 0.0848
400           0.5392                 0.1049
500           0.56                   0.1203
600           0.6091                 0.1296
700           0.7089                 0.1477
800           0.808                  0.152

Fig. 8. Box plot for Time comparison between SVM and Random Forest

VI. DATA INTERPRETATION

From the preceding data visualizations and classification reports, we can infer that all the classifiers used in this comparative study have an accuracy of 92% or higher. However, several other factors such as precision, sensitivity measurements, and cross-validation scores indicate that the Random Forest and SVM classifying techniques are relatively more reliable for analysing the mRNA sequence dataset. Furthermore, the AUC-ROC curves of SVM show higher class-separation capability than those of Random Forest. Additionally, the time taken increases with the number of principal components considered for the analysis; the time factor of the Random Forest model increases at a substantially higher rate than that of SVM.

TABLE II. RANDOM FOREST VS SVM
                                       Random Forest    SVM
Accuracy                               99.50%           100%
Sensitivity                            98.92%           99.42%
Cross-validation score                 99.68%           99.68%
Micro average of AUC                   0.71             0.77
Macro average of AUC                   0.76             1.00
Time Complexity (20 PCA components)    0.1457 sec       0.0179 sec

VII. CONCLUSION

SVM and Random Forest prove to be competent classifying methodologies for the DNA sequence dataset, with accuracies of 100% and 99.50% respectively. However, the time taken by Random Forest increases at a higher rate than that of SVM as the number of principal components considered increases. When 20 principal components are considered, SVM is approximately 8 times faster than Random Forest (0.0179 sec against 0.1457 sec). The AUC-ROC curves of SVM show relatively ideal characteristics. However, the scatter plot in Fig. 3 shows that linear SVM is insufficient to distinguish between the classes; as the degree of closeness and overlap increases, the need to use non-linear SVM increases. Non-linear SVM comes with its own set of problems, such as overfitting. It can be kept in check by using cross-validation and AUC-ROC scores. Hence, considering time complexity and class-separation capability, it can be concluded that non-linear SVM has the better performance in classifying the BRCA, LUAD, PRAD, KIRC, and COAD cancer types. Cancer, as is known, has been evolving and has grown to be significantly dangerous. Learning and updating information and using efficient predictors in the early stages have proven to help reduce the seriousness and fatality rates of the disease. DNA holds all of this information; further research has to be conducted to uncover all its secrets, which will prove helpful in combating life-threatening diseases. With the advancement of computer technologies, this is now possible.

VIII. FUTURE WORK

Analysis of the complete human DNA sequence is a highly time-consuming and resource-consuming task. There are a few distributed computing platforms, such as Hadoop and Spark, which can greatly reduce computational time by using parallel computing and efficient usage of available hardware resources. Hence the time taken for the whole process of cancer classification can be significantly reduced by using such distributed computing platforms.

REFERENCES

[1] Kiran Kumar M., Divya Udayan J., Ghananand A. (2021) “Efficiency of Different SVM Kernels in Predicting Rainfall in India.” In: Goyal D., Bălaş V.E., Mukherjee A., Hugo C. de Albuquerque V., Gupta A.K. (eds) Information Management and Machine Intelligence. ICIMMI 2019. Algorithms for Intelligent Systems
[2] P. T. Nguyen, T. T. Nguyen, N. C. Nguyen and T. T. Le, "Multiclass Breast Cancer Classification Using Convolutional Neural Network," 2019 International Symposium on Electrical and Electronics Engineering (ISEE), 2019, pp. 130-134, doi: 10.1109/ISEE2.2019.8920916.
[3] Weinstein, J., Collisson, E. et al. “The Cancer Genome Atlas Pan-Cancer analysis project”. The Cancer Genome Atlas Research Network, Nat Genet 45, 1113–1120 (2013). https://doi.org/10.1038/ng.2764
[4] S. Laghmati, B. Cherradi, A. Tmiri, O. Daanouni, and S. Hamida, "Classification of Patients with Breast Cancer
using Neighbourhood Component Analysis and Supervised Machine Learning Techniques," 2020 3rd International Conference on Advanced Communication Technologies and Networking (CommNet)
[5] Montazeri M, Montazeri M, Montazeri M, Beigzadeh A. “Machine learning models in breast cancer survival prediction.” Technol Health Care.
[6] Kadir T, Gleeson F. “Lung cancer prediction using machine learning and advanced imaging techniques.” Transl Lung Cancer Res.
[7] K. Machhale, H. B. Nandpuru, V. Kapur and L. Kosta, "MRI brain cancer classification using hybrid classifier (SVM-KNN)," 2015 International Conference on Industrial Instrumentation and Control (ICIC)
[8] Alladi SM, P SS, Ravi V, Murthy US. “Colon cancer prediction with genetic profiles using intelligent techniques.” Bioinformation.
[9] H. Hasan and N. M. Tahir, "Feature selection of breast cancer based on Principal Component Analysis," 2010 6th International Colloquium on Signal Processing & its Applications
[10] “Gene expression cancer RNA-Seq Data Set”, retrieved from https://archive.ics.uci.edu/ml/datasets/gene+expression+cancer+RNA-Seq, last accessed May, 2018
[11] T. N. Varunram, M. B. Shivaprasad, K. H. Aishwarya, A. Balraj, S. V. Savish and S. Ullas, "Analysis of Different Dimensionality Reduction Techniques and Machine Learning Algorithms for an Intrusion Detection System," 2021 IEEE 6th International Conference on Computing, Communication and Automation (ICCCA)
[12] R. R. Nair, S. H. Karumanchi and T. Singh, "Neuro-Fuzzy based Multimodal Medical Image Fusion," 2020 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)
[13] Zaiqiao Meng, Richard McCreadie, Craig Macdonald, and Iadh Ounis. 2020. “Exploring Data Splitting Strategies for the Evaluation of Recommendation Models.” Fourteenth ACM Conference on Recommender Systems. Association for Computing Machinery, New York, NY, USA
[14] C. Thallam, A. Peruboyina, S. S. T. Raju and N. Sampath, "Early Stage Lung Cancer Prediction Using Various Machine Learning Techniques," 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA)
[15] Shah, K., Patel, H., Sanghvi, D. et al. “A Comparative Analysis of Logistic Regression, Random Forest and KNN Models for the Text Classification.” Augment Hum Res
[16] R. Dhanya, I. R. Paul, S. Sindhu Akula, M. Sivakumar and J. J. Nair, "A Comparative Study for Breast Cancer Prediction using Machine Learning and Feature Selection," 2019 International Conference on Intelligent Computing and Control Systems (ICCS)
[17] Shujun Huang, Nianguang Cai, Pedro Penzuti Pacheco, Shavira Narrandes, Yang Wang, Wayne Xu. “Applications of Support Vector Machine (SVM) Learning in Cancer Genomics.” Cancer Genomics & Proteomics
[18] A. J. B and S. Palaniswamy, "Comparison of Conventional and Automated Machine Learning approaches for Breast Cancer Prediction," 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA)
[19] Zeng, Z., Yao, L., Roy, A. et al. “Identifying Breast Cancer Distant Recurrences from Electronic Health Records Using Machine Learning.” J Healthc Inform Res
