
biocybernetics and biomedical engineering 39 (2019) 63–74

Available online at www.sciencedirect.com

ScienceDirect

journal homepage: www.elsevier.com/locate/bbe

Original Research Article

Magnetic resonance imaging-based brain tumor grades classification and grading via convolutional neural networks and genetic algorithms

Amin Kabir Anaraki a, Moosa Ayati b,*, Foad Kazemi c


a Advanced Instrumentation Lab, School of Mechanical Engineering, College of Engineering, University of Tehran, Tehran, Iran
b School of Mechanical Engineering, College of Engineering, University of Tehran, Tehran, Iran
c Iran University of Medical Sciences, Tehran, Iran

Article info

Article history:
Received 2 February 2018
Received in revised form 16 June 2018
Accepted 10 October 2018
Available online 18 October 2018

Keywords:
Brain tumor
Magnetic resonance imaging
Medical image classification
Convolutional neural networks
Genetic algorithms
Bagging ensemble algorithm

Abstract

Gliomas are the most common type of primary brain tumors in adults and their early detection is of great importance. In this paper, a method based on convolutional neural networks (CNNs) and genetic algorithm (GA) is proposed in order to noninvasively classify different grades of Glioma using magnetic resonance imaging (MRI). In the proposed method, the architecture (structure) of the CNN is evolved using GA, unlike existing methods of selecting a deep neural network architecture which are usually based on trial and error or by adopting predefined common structures. Furthermore, to decrease the variance of prediction error, bagging as an ensemble algorithm is utilized on the best model evolved by the GA. To briefly mention the results, in one case study, 90.9 percent accuracy for classifying three Glioma grades was obtained. In another case study, Glioma, Meningioma, and Pituitary tumor types were classified with 94.2 percent accuracy. The results reveal the effectiveness of the proposed method in classifying brain tumor via MRI images. Due to the flexible nature of the method, it can be readily used in practice for assisting the doctor to diagnose brain tumors in an early stage.

© 2018 Nalecz Institute of Biocybernetics and Biomedical Engineering of the Polish Academy of Sciences. Published by Elsevier B.V. All rights reserved.

* Corresponding author at: School of Mechanical Engineering, College of Engineering, University of Tehran, P.O.B. 11155-4563, Tehran, Iran.
E-mail addresses: amin.kabir@ut.ac.ir (A. Kabir Anaraki), m.ayati@ut.ac.ir (M. Ayati), foadkazemi@gmail.com (F. Kazemi).
https://doi.org/10.1016/j.bbe.2018.10.004
0208-5216/© 2018 Nalecz Institute of Biocybernetics and Biomedical Engineering of the Polish Academy of Sciences. Published by Elsevier B.V. All rights reserved.

1. Introduction

A brain tumor is an aggregation of abnormal cells in some tissues of the brain. According to their origin, brain tumors are divided into two categories, primary and metastatic. Primary brain tumors originate in the brain, while metastatic brain tumors originate in other body parts. Tumors can be cancerous (malignant) or noncancerous (benign). Malignant brain tumors grow fast and spread to other areas of the brain and spine, and they are more life-threatening than benign tumors. A more detailed categorization classifies tumors into four grades, where the higher the grade, the more malignant the tumor. Due to the

presence of brain tumors in the center of the nervous system of the human body, even benign tumors may incapacitate the brain and cause irrecoverable effects.

Gliomas are considered the most common type of primary brain tumor in adults [1]. According to the World Health Organization grading system [2], gliomas are diagnosed in grades of severity from I to IV. Grade I tumors have cells that are benign and approximately normal in appearance. Grade II tumors have cells that appear slightly abnormal. Grade III tumors have cells that are malignant and clearly abnormal. The most severe brain tumors, which contain fast-spreading and abnormal cells, are classified as Grade IV; glioblastoma multiforme (GBM) is the quintessential tumor of this type. Meningioma tumors arise from a layer of tissue called the meninges, which cover the brain and spinal cord and act as a protector. They are mostly considered benign because they grow at a slow pace and are less likely to spread. Pituitary tumors develop in the pituitary gland and account for 14% of all primary intracerebral tumors; most of them are due to spontaneous mutation and some are due to inherited genetic defects [3]. These tumors are also benign and much less likely to spread. Although these tumors are considered benign, they can cause serious health problems due to their presence in sensitive areas of the brain [4].

Early detection plays a major role in the treatment and recovery of the patient [5]. Diagnosing a brain tumor and its grade is usually a complicated and time-consuming process. Usually, the patient is referred for MRI when the brain tumor has grown sufficiently and several troubling symptoms have appeared. After the brain images are examined, if a tumor is suspected, a brain biopsy is performed. Unlike MR, biopsy is an invasive procedure, and in some cases it may take up to a month to obtain a definite answer. MRI specialists perform techniques such as perfusion to grade the tumor and biopsy to confirm. It should be noted that in recent years some novel methods other than biopsy have been introduced to grade brain tumors. In particular, distinguishing high-grade and low-grade glioma using perfusion MR imaging has been able to resolve some drawbacks of biopsy. For these reasons, a computer-aided detection system is helpful. An efficient automatic system for brain tumor classification assists doctors in interpreting medical images and supports the decisions of specialists in the early stages of tumor growth. In this study, brain tumor grading is done in much less time and, as is confirmed in this paper, with high accuracy. Furthermore, the whole process of classification is non-invasive.

Considerable attention has been paid to medical image analysis for diagnosis purposes. Recently, the emergence of modern machine learning algorithms and their proven efficiency in solving various problems in the field of artificial intelligence have also greatly increased interest in health-related topics and algorithms [6]. Many studies have been carried out on classifying various tumors using MRI, especially brain MR images, with artificial neural networks and evolutionary algorithms [7–9], and various methods have been implemented as well. Previous studies indicate that normal and abnormal classes in brain MR images are easily distinguished using shallow machine learning algorithms such as support vector machines, neural networks, hybrid intelligent techniques and probabilistic neural networks [10–14].

In [15], classification schemes and their performance in classifying several types of brain tumors and grades of Gliomas are investigated. In that method, the region of interest is first defined and then some features, such as the shape of the tumor, are extracted from the MR images. Support vector machines (SVM) with recursive feature elimination are used to select the appropriate features. Their results show that the method obtains high accuracies in binary classification, but the accuracy of multi-class classification, according to the confusion matrix provided in that paper, is low. In a recent article [16], the classification performance of fully connected and convolutional neural networks (CNNs) on three tumor types is compared. As described in that article, the performance of various convolutional network structures was tested and, eventually, a relatively shallow network with two convolutional layers, two max-pooling layers, and two fully connected layers was used for classification. It is also mentioned that the use of Vanilla preprocessing was effective for classification accuracy. In another recent study [17], a CNN is used to classify healthy and unhealthy brain images as well as high-grade and low-grade Glioma tumors. A modified version of the well-known AlexNet was used as the network architecture. Despite the valuable work being done in this area, developing a robust and practical method to classify brain MR images still requires more effort.

Convolutional neural networks have had many remarkable successes in solving complex machine learning problems and are currently considered the most successful method for image processing [18]. Instead of matrix multiplication, convolution operators are used in most layers of these networks. This contributes to the superiority of convolutional networks in solving problems with high computational costs. This is very important since the datasets in MRI-based diagnosis include thousands of images of different qualities and types. Another advantage of this method compared to shallow machine learning methods is automatic feature extraction. In conventional methods, one method was usually proposed for extracting features and, to further reduce the dimensions, another was used to select the dominant features. Recently, CNNs have also been widely used in medical image processing tasks such as grade classification [19], segmentation [20,21] and skull stripping of brain tumor images [22].

In this article, a method based on CNN is proposed to classify three grades of Gliomas from MR images. Selecting an appropriate deep neural network architecture for a specific purpose is a challenging procedure, which is usually done by trial and error or by employing a common architecture. Unlike conventional schemes, in the proposed method the architecture of the convolutional neural network is evolved using GA. Networks with different numbers of layers and parameters are investigated by the GA, and the network with the best performance on the dataset is selected for further processing. Afterwards, a model averaging method called bagging is applied to the best model evolved by the GA. Bagging is an ensemble method and is employed in order to decrease the variance of the final diagnosis. The proposed method

is used in two case studies. In the first case, three Glioma grades are classified with 90.9% accuracy. In the second case, to demonstrate the strength of the proposed method, three different types of tumors from another MRI database were used as the input to the CNN, and the final diagnostic accuracy was 94.2%. The results confirm that the proposed method is applicable to different brain MRI datasets in order to assist the specialist in early detection.

The remainder of the paper is organized as follows. In Section 2, a brief explanation of the datasets used as input to the networks is given. CNNs are discussed in detail in Section 3. In Section 4, the proposed method for selecting an appropriate architecture based on GA is presented. Experimental results are presented and discussed in Sections 5 and 6, respectively. Finally, Section 7 is dedicated to conclusions.

2. Data preparation

In this section, the datasets of this paper are introduced and the type of the input data and the pre-processing steps are discussed.

2.1. Datasets

Most of the datasets used in this research are obtained from four databases that are available online for research purposes. Normal brain MR images were obtained from the brain development website (IXI dataset) [23]. The IXI dataset is a collection of nearly 600 MR images from normal subjects (without any lesion). MR images of Glioma tumors were collected from the cancer imaging archive datasets [24]. The REMBRANDT dataset [25] contains pre-surgical magnetic resonance multi-sequence images from 130 patients who suffer from low or high grade Gliomas. The TCGA-GBM data collection [26] contains glioblastoma multiforme brain MR images of 199 patients, and the TCGA-LGG dataset [27] includes low grade Glioma data collected from 299 patients. In addition, brain MR images of 60 patients were obtained from the neurosurgery section of Hazrat-e Rasool General Hospital in Tehran. It should be noted that, except for the normal brain MRs, only T1 axial images with injection of contrast agent are used in this study. Data from the mentioned databases are considered as Case Study I.

In addition, to evaluate the proposed method, the axial brain tumor images of Cheng et al. [28] are also employed. They have provided an MR brain tumor dataset containing T1-weighted images from 233 patients with Meningioma, Glioma, and Pituitary brain tumor types. In this paper, this dataset is used in Case Study II. The data of both case studies have annotations (or labels) assigned by a specialist.

In the case of normal MRIs, on average six middle MR slices are selected at identical intervals for each subject. These images are used to distinguish between healthy and tumorous brains. Images of brain tumors that have been identified with infusion of contrast material are also used. Due to differences in the size and location of tumors, the number of slices varies from case to case.

After the classifier network's parameters are trained, healthy and tumor slices can be classified. In other words, by examining all brain MR slices, the classifier distinguishes between normal and abnormal slices. This not only helps to identify the existence of a tumor but also determines its approximate location in the brain. Also, the number of slices that contain a trace of tumor gives information about the approximate size and grade of the tumor.

2.2. Pre-processing and normalization

In magnetic resonance imaging (MRI), specific image appearances are obtained by setting parameters such as radiofrequency pulses and gradients. T1 and T2 are the most common MRI sequences and each provides particular details about tissues. For normal cases, T1 images are employed in this study. In order to reduce the number of normal brain images, six sections with approximately equal intervals have been selected from the MR images of each normal subject (person). Six such sections from a normal person are shown in Fig. 1.

In gadolinium-enhanced T1 MR images, the tumor boundary can be better identified due to the injection of a contrast agent called Gadolinium. These images are used to classify tumor grades in this study. For instance, Fig. 2 depicts axial MR images of three different grades of Glioma tumor with gadolinium injection. In addition, Fig. 3 gives examples of the three brain tumor types belonging to Case Study II.

In this study, gadolinium-enhanced T1 images are employed. All images that are used as inputs to the proposed network are normalized using data rescaling, which projects the data into the [0,1] range, and they are resized to 128 × 128 pixels.

In addition, in the proposed method one or several slices of an MRI image are used and it is not necessary to use all slices or 3D images.

2.3. Data augmentation

Training a machine learning model on a larger dataset is the best way to improve its generalization and to reduce the probability of overfitting [29]. Creating fake data and adding it to the dataset is simple and straightforward and is done in the data augmentation step.

Fig. 1 – An example of the six selected sections from the axial MR of a normal person.

Fig. 2 – Examples of three different grades of Gliomas axial brain images: (a) Glioma Grade IV; (b) Glioma Grade III; (c) Glioma
Grade II.

Fig. 3 – Examples of axial brain images from the public dataset provided by Cheng et al.: (a) Glioma; (b) Meningioma;
(c) Pituitary tumors.
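The rescaling step mentioned in Section 2.2 (projecting pixel values into the [0,1] range and bringing each slice to 128 × 128 pixels) can be illustrated with a short NumPy sketch. The paper does not say which resizing method was used, so the nearest-neighbour resize below is only an assumption, and the function names `rescale_to_unit_range` and `resize_nearest` are hypothetical.

```python
import numpy as np


def rescale_to_unit_range(image):
    """Min-max rescale a 2-D slice so that its values lie in [0, 1]."""
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    if hi == lo:
        # A constant image carries no contrast; map it to all zeros.
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


def resize_nearest(image, size=128):
    """Nearest-neighbour resize to size x size (an assumed, illustrative choice)."""
    image = np.asarray(image)
    rows = (np.arange(size) * image.shape[0] / size).astype(int)
    cols = (np.arange(size) * image.shape[1] / size).astype(int)
    return image[np.ix_(rows, cols)]


# Example: a synthetic 256 x 256 "slice" with arbitrary intensity values.
slice_ = np.arange(256 * 256, dtype=np.float64).reshape(256, 256)
prepared = resize_nearest(rescale_to_unit_range(slice_), size=128)
```

In practice an interpolating resize from an image library would probably be preferred; the sketch only shows the order of operations and the resulting value range and shape.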

In the proposed method, some manipulated images were added to the training set by applying random changes to the original data. Rotation by 10°, 20° or 30° clockwise or counterclockwise, translation by 15 pixels to the right or left, scaling to 0.75 of the original size, mirroring of tumor MR images, and combinations of these changes were performed, and the resulting images were added to the original datasets. The training and test datasets are randomly selected from this new dataset.

After selecting images and increasing the amount of data, 8000 normal MR images and 8000 glioma MR images were provided for training and testing. More specifically, the Glioma class contains 4000 GBM (grade IV) images, 2000 grade III and 2000 grade II tumor images. Note that 500 images are randomly excluded from the dataset of each class and used for test purposes.

The Cheng et al. dataset consists of 989 axial images and the same data augmentation process is applied to it. 1521 images are used for training and 115 images from each class are employed for testing.

3. Convolutional neural networks

The structure, layers, and parameters of convolutional neural networks are described in this section. Deep learning algorithms are a subset of machine learning algorithms in the world of artificial intelligence. Using simple concepts, deep learning enables the computer to create, characterize, and recognize complex concepts. In other words, it enables multilayer models to learn representations of data with multiple levels of abstraction [18].

Convolutional neural networks (CNNs) are one of the most efficient supervised deep learning methods and have brought remarkable improvements to the image processing field. Generally, convolutional, pooling, and fully connected layers are the three main layers of a convolutional network. In convolutional layers, the network uses different kernels to convolve the input image and create various feature maps. Applying this layer significantly reduces the number of parameters of the network (weight sharing), and the network learns the correlation between neighboring pixels (local connectivity) [30].

There are two stages of training in every convolutional neural network. In the feedforward step, input images are fed to the network. In other words, the dot product of the input vector and the parameter vector of each neuron is computed and the convolution operator in each layer is applied. Afterwards, the output is computed. Using a loss function, the network output is compared with the desired output (correct answers) and the error is computed. Then, based on the error, the backpropagation stage begins. In this step, the gradient of each parameter is calculated using the chain rule, and finally all the parameters are updated. This is repeated for an adequate number of iterations. More detailed explanations are given below.

It should be noted that in order to keep the size of the output unchanged, the input volume is padded with zeros

around the border (zero-padding). There are two padding methods, valid padding and same padding. In this paper, same padding is used for all the convolutional layers. Fig. 4 gives an example of applying (or convolving) a 3 × 3 filter on a 4 × 4 input matrix. As can be seen, by adding zeros to the input matrix, the size of the output matrix remains the same as that of the input matrix, i.e. 4 × 4.

Fig. 4 – An example of applying a convolution layer with zero-padding method.

3.1. Weight initialization

Proper initial weights can speed up network convergence. Various approaches to weight initialization have been introduced in the literature. In this study, after investigating the impact of different initializers, it was observed that the best performance is obtained with the 'He' initializer with normal distribution [31].

3.2. Activation function

Generally, a nonlinear operator or activation function is used in deep networks after convolutions. The presence of this function enriches the model in comparison with a linear model. It is well known that applying the rectified linear unit (ReLU) activation function in deep networks increases the training speed. ReLU simply projects negative values to zero and is defined in (1).

$$\mathrm{relu}(x) = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \tag{1}$$

In many cases Leaky ReLU, defined in (2), has performed better than ReLU. It allows a small, non-zero gradient when the function is not active (i.e. for negative values).

$$\mathrm{leakyrelu}(x) = \begin{cases} x & \text{if } x \ge 0 \\ \alpha x & \text{if } x < 0 \end{cases} \tag{2}$$

Usually, α = 0.3. Recently, using exponential linear units (ELUs) has led to an increase in training speed and classification accuracy [32]. ELU accepts negative values, which allows it to push mean unit activations closer to zero, like batch normalization but with less computational cost.

$$\mathrm{elu}(x) = \begin{cases} x & \text{if } x \ge 0 \\ \alpha\,(e^{x} - 1) & \text{if } x < 0 \end{cases} \tag{3}$$

Here α = 1. The performance of the evolving networks is also examined with the scaled exponential linear unit (SELU) activation function [33], see (4). SELU is obtained by a slight modification of ELU. It is given below, where α = 1.6732 and λ = 1.0507.

$$\mathrm{selu}(x) = \lambda \begin{cases} x & \text{if } x \ge 0 \\ \alpha e^{x} - \alpha & \text{if } x < 0 \end{cases} \tag{4}$$

3.3. Pooling

A pooling layer usually comes after a convolutional layer and reduces the size of the feature maps and the number of parameters of the network, which decreases computational costs. Because neighboring pixels are considered together in the calculation, pooling layers are also invariant to small changes. One of the most widely used pooling methods is max-pooling. In this paper, a max-pooling layer always comes after each convolutional layer, with filters of size 2 × 2 applied with a stride of 2, and takes the maximum over four numbers (Fig. 5).

Fig. 5 – An example of a max-pooling layer.

3.4. Regularization

The main issue in machine learning is to produce an algorithm that performs well not only on the training data but also on new entries. Several regularization methods for deep learning have been proposed. This paper uses dropout, which provides a computationally inexpensive yet powerful method of regularization. It randomly removes some nodes of the fully connected layer in the training phase to prevent overfitting. Dropout can also be regarded as an ensemble method, since it yields different networks during training.

3.5. Loss function

One of the important aspects of designing a deep neural network is the selection of the loss function to be minimized. The categorical cross-entropy function (H) is usually a good candidate and has been used here. It is defined for two distributions (p and q) over a discrete variable x and is given by:

$$H(p, q) = -\sum_{x} p(x)\,\ln\big(q(x)\big) \tag{5}$$

where q(x) is the estimate of the true distribution p(x).
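As a rough illustration of the activation functions (1) to (4) and the categorical cross-entropy (5), the following NumPy sketch evaluates each formula directly. It is not the authors' implementation; the function names are chosen here for illustration, and the small `eps` clipping inside the logarithm is an added numerical safeguard that is not part of the formula in the text.

```python
import numpy as np


def relu(x):
    """Eq. (1): negative inputs are projected to zero."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x >= 0, x, 0.0)


def leaky_relu(x, alpha=0.3):
    """Eq. (2): a small slope alpha is kept for negative inputs."""
    x = np.asarray(x, dtype=np.float64)
    return np.where(x >= 0, x, alpha * x)


def elu(x, alpha=1.0):
    """Eq. (3): exponential branch for negative inputs."""
    x = np.asarray(x, dtype=np.float64)
    # np.minimum keeps exp() from overflowing on the unused positive branch.
    return np.where(x >= 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))


def selu(x, alpha=1.6732, lam=1.0507):
    """Eq. (4): ELU-like shape scaled by lambda."""
    x = np.asarray(x, dtype=np.float64)
    negative = alpha * np.exp(np.minimum(x, 0.0)) - alpha
    return lam * np.where(x >= 0, x, negative)


def categorical_cross_entropy(p, q, eps=1e-12):
    """Eq. (5): H(p, q) = -sum_x p(x) ln q(x); eps avoids log(0)."""
    p = np.asarray(p, dtype=np.float64)
    q = np.clip(np.asarray(q, dtype=np.float64), eps, 1.0)
    return float(-np.sum(p * np.log(q)))


# Example: one-hot true label for class 1 and a predicted distribution.
true_dist = [0.0, 1.0, 0.0]
predicted = [0.1, 0.8, 0.1]
loss = categorical_cross_entropy(true_dist, predicted)  # equals -ln(0.8)
```

With a one-hot target the sum in (5) reduces to the negative log of the probability assigned to the correct class, which is why the loss shrinks toward zero as that probability approaches one.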

3.6. Training

In order to train a deep network, the loss function must be minimized by a gradient-based optimization algorithm. Stochastic gradient descent (SGD) is widely used as an optimizer in deep learning [18]. Recently, a method for stochastic optimization called Adaptive moment estimation (Adam) was presented in [34]. It is demonstrated there that Adam works better than customary optimization algorithms. Furthermore, its computational efficiency on large datasets is an advantage of this method. The learning rate for updating the weights remains constant in the SGD algorithm, whereas the Adam algorithm computes adaptive learning rates by estimating the first moment (the mean) and the second moment (the uncentered variance) of the gradients. Other optimizers, namely Adagrad, Adadelta, Adamax and Nadam, were also examined in this study. In the Adagrad optimizer, the learning rate adapts to the parameters, performing larger updates for infrequent parameters than for frequent ones [35]. Unlike Adagrad, which accumulates all past squared gradients, Adadelta limits the size of the window of accumulated previous gradients [36]. Adamax is a variant of the Adam optimizer based on the infinity norm. Nadam combines the Adam and Nesterov Accelerated Gradient optimizers [37]. In Nadam, the parameters are updated with the momentum step before the gradient is computed, which makes it possible to take more precise steps in the gradient direction.

The values of all these optimizers' parameters, other than the learning rate, are set to the values recommended by their authors.

4. Designing the network architecture

Typically, a desirable network architecture is found by testing various common network structures. This process requires a lot of trial and error and, of course, high computational cost. In this study, various CNN architectures for the task of MRI image classification are evolved using a genetic algorithm (GA) [38]. Instead of training and comparing more than one million different architectures, a suitable architecture was discovered by employing GA and comparing fewer than 500 architectures. Thus, the computational costs are decreased. More details on selecting the network architecture are given below.

4.1. Genetic algorithms

GAs simulate the mechanism of natural selection. After each generation, the fittest individuals of the population survive and produce more offspring in the next generation. Sometimes a mutation occurs and an offspring with a new characteristic is created. After several generations, superior individuals are more likely to survive [38].

In this study, GA is implemented to evolve the best structure of the CNN by choosing proper parameters for the network. These parameters are the number of convolutional and max-pooling layers, the number and size of filters, the number of fully connected layers, the activation function, the dropout probability, the optimization method and the learning rate. The values associated with these parameters are specified in Table 1.

Considering the parameters of Table 1, over one million different architectures are possible for the CNN. Directly searching all of these architectures to find the best one is not feasible; GA eases this search. The flowchart of the genetic algorithm used in this research is illustrated in Fig. 6. Initially, 50 networks with random parameters are created as the initial population. Each network is trained on 80 percent of the data, and the remaining 20 percent is used as the validation dataset. Validation accuracy is the criterion for retaining or rejecting a network in the next generation. In the proposed method, GA is stopped if the early stopping criterion is satisfied or the number of generations exceeds the pre-specified maximum number of generations. The early stopping criterion is met when the validation accuracy (loss function) does not improve for three consecutive epochs. The early stopping criterion is implemented in order to reduce computational costs. In this paper, the maximum number of generations is 15.

According to the validation accuracy, the best 40 percent of the networks (the elites) are retained and move to the next generation. Rejected networks also have a 10 percent chance of being transferred to the next generation. In other words, the top 20 networks enter the next generation directly, and each of the 30 rejected networks may enter the next generation with probability 0.1. The other members of the next generation are created by applying selection and crossover operators. In addition, there is a 20 percent chance that a network structure is randomly mutated. This process is repeated until it is stopped according to the flowchart, and finally the network architecture that has the best performance is selected as the main network architecture for the classification.
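The "over one million" figure can be checked with a short calculation. The sketch below assumes that one value is chosen per row of Table 1 and shared by the whole network; the paper does not state how the count was made, so this is only one plausible reading. The dictionary keys are hypothetical names, and the learning rates are read as 1e-4, 1e-3 and 1e-2.

```python
from math import prod

# Option lists transcribed from Table 1 (key names are illustrative only).
search_space = {
    "conv_pool_blocks": [2, 3, 4, 5, 6],
    "dense_dropout_blocks": [1, 2, 3],
    "filters": [16, 24, 32, 48, 64, 96, 128],
    "kernel_size": [2, 3, 4, 5, 6, 7],
    "dense_units": [128, 192, 256, 384, 512],
    "activation": ["relu", "leaky_relu", "elu", "selu"],
    "optimizer": ["sgd", "adam", "adamax", "nadam", "adagrad", "adadelta"],
    "learning_rate": [1e-4, 1e-3, 1e-2],  # assumed reading of the table entries
    "dropout_rate": [0.1, 0.2, 0.3, 0.4, 0.5],
}

# One independent choice per row: multiply the number of options per row.
n_architectures = prod(len(options) for options in search_space.values())
print(n_architectures)  # 5 * 3 * 7 * 6 * 5 * 4 * 6 * 3 * 5 = 1134000
```

Under this reading the count is 1,134,000, which agrees with the statement in the text. If filters or kernel sizes could differ from layer to layer, the space would be larger still.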

Table 1 – The parameters employed to evolve the best CNN structure and their associated values.

Parameter                                        Values
Number of convolutional + max pooling layers     2, 3, 4, 5, 6
Number of fully connected + dropout layers       1, 2, 3
Number of filters                                16, 24, 32, 48, 64, 96, 128
Kernel sizes                                     2, 3, 4, 5, 6, 7
Number of fully connected neurons                128, 192, 256, 384, 512
Activation functions                             ReLU, Leaky ReLU, ELU, SELU
Feedforward optimizers                           SGD, ADAM, ADAMAX, NADAM, ADAGRAD, ADADELTA
Learning rate                                    1e-4, 1e-3, 1e-2
Dropout rate                                     0.1, 0.2, 0.3, 0.4, 0.5
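One generation of the procedure described in Section 4.1 can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' code: the paper does not specify the crossover or mutation operators, so uniform crossover and single-gene resampling are used here, mutation is applied to newly created children only, and fitness values are supplied by the caller instead of being obtained by training networks. All function and key names are hypothetical.

```python
import random

# Option lists from Table 1 (key names are illustrative only).
SEARCH_SPACE = {
    "conv_pool_blocks": [2, 3, 4, 5, 6],
    "dense_dropout_blocks": [1, 2, 3],
    "filters": [16, 24, 32, 48, 64, 96, 128],
    "kernel_size": [2, 3, 4, 5, 6, 7],
    "dense_units": [128, 192, 256, 384, 512],
    "activation": ["relu", "leaky_relu", "elu", "selu"],
    "optimizer": ["sgd", "adam", "adamax", "nadam", "adagrad", "adadelta"],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout_rate": [0.1, 0.2, 0.3, 0.4, 0.5],
}


def random_genome(rng):
    """Draw one random architecture description."""
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover (assumed): each gene comes from either parent."""
    return {key: rng.choice((parent_a[key], parent_b[key])) for key in SEARCH_SPACE}


def mutate(genome, rng):
    """Resample one randomly chosen gene (assumed mutation operator)."""
    child = dict(genome)
    gene = rng.choice(list(SEARCH_SPACE))
    child[gene] = rng.choice(SEARCH_SPACE[gene])
    return child


def next_generation(population, fitness, rng,
                    elite_frac=0.4, survive_prob=0.1, mutate_prob=0.2):
    """Build the next population from validation accuracies in `fitness`."""
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    ranked = [population[i] for i in order]
    n_elite = round(elite_frac * len(ranked))
    # Elites always survive; each rejected network survives with low probability.
    survivors = ranked[:n_elite]
    survivors += [g for g in ranked[n_elite:] if rng.random() < survive_prob]
    # Fill the remaining slots with children of randomly selected survivors.
    children = []
    while len(survivors) + len(children) < len(population):
        parent_a, parent_b = rng.sample(survivors, 2)
        child = crossover(parent_a, parent_b, rng)
        if rng.random() < mutate_prob:
            child = mutate(child, rng)
        children.append(child)
    return survivors + children


rng = random.Random(0)
population = [random_genome(rng) for _ in range(50)]
# Placeholder fitness values; in the paper these are validation accuracies.
fitness = [i / 50 for i in range(50)]
new_population = next_generation(population, fitness, rng)
```

In a full run, the placeholder fitness list would be replaced by the validation accuracy of each trained network, and `next_generation` would be called until the stopping criterion or the maximum of 15 generations is reached.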

Fig. 6 – Genetic Algorithm Flowchart.

4.2. Bagging

Bagging (or Bootstrap Aggregating) is an ensemble method that reduces the generalization error by combining multiple models or classifiers [39]. Bagging is a special case of a general machine learning approach called model averaging. Model averaging works because different models do not usually make the same errors on the test set. The idea is to train separate classifiers and then evaluate the output of each classifier on the test samples. A compound classifier is then created as the aggregation of the individual classifiers.

In order to decrease the variance of the classification error, bagging is applied to the best model evolved by the GA. First, all training and validation images are concatenated to form a combined set. This combined set is then permuted, and a new training set and validation set are randomly selected: 75 percent of the combined set is used as the training set and the remaining images are placed in the validation set. Using the new training set, all the variables are optimized. Here, the optimization is performed with 10,000 iterations and this process is repeated 5 times. It was observed that implementing this method improves the results.

5. Results

In this section, the evaluation criteria are described and the results of the proposed method for the two case studies (see Section 2.1) are presented. As previously mentioned, convolutional networks were created with random architectures, and over 15 GA generations better networks were evolved. The classification accuracy on the validation dataset is the genetic algorithm's criterion for improving the networks' architecture. According to Fig. 7, the average validation accuracy over 15 consecutive GA generations had an overall increasing trend. The network that achieved the highest accuracy on the validation dataset was selected and used as the classifier. In addition, applying the bagging ensemble method to the best model evolved by the GA increases the classification accuracy by almost 2%.

Fig. 7 – Average validation accuracy of consecutive generations of GA.

Case Study I

For classifying Glioma grades, it was found that the superior networks proposed by the genetic algorithm all consist of five convolutional and max-pooling layers as well as one fully connected layer. Furthermore, the rectified linear unit is used as the activation function in these superior networks. The number of neurons and the dropout rate are the only differences between these networks. In addition, the Adam optimization method performed better than the others, and the cost functions of the best networks are optimized using this algorithm. Fig. 8(a) illustrates the architecture of the proposed CNN for this case study.

Case Study II

In this case study, the network that achieved the best performance for classifying three types of tumors has six convolutional and max-pooling layers and one fully connected layer with 384 neurons. ELU is used as the activation function in the structure of the superior network. As in Case Study I, Adam is preferred to the rest of the optimization methods. In Fig. 8(b), the structure of this network is presented in more detail.

In order to observe the performance of the models obtained by GA, each case study was trained for 100 epochs. 20 percent of the training set is used for validation. From Fig. 9 it is seen that the training process proceeds properly. The loss function decreases almost steadily and the accuracy increases with a suitable slope. In addition, the validation curves follow the training curves to a satisfactory level. Therefore, it is concluded that the architectures proposed by GA are sufficiently efficient.

5.1. Evaluation

The performance of a classification algorithm can be evaluated in a variety of ways. Here, the confusion matrix is used to check the performance (Fig. 10). This matrix gives valuable information about the actual and predicted labels provided by the proposed classification method. Using this information, the performance is assessed from different aspects.

The diagonal values represent true positives (TP), i.e. the number of samples for which the classifier detects the condition when the condition is present. True negatives (TN) are samples for which the classifier does not detect the condition when the condition is absent. False positives (FP) are cases in which the classifier detects the condition when the condition is absent. False negatives (FN) are cases in which the classifier does not detect the condition when the condition is present. Values corresponding to these definitions are derived from the confusion matrices and are given in Table 2. Various quantities and ratios that characterize the performance of the proposed method for classifying MR brain tumor images are also calculated and compared in this table.

6. Discussion

From Table 2 it is seen that the proposed method has detected tumors with high precision in almost all cases. Also, in order to

Fig. 8 – Best CNN architectures provided by GA: (a) Case Study I; (b) Case Study II.
biocybernetics and biomedical engineering 39 (2019) 63–74 71

Fig. 9 – Loss and accuracy variations during 100 epochs for: (a, b) Case Study I; (c, d) Case Study II. Solid lines and dash dot
lines are related to the training dataset and validation dataset, respectively.

evaluate more precisely, various criteria have been investigated. about 95% of grade II and grade III tumors were correctly
In Case Study I, Normal images were classified exceptional. classified. By dividing correct predictions by all test data, multi-
Glioblastoma multiform tumors that are the most common and class classification accuracy is calculated. Thus, in 4 classes, the
most malignant brain tumors were classified with an excellent classification accuracy is 93.1% and if only classification of
sensitivity of 97.4% and a total accuracy of 96.1%. In addition, Glioma grades is considered, 90.9% accuracy is obtained.

Fig. 10 – Confusion matrices for (a) Case Study I and (b) Case Study II.
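
As an illustration of how the one-vs-rest counts defined in Section 5.1 can be read off a multi-class confusion matrix, a minimal Python sketch is given here. It is not the authors' code. The 3 x 3 matrix is hypothetical (its entries are not those of Fig. 10), rows are assumed to hold the actual labels and columns the predicted labels, and the function name is chosen only for illustration.

```python
def one_vs_rest_counts(cm):
    """Return TP, FP, TN, FN for each class of a square confusion matrix.

    Assumption: cm[i][j] is the number of samples whose actual class is i
    and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    counts = []
    for k in range(n):
        tp = cm[k][k]                                # diagonal entry
        fn = sum(cm[k]) - tp                         # actual k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp    # predicted k, actual otherwise
        tn = total - tp - fn - fp                    # everything else
        counts.append({"TP": tp, "FP": fp, "TN": tn, "FN": fn})
    return counts


# Hypothetical 3-class matrix, used only to exercise the function.
example_cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]
example_counts = one_vs_rest_counts(example_cm)
for label, c in enumerate(example_counts):
    print(label, c)
```

In this formulation each class is treated in turn as the "condition", so the four counts of every class always sum to the total number of test samples.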

Table 2 – Evaluation of the proposed method.

                 TP     FP     TN     FN
Case Study I
  Normal        499      6   1494      1
  Grade II      442     32   1468     58
  Grade III     434     34   1466     66
  Grade IV      487     66   1434     13
Case Study II
  Glioma        113     10    220      2
  Meningioma    101      5    225     14
  Pituitary     111      5    225      4

                TPR    TNR    PPV    NPV    FPR    FNR    FDR    ACC
Case Study I
  Normal      0.998  0.996  0.988  0.999  0.004  0.002  0.012  0.997
  Grade II    0.884  0.979  0.932  0.962  0.021  0.116  0.068  0.955
  Grade III   0.868  0.977  0.927  0.957  0.023  0.132  0.073  0.950
  Grade IV    0.974  0.956  0.881  0.991  0.044  0.026  0.119  0.961
Case Study II
  Glioma      0.983  0.957  0.919  0.991  0.043  0.017  0.081  0.965
  Meningioma  0.878  0.978  0.953  0.941  0.022  0.122  0.047  0.945
  Pituitary   0.965  0.978  0.957  0.983  0.022  0.035  0.043  0.974

TP = True positive (hits).
TN = True negative (correct rejections).
FP = False positive (false alarms or type I error).
FN = False negative (miss or type II error).
TPR = True positive rate (hit rate or sensitivity) = TP/(TP + FN).
TNR = True negative rate or specificity = TN/(TN + FP).
PPV = Positive predictive value or precision = TP/(TP + FP).
NPV = Negative predictive value = TN/(TN + FN).
FPR = False positive rate (fall-out) = FP/(FP + TN).
FNR = False negative rate (miss rate) = FN/(TP + FN).
FDR = False discovery rate = FP/(TP + FP).
ACC = Overall accuracy = (TP + TN)/(P + N) = (TP + TN)/(TP + FP + FN + TN).
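
The ratios defined in the notes to Table 2 can be recomputed directly from the four counts. The following Python sketch does so and also evaluates the multi-class accuracy (correct predictions divided by all test data). The counts are copied from Table 2; the helper names are hypothetical, and the Glioma-only figure rests on the assumption that it is obtained by simply leaving out the Normal class.

```python
# Counts copied from Table 2 as (TP, FP, TN, FN) per class.
TABLE_2 = {
    "Case Study I": {
        "Normal": (499, 6, 1494, 1),
        "Grade II": (442, 32, 1468, 58),
        "Grade III": (434, 34, 1466, 66),
        "Grade IV": (487, 66, 1434, 13),
    },
    "Case Study II": {
        "Glioma": (113, 10, 220, 2),
        "Meningioma": (101, 5, 225, 14),
        "Pituitary": (111, 5, 225, 4),
    },
}


def rates(tp, fp, tn, fn):
    """Per-class ratios, following the definitions listed under Table 2."""
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "FPR": fp / (fp + tn),
        "FNR": fn / (tp + fn),
        "FDR": fp / (tp + fp),
        "ACC": (tp + tn) / (tp + fp + fn + tn),
    }


def multiclass_accuracy(classes):
    """Correct predictions divided by the number of all test samples.

    Every test sample has exactly one actual class, so summing TP + FN
    over the classes counts each sample once.
    """
    correct = sum(tp for tp, _fp, _tn, _fn in classes.values())
    total = sum(tp + fn for tp, _fp, _tn, fn in classes.values())
    return correct / total


for study, classes in TABLE_2.items():
    for name, counts in classes.items():
        line = " ".join(f"{k}={v:.3f}" for k, v in rates(*counts).items())
        print(f"{study} / {name}: {line}")
    print(f"{study} multi-class accuracy: {multiclass_accuracy(classes):.3f}")

# Assumption: the Glioma-only accuracy drops the Normal class entirely.
glioma_only = {k: v for k, v in TABLE_2["Case Study I"].items() if k != "Normal"}
print(f"Glioma grades only: {multiclass_accuracy(glioma_only):.3f}")
```

Under these assumptions the multi-class accuracies come out as 0.931 for Case Study I, 0.909 for the three Glioma grades alone, and 0.942 for Case Study II, in agreement with the values quoted in the text.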

According to the results obtained for Case Study II, the overall accuracies for classifying Glioma, Meningioma and Pituitary tumors were 96.5, 94.5 and 97.4 percent, respectively. In this case, 94.2% of the network predictions were correct. These results confirm the ability of the proposed classification method to classify different MR brain images.

The ability of CNNs, and more broadly of deep learning algorithms, to process and classify images has been demonstrated in many studies. As is clear from the results of this study, it has also performed well here. An important aspect of providing an efficient neural network for a specific problem is determining the proper architecture. In [40] it is demonstrated that using convolutional neural networks with complex structures does not guarantee a better result compared to simpler structures. In this article, a fully automated procedure for selecting the network structure and its parameters has been used. As previously stated, the genetic algorithm performed acceptably in finding a suitable network for our particular application, and the use of the bagging method was also beneficial. Table 3 compares the performance of the proposed method with similar tasks. As is seen, the proposed method achieves higher accuracy than the listed methods for classifying similar MR brain tumor classes.

Table 3 – Comparison of the proposed method with related works.

Approach                                                   Classification accuracy (a)
Case Study I – Glioma Grade II/Grade III/Grade IV
  SVM + RFE (Zacharaki et al. [15])                        62.5%
  Proposed method                                          90.9%
Case Study II – Glioma/Meningioma/Pituitary
  Vanilla preprocessing + shallow CNN (Paul et al. [16])   91.43%
  Proposed method                                          94.2%

(a) Classification accuracy = (correct predictions/number of all data), using test dataset.

The binary classification of the tumor grades proposed by Zacharaki et al. [15] achieved 62.5% accuracy for classifying three glioma grades. As specified in Table 3, the present method is about 28 percentage points more accurate. Paul et al. [16] proposed a CNN classifier using a dataset provided by Cheng et al. [28]. Using the same dataset, as shown in Table 3, the proposed method also achieved better accuracy.

7. Conclusions

In summary, in this study a CNN-based method for classifying Glioma brain tumor MR images is proposed. A genetic algorithm was utilized to search for a CNN structure that produces better results. The proposed method not only graded Glioma tumors with high precision, but was also very successful in classifying images of various types of brain tumors.

In the proposed algorithm, classification of various grades of Glioma and two other widespread tumor types is carried out with high precision. There is no requirement to perform time-consuming processes such as skull stripping or segmentation, and the decision is made from the raw MR image data alone. The proposed method can be used as a secondary option for early detection in a non-invasive procedure. In addition, the time required for the classification is very short in comparison to the time required for analyzing a biopsy. Consequently, the proper action can be taken at the proper time with respect to the severity of the tumor.

Most of the proposed schemes in this area comprise region-of-interest definition, manual feature extraction, feature selection and finally classification. In contrast, the proposed deep learning method extracts useful features automatically, and thus providing a feature extraction method is not required. Using a network architecture that performs well on some data may lead to poor results for another set of data. Therefore, it is necessary to compare different network architectures to reach the desired objective. Because it is impossible to evaluate all possible cases, in this research GA is used to specify an appropriate network architecture with far fewer computations. Applying the bagging algorithm to the best network suggested by GA was also effective and improved the accuracy of the classification according to Table 2. This table reflects the success of the proposed method in classifying brain tumor types via MR images.

Although it has been shown in this paper that the proposed method yields better performance compared to similar literature, larger datasets with several tumor types and other CNN structures and deep learning algorithms in future works may lead to better performances.

References

[1] Wen PY, Kesari S. Malignant gliomas in adults. N Engl J Med 2008;359(5):492–507.
[2] Louis DN, Perry A, Reifenberger G, Von Deimling A, Figarella-Branger D, Cavenee WK, et al. The 2016 World Health Organization classification of tumors of the central nervous system: a summary. Acta Neuropathol 2016;131(6):803–20.
[3] Laws ER, Ezzat S, Asa SL, Rio LM, Michel L, Knutzen R. Pituitary disorders: diagnosis and management. John Wiley & Sons; 2013.
[4] Black PM. Brain tumors. N Engl J Med 1991;324(22):1555–64.
[5] Kelly PJ. Gliomas: survival, origin and early detection. Surg Neurol Int 2010;1.
[6] Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al. A survey on deep learning in medical image analysis. Med Image Anal 2017.
[7] Lo C-S, Wang C-M. Support vector machine for breast MR image classification. Comput Math Appl 2012;64(5):1153–62.
[8] Trigui R, Mitéran J, Walker PM, Sellami L, Hamida AB. Automatic classification and localization of prostate cancer using multi-parametric MRI/MRS. Biomed Signal Process Control 2017;31:189–98.
[9] Rasti R, Teshnehlab M, Phung SL. Breast cancer diagnosis in DCE-MRI using mixture ensemble of convolutional neural networks. Pattern Recogn 2017;72:381–90.
[10] Chaplot S, Patnaik L, Jagannathan N. Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomed Signal Process Control 2006;1(1):86–92.
[11] El-Dahshan E-SA, Hosny T, Salem A-BM. Hybrid intelligent techniques for MRI brain images classification. Digit Signal Process 2010;20(2):433–41.
[12] Zhang Y, Dong Z, Wu L, Wang S. A hybrid method for MRI brain image classification. Expert Syst Appl 2011;38(8):10049–53.
[13] Saritha M, Joseph KP, Mathew AT. Classification of MRI brain images using combined wavelet entropy based spider web plots and probabilistic neural network. Pattern Recogn Lett 2013;34(16):2151–6.
[14] Kalbkhani H, Shayesteh MG, Zali-Vargahan B. Robust algorithm for brain magnetic resonance image (MRI) classification based on GARCH variances series. Biomed Signal Process Control 2013;8(6):909–19.
[15] Zacharaki EI, Wang S, Chawla S, Soo Yoo D, Wolf R, Melhem ER, et al. Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme. Magn Reson Med 2009;62(6):1609–18.

[16] Paul JS, Plassard AJ, Landman BA, Fabbri D. Deep learning for brain tumor classification. Proc of SPIE. 2016. pp. 1013710–1.
[17] Khawaldeh S, Pervaiz U, Rafiq A, Alkhawaldeh RS. Noninvasive grading of glioma tumor using magnetic resonance imaging with convolutional neural networks. Appl Sci 2017;8(1):27.
[18] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.
[19] Mohan G, Subashini MM. MRI based medical image analysis: survey on brain tumor grade classification. Biomed Signal Process Control 2018;39:139–61.
[20] Pereira S, Pinto A, Alves V, Silva CA. Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging 2016;35(5):1240–51.
[21] Havaei M, Davy A, Warde-Farley D, Biard A, Courville A, Bengio Y, et al. Brain tumor segmentation with deep neural networks. Med Image Anal 2017;35:18–31.
[22] Kleesiek J, Urban G, Hubert A, Schwarz D, Maier-Hein K, Bendszus M, et al. Deep MRI brain extraction: a 3D convolutional neural network for skull stripping. NeuroImage 2016;129:460–9.
[23] IXI Dataset. Available from: http://brain-development.org/ixi-dataset/.
[24] Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging 2013;26(6):1045–57.
[25] Scarpace L, Flanders AE, Jain R, Mikkelsen T, Andrews DW. Data from REMBRANDT. Cancer Imaging Archive 2015.
[26] Scarpace L, Mikkelsen T, Cha S, Rao S, Tekchandani S, Gutman D, et al. Radiology data from the cancer genome atlas glioblastoma multiforme [TCGA-GBM] collection. Cancer Imaging Archive 2016.
[27] Pedano N, Flanders AE, Scarpace L, Mikkelsen T, Eschbacher JM, Hermes B, et al. Radiology data from the cancer genome atlas low grade glioma [TCGA-LGG] collection. Cancer Imaging Archive 2016.
[28] Cheng J, Huang W, Cao S, Yang R, Yang W, Yun Z, et al. Enhanced performance of brain tumor classification via tumor region augmentation and partition. PLoS ONE 2015;10(10). e0140381.
[29] Goodfellow I, Bengio Y, Courville A. Deep learning. MIT Press; 2016.
[30] LeCun Y, Bengio Y. Convolutional networks for images, speech, and time series. Handb Brain Theory Neural Netw 1995;3361(10):1995.
[31] He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: surpassing human-level performance on imagenet classification. Proceedings of the IEEE International Conference on Computer Vision; 2015. p. 1026–34.
[32] Clevert D-A, Unterthiner T, Hochreiter S. Fast and accurate deep network learning by exponential linear units (elus); 2015, arXiv preprint arXiv:1511.07289.
[33] Klambauer G, Unterthiner T, Mayr A, Hochreiter S. Self-normalizing neural networks; 2017, arXiv preprint arXiv:1706.02515.
[34] Kingma D, Ba J. Adam: a method for stochastic optimization; 2014, arXiv preprint arXiv:1412.6980.
[35] Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011;12(July):2121–59.
[36] Zeiler MD. ADADELTA: an adaptive learning rate method; 2012, arXiv preprint arXiv:1212.5701.
[37] Sutskever I, Martens J, Dahl G, Hinton G. On the importance of initialization and momentum in deep learning. International Conference on Machine Learning. 2013. pp. 1139–47.
[38] Deepa SN. Introduction to genetic algorithms. Berlin Heidelberg: Springer-Verlag; 2008.
[39] Dietterich TG. Ensemble methods in machine learning. Multiple Classifier Syst 2000;1857:1–15.
[40] Pan Y, Huang W, Lin Z, Zhu W, Zhou J, Wong J, et al. Brain tumor grading based on neural networks and convolutional neural networks. Engineering in Medicine and Biology Society (EMBC), 37th Annual International Conference of the IEEE. 2015. pp. 699–702.
