
J. Cent. South Univ. (2018) 25: 1084−1098
DOI: https://doi.org/10.1007/s11771-018-3808-6

An improved brain emotional learning algorithm for accurate and efficient data analysis

MEI Ying(梅英)1, 2, TAN Guan-zheng(谭冠政)1

1. School of Information Science and Engineering, Central South University, Changsha 410083, China;
2. Electrical and Information Engineering College, Hunan University of Arts and Science,
Changde 415000, China
© Central South University Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Abstract: To overcome the deficiencies of high computational complexity and low convergence speed in traditional neural networks, a novel bio-inspired machine learning algorithm named brain emotional learning (BEL) is introduced. BEL mimics the emotional learning mechanism in the brain, which has the superior features of fast learning and quick reacting. To further improve the performance of BEL in data analysis, a genetic algorithm (GA) is adopted to optimally tune the weights and biases of the amygdala and orbitofrontal cortex in the BEL neural network. The integrated algorithm, named GA-BEL, combines the fast learning of BEL with the global optimum search of GA. GA-BEL has been tested on a real-world chaotic time series of a geomagnetic activity index for prediction, eight benchmark datasets of the University of California, Irvine (UCI), and a functional magnetic resonance imaging (fMRI) dataset for classification. Comparisons of experimental results show that the proposed GA-BEL algorithm is more accurate than the original BEL in prediction, and more effective when dealing with large-scale classification problems. Further, it outperforms most other traditional algorithms in terms of accuracy and execution speed in both prediction and classification applications.

Key words: prediction; classification; brain emotional learning; genetic algorithm

Cite this article as: MEI Ying, TAN Guan-zheng. An improved brain emotional learning algorithm for accurate and efficient data analysis [J]. Journal of Central South University, 2018, 25(5): 1084–1098. DOI: https://doi.org/10.1007/s11771-018-3808-6.

1 Introduction

Data analysis is an important task of machine intelligence in the information field; it extracts useful information from large-scale data, which helps provide guidance for subsequent decisions. Nowadays, many artificial intelligence methods have been proposed for classification, pattern recognition, prediction and fitting problems, such as the support vector machine (SVM) [1], k-nearest neighbor (k-NN) [2] and artificial neural network (ANN) [3]. Among these methods, ANN is very popular due to its characteristics of self-learning, self-adaptation and high generalization capability. However, traditional ANNs have been shown to have significant drawbacks, such as low training speed, high computational complexity, and a convergence rate that can hardly meet the requirements of real-time applications. To overcome these deficiencies, researchers have to study new neural networks and new machine learning algorithms.

Thanks to neurobiology and cognition research on emotion, emotional intelligence has been playing an important role in artificial intelligence in

Foundation item: Project(61403422) supported by the National Natural Science Foundation of China; Project(17C1084) supported by Hunan Education Department Science Foundation of China; Project(17ZD02) supported by Hunan University of Arts and Science, China
Received date: 2017−03−20; Accepted date: 2017−10−20
Corresponding author: TAN Guan-zheng, PhD, Professor; Tel: +86–13873646526; E-mail: 63641214@qq.com; ORCID: 0000-0003-1237-7166

recent years, and it has attracted increasing interest around the world. Several bio-inspired brain emotional learning (BEL) models have been proposed and successfully applied in intelligent engineering applications [4, 5]. These BEL models are based on a computational model called the amygdala-orbitofrontal model [6], which was inspired by LEDOUX's anatomical findings [7] on the emotional learning mechanism in the mammalian brain. BEL-based models mimic the high speed of emotional learning in the brain and have the superior features of fast learning and quick reacting. Thus, they are also widely used in classification, prediction and control applications [8–10].

In the amygdala-orbitofrontal model [6], the reward signal plays an important role in adjusting the weights of the amygdala and orbitofrontal cortex in the emotional learning process, but it has not been clearly defined so far. Many researchers have proposed different versions of BEL models based on the amygdala-orbitofrontal model, as well as different reward signal determinations. LUCAS et al [11] proposed the BEL-based intelligent controller (BELBIC), which has been successfully applied in intelligent engineering applications. ABDI et al [12] applied a modified BEL model to predict short-term traffic flow and defined the reward signal as the multiplication of some related weights. CHEN et al [13] presented a BEL-based controller for a four-wheel drive robot. Although these BEL-based models achieved success in their applications, they were based on reinforcement learning to adjust the weights of the amygdala and orbitofrontal cortex; such methods are model-sensitive and cannot be generalized to other problems. LOTFI et al [14] proposed a novel BEL-based pattern recognizer (BELPR); instead of employing reward-based reinforcement learning, it employs activation functions and target values to update the weights of the BEL network in the learning phase. BELPR can be trained with pattern-target examples, and it is model-free and time-saving. However, the pattern-target learning method reduces the precision of the process, and the accuracy in data analysis needs to be further improved.

In this study, we aim to optimize the BEL network and make it more accurate. Recently, several evolutionary algorithms have become available for optimization problems, such as the artificial bee colony algorithm (ABC) [15], differential evolution (DE) [16] and the genetic algorithm (GA) [17]. However, the standard ABC suffers from a slow convergence speed due to its solution search equation [15]. Although DE is relatively simple and converges easily, it may fall into local minima during the search [18]. GA tends to explore various regions of the decision space efficiently and finds an optimal solution with high probability [19]. It has been demonstrated that the performance of a neural network can be substantially improved by optimizing its weights with GA [20–22], and the learning process of a network can be regarded as a search for the optimum in the weight space. Therefore, the present work integrates GA with the novel BEL neural network to properly determine the weights of the network. The integrated algorithm, named GA-BEL, takes advantage of the fast learning and low computational complexity of BEL as well as the global optimum search of GA. Thus, GA-BEL is expected to achieve better performance than the original BEL in data analysis. GA-BEL has been tested on a chaotic time series of a geomagnetic index for real-world prediction, eight benchmark datasets of the University of California, Irvine (UCI), and a functional magnetic resonance imaging (fMRI) dataset for classification. Comparisons of experimental results indicate the superiority of the proposed GA-BEL in terms of accuracy and execution speed.

2 Related works

2.1 Anatomical foundation
The limbic system theory [23] is the neural basis for emotional brain studies. Figure 1(a) [14] shows the limbic system in the brain and its components, including the sensory cortex, thalamus, amygdala, orbitofrontal cortex, etc. There are two main parts among these components. One is the amygdala, which is properly situated to receive emotional stimuli and plays a pivotal role in the emotional learning process. The other is the orbitofrontal cortex, which assists the amygdala in processing emotional stimuli. LEDOUX [7] stated that emotional stimuli can reach the amygdala by two different paths, as shown in Figure 1(b). One is the long and precise path, coming from the sensory cortex, and the other is the short but imprecise path, coming
directly from the thalamus. LEDOUX [7] argued that, due to the existence of the short path in the emotional brain, emotional stimuli are processed much faster than normal stimuli.

Figure 1 Limbic system and emotion circuits in brain: (a) Limbic system [14]; (b) Emotion circuits

2.2 Previous BEL model
Motivated by LEDOUX's anatomical findings [7] in the mammalian brain, MORÉN and BALKENIUS [6] first proposed the amygdala-orbitofrontal model in 2000; its framework is shown in Figure 2. In the amygdala-orbitofrontal model, the amygdala and orbitofrontal cortex are the two crucial parts of emotional learning and reacting. The amygdala receives emotional stimuli from the sensory cortex and thalamus as well as the external reward signal; it interacts with the orbitofrontal cortex and reacts to the emotional stimuli based on the reward signal. The orbitofrontal cortex receives sensory input from the sensory cortex and evaluates the amygdala's response to prevent inappropriate learning connections. The two parts interact frequently to mimic the functionality of the emotional brain responsible for processing emotional stimuli.

In the amygdala-orbitofrontal model, S_i is the sensory input; A_j is the internal output of the amygdala; O_j is the internal output of the orbitofrontal cortex. The reward signal Rew is used to update the weights of the amygdala and orbitofrontal cortex in emotional learning, and the learning rules are expressed as follows [6]:

\Delta v_i = \alpha \left( S_i \cdot \max\left(0,\ Rew - \sum_j A_j \right) \right)   (1)

\Delta w_i = \beta \left( S_i \cdot \left( \sum_j O_j - Rew \right) \right)   (2)

where \Delta v_i and \Delta w_i represent the weight updates of the amygdala and orbitofrontal cortex, respectively; \alpha and \beta are learning rates used to adjust the learning speed. The reward signal Rew adjusts the weights in the emotional learning process.

Various modified BEL models based on the amygdala-orbitofrontal model have been proposed [11, 13], along with different reward signal determinations. In Ref. [12], the reward signal Rew is defined as

Rew = \sum_j w_j r_j   (3)

where r_j stands for the factor of the reinforcement agent and w_j represents the related weight.

CHEN et al [13] proposed a BEL controller for a four-wheel drive robot, where the reward signal Rew is defined as

Rew = r_1 e + r_2 \int e \, dt + r_3 \frac{de}{dt}   (4)

where r_1, r_2 and r_3 stand for the weight factors and e represents the error.

Although these BEL models achieved success in their specific applications, they applied reward-based reinforcement learning to adjust the weights of the amygdala and orbitofrontal cortex; they are model-sensitive and cannot be generalized to other problems.

LOTFI et al [14] first proposed a model-free version of the BEL-based pattern recognizer, which employs the target value T of the input pattern in the learning phase, with the reward signal defined as

Rew = T   (5)

Figure 2 Framework of amygdala-orbitofrontal model
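As a concrete illustration, the reward-based updates of Eqs. (1) and (2) can be sketched in a few lines. This is a minimal sketch: the vector sizes, learning rates and input values below are illustrative assumptions, not taken from the cited models.

```python
def bel_reward_update(S, v, w, rew, alpha=0.1, beta=0.1):
    """One reward-driven update of the amygdala weights v and the
    orbitofrontal weights w for a sensory input vector S (Eqs. (1)-(2))."""
    A = [s * vi for s, vi in zip(S, v)]   # internal amygdala outputs A_j
    O = [s * wi for s, wi in zip(S, w)]   # internal orbitofrontal outputs O_j
    # Eq. (1): the amygdala only learns when the reward exceeds its total output
    dv = [alpha * s * max(0.0, rew - sum(A)) for s in S]
    # Eq. (2): the orbitofrontal cortex tracks the mismatch with the reward
    dw = [beta * s * (sum(O) - rew) for s in S]
    return ([vi + d for vi, d in zip(v, dv)],
            [wi + d for wi, d in zip(w, dw)])

# starting from zero weights, a unit reward pushes v up and w down
v, w = bel_reward_update([0.2, 0.5], [0.0, 0.0], [0.0, 0.0], rew=1.0)
```

With zero initial weights and rew = 1, the amygdala weights move to 0.02 and 0.05 while the orbitofrontal weights move to −0.02 and −0.05, illustrating the opposing excitatory and inhibitory roles of the two structures.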
Thus, the supervised learning rules in the BEL model are described as follows:

v_j^{k+1} = (1-\gamma) v_j^k + \alpha \max(T^k - E_a^k,\ 0)\, P_j^k, \quad j = 1, 2, \ldots, n+1   (6)

w_j^{k+1} = w_j^k + \beta (E^k - T^k) P_j^k, \quad j = 1, 2, \ldots, n   (7)

where v_j^k and w_j^k represent the weights of the amygdala and orbitofrontal cortex, respectively; T^k is the target value associated with the kth input pattern P^k; E_a^k is the internal output of the amygdala; E^k is the final output of the BEL model; \gamma is the decay rate in the amygdala learning rule; \alpha and \beta are learning rates.

The model can be employed to learn the pattern-target relationship of an application by using the BEL algorithm, but this method reduces the precision of the process. Thus, we employed GA to optimally tune the weights of the BEL neural network. At the same time, we improved the BEL neural network for prediction and classification applications.

3 Implementation

3.1 Single-output BEL neural network for prediction
In contrast to the previous BEL-based models, in this study we applied a fitness function in GA instead of reward-based reinforcement learning to update the weights of the amygdala and orbitofrontal cortex in emotional learning. Therefore, motivated by the network in Ref. [9], we deleted the reward signal in the BEL neural network. In addition, according to the biological interaction between the amygdala and orbitofrontal cortex in emotional learning, we added a bias for each part to make the model more accurate. The improved BEL-based neural network is shown in Figure 3. Similar to the amygdala-orbitofrontal model, it consists of four common components: thalamus, sensory cortex, orbitofrontal cortex and amygdala. The amygdala and orbitofrontal cortex are the two main parts, mainly responsible for emotional learning and response.

The model is presented as a multi-input single-output architecture: the amygdala receives m input patterns S = [S_1, S_2, \ldots, S_m] from the sensory cortex and A_{th} from the thalamus. A_{th} is calculated by

A_{th} = \max(S_1, S_2, \ldots, S_m)   (8)

As shown in Figure 3, v_i is the amygdala weight and b_a is the bias of the amygdala neuron. For each sensory input, there is an amygdala node A_i to receive it. E_A is the internal output of the amygdala, calculated by

A_i = S_i \cdot v_i,\ i = 1, 2, \ldots, m; \quad A_{m+1} = A_{th} \cdot v_{m+1}   (9)

E_A = \sum_{i=1}^{m+1} A_i + b_a = \sum_{i=1}^{m} S_i v_i + A_{th} v_{m+1} + b_a   (10)

For each sensory input, there is also an orbitofrontal cortex node O_i to receive it. E_O is the output of the orbitofrontal cortex, used to inhibit the amygdala's output, and is calculated by

O_i = S_i \cdot w_i,\ i = 1, 2, \ldots, m   (11)

E_O = \sum_{i=1}^{m} O_i + b_o = \sum_{i=1}^{m} S_i w_i + b_o   (12)

where w_i represents the orbitofrontal cortex weight and b_o represents the bias of the orbitofrontal cortex neuron.
Figure 3 Improved single-output BEL network for prediction
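The forward computation of this single-output network (Eqs. (8)–(13)) can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation; the weights and inputs in the usage line are arbitrary illustrative values, and the subtraction in the last step follows the inhibitory role of the orbitofrontal output.

```python
def bel_forward(S, v, w, b_a, b_o):
    """Forward pass of the single-output BEL network.
    S: m sensory inputs; v: m+1 amygdala weights (the last one weighs
    the thalamic input A_th); w: m orbitofrontal weights; b_a, b_o: biases."""
    A_th = max(S)                                                    # Eq. (8)
    E_A = sum(s * vi for s, vi in zip(S, v)) + A_th * v[-1] + b_a    # Eqs. (9)-(10)
    E_O = sum(s * wi for s, wi in zip(S, w)) + b_o                   # Eqs. (11)-(12)
    return E_A - E_O        # Eq. (13): orbitofrontal output inhibits the amygdala

# e.g. two sensory inputs with hand-picked weights and zero biases
y = bel_forward([0.5, 1.0], v=[0.2, 0.1, 0.3], w=[0.1, 0.1], b_a=0.0, b_o=0.0)
```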


Finally, the final output is calculated by

E = E_A - E_O   (13)

where E is the final output, representing the correct response of the amygdala.

In the single-output BEL neural network, the number of input patterns determines the number of neurons in the thalamus and sensory cortex units. The BEL neural network can be trained with pattern-target examples; it is model-free and can be utilized in prediction applications.

3.2 Multiple-output BEL neural network for classification
The single-output BEL neural network can be extended to a multiple-output network for classification. The number of sample classes determines the number of orbitofrontal cortex and amygdala units. Thus, the extended BEL model can be applied to binary and multiple classifications. In the proposed m–n architecture shown in Figure 4, m represents the number of input features and n represents the number of sample classes; each amygdala-orbitofrontal cortex unit interacts separately in the learning process.

Figure 4 Improved multiple-output BEL network for classification

In this study, the weights and biases of the amygdala and orbitofrontal cortex in the BEL neural network are optimized by GA. Data analysis with GA-BEL can be divided into three parts: determination of the BEL neural network structure, parameter optimization, and the prediction or classification output of the BEL neural network.

3.3 Optimizing BEL neural network with GA
1) Chromosome representation
For the advantage of high precision, real encoding is adopted to acquire optimal results. One real-number string represents one chromosome, which consists of the connection weights and biases of the orbitofrontal cortex and amygdala. According to the structure of the BEL neural network, each chromosome is initialized as follows [9]:


Ch = [w_1, \ldots, w_m, b_o, v_1, \ldots, v_{m+1}, b_a]   (14)

where w_1, \ldots, w_m and b_o represent the orbitofrontal cortex weights and bias, respectively; v_1, \ldots, v_{m+1} and b_a represent the amygdala weights and bias, respectively. The values of the weights and biases are chosen in [–1, 1]; m is the number of input features; the number of genes in each chromosome is 2m+3.

2) Fitness function
The fitness function is used to evaluate the adaptability of each individual in the whole population, and the individual fitness provides the reference for the selection operation. The learning process aims to minimize the total error over all training samples. Consequently, we select the root mean squared error as the evaluation criterion of the weights, and the fitness function is defined as

Fitness(Ch_k) = \sqrt{ \frac{1}{n} \sum_{k=1}^{n} (E_k - T_k)^2 }   (15)

where E_k is the response to the kth input pattern with the weights and biases given in Ch_k, which can be calculated by Eq. (13); T_k is the target value of the kth input pattern; n is the number of pattern-target pairs.

3) Operations
(a) Selection. To choose better individuals for the subsequent iteration, we adopt the roulette-wheel selection scheme [18]. The fitness of each individual corresponds to its proportion of selection. There are three steps: firstly, find the best and the worst chromosomes in the current generation according to the fitness value; secondly, copy the best individual if it is better than the best individual of the previous generation; lastly, replace the worst individual with the best one from the previous generation. The selection probability p_i for each individual i is given as follows:

f_i = k / F_i, \quad p_i = f_i \Big/ \sum_{j=1}^{n} f_j   (16)

where F_i is the fitness value of individual i; k is a coefficient; n is the number of individuals in a group.

(b) Crossover. To enlarge the diversity and the searching space, the crossover operation produces two new individuals by exchanging information between the parent individuals [24]. We define the crossover rules as follows:

c_i^k(n+1) = \alpha c_i^k(n) + (1-\alpha) c_j^k(n), \quad c_j^k(n+1) = \alpha c_j^k(n) + (1-\alpha) c_i^k(n)   (17)

where c_i^k and c_j^k represent the chromosomes that undergo crossover at the kth bit; n denotes the number of iterations; \alpha is a random number uniformly distributed in [0, 1].

(c) Mutation. To further enlarge the searching space, mutation is another way to create new individuals; it changes one or several gene values of a chromosome. The mutation method is given as follows:

c_i^j(g+1) = \begin{cases} c_i^j(g) + (c_i^j(g) - c_{max}) f(g), & r_1 \ge 0.5 \\ c_i^j(g) + (c_{min} - c_i^j(g)) f(g), & r_1 < 0.5 \end{cases}, \quad f(g) = r_2 (1 - g/G_{max})^2   (18)

where c_i^j(g) represents the chromosome that undergoes mutation at the jth bit; c_{max} and c_{min} are the upper and lower limits of the allele; g is the current generation; G_{max} is the maximum iteration; r_1 and r_2 are random numbers within [0, 1].

After the operations of selection, crossover and mutation, the best chromosome can be found, which represents the best combination of weights and biases in the BEL network. The original weights and biases are updated with the genes of the best chromosome, and the trained BEL network can then be used for data analysis.

The pseudo-code of GA-BEL is given as follows:

begin
    t←0; /* t represents the number of the current generation */
    Initialize the genetic parameters cmax, cmin, Gmax, etc.;
    Initialize a population P(t) of chromosomes defined in Eq. (14);
    for i=1:m /* m represents the number of samples */
        Calculate the BEL network output according to Eq. (8)–Eq. (13);
    Evaluate the fitness of P(t) according to Eq. (15);
    Obtain the best found result from P(t);
    while the termination criterion is not fulfilled do
        t←t+1;
        Select P(t) from P(t−1) by the selection process according to Eq. (16);
        Alter P(t) by crossover according to Eq. (17) based on the crossover probability;
        Mutate offspring according to Eq. (18) based on the mutation probability;
        Update the weights and biases of the BEL network;
        Calculate the BEL network output according to Eq. (8)–Eq. (13);
        Evaluate the new fitness of P(t) according to Eq. (15);
        Obtain the best found result from P(t) and compare with P(t−1);
        Replace the worst chromosome of P(t) with the best result of P(t−1);
    end while
    Output the best chromosome; /* the best chromosome represents the combination
        of the best weights and biases in the BEL network according to Eq. (14) */
end
4 Simulation results

In this section, two case studies are constructed to evaluate the performance of the proposed GA-BEL algorithm. The first experiment shows the performance of GA-BEL in prediction; the second tests GA-BEL on classification. Comparative experiments are also carried out in both cases. Both experiments are performed in MATLAB R2013a running on an Intel Core i7 3.4 GHz CPU with 8.00 GB RAM under the Windows 7 operating system.

4.1 Case 1: Experimental results on prediction
4.1.1 Dataset description
In this work, one real-world prediction problem has been studied. Geomagnetic storms disturb the earth's magnetosphere and cause harmful damage to ground-based communication. Therefore, it is essential to predict geomagnetic activity indices for the development of alert systems. The disturbance storm time (Dst) index [25] is one of the main indices of geomagnetic storms, usually defined to measure their intensity. Dst is a chaotic time series which has been utilized by many data-driven models, and it can be downloaded from the World Data Center for Geomagnetism, Kyoto, Japan. Here, 1000 pattern-target pairs of the hourly Dst index in the year 2000 are used for online prediction.
4.1.2 Performance evaluation
Prediction performance can be evaluated by the mean squared error (MSE) and the linear correlation coefficient (R), which are expressed as follows [14]:

MSE = \frac{1}{n} \sum_{i=1}^{n} (P_i - T_i)^2   (19)

R = \frac{\sum_{i=1}^{n} (P_i - P_{mean})(T_i - T_{mean})}{n \cdot P_{std} \cdot T_{std}}   (20)

The single-output BEL neural network in Figure 3 can be used in time series prediction problems. In this study, we considered the first four sequence samples as input patterns and the 5th sequence as the target pattern. Consider the time series Dst1, Dst2, Dst3, \ldots, Dstt, \ldots; Dst(t+1) is the predicted value at time t+1, which can be calculated by

Dst(t+1) = E(Dst(t), Dst(t-1), Dst(t-2), Dst(t-3))   (21)

where E is the final output of the single-output BEL neural network, calculated by Eq. (13).
4.1.3 Results and discussions
For the BEL network, there are four features as sensory input and one output of the amygdala-orbitofrontal cortex unit. Thus, the input nodes, hidden nodes and output node are set to 4, 3 and 1, respectively. The initial weights and biases of the amygdala and orbitofrontal cortex units are randomly selected in [–1, 1]. In GA, the population groups are set to 200 and each population size is set to 11; the maximal generation is set to 100; the crossover and mutation rates are 0.8 and 0.001, respectively. Finally, the prediction results for the Dst index are presented in Figure 5.
Figure 5(a) shows the MSE corresponding to each epoch during the evolution. As illustrated in Figure 5(a), the best MSE is obtained at epoch 100, after which the process is in the steady state. Figure 5(b) illustrates the predicted versus desired output of the Dst index: Target on the X-axis represents the desired output, and Output on the Y-axis represents the actual output of the GA-BEL model.
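The two evaluation metrics of Eqs. (19) and (20) can be computed as follows. This is a sketch; following the formula of Eq. (20), the standard deviations are implemented as population statistics (dividing by n).

```python
def mse(pred, target):
    """Eq. (19): mean squared error between predictions P and targets T."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def corr(pred, target):
    """Eq. (20): linear correlation coefficient R."""
    n = len(pred)
    pm, tm = sum(pred) / n, sum(target) / n
    p_std = (sum((p - pm) ** 2 for p in pred) / n) ** 0.5
    t_std = (sum((t - tm) ** 2 for t in target) / n) ** 0.5
    cov = sum((p - pm) * (t - tm) for p, t in zip(pred, target))
    return cov / (n * p_std * t_std)
```

A perfectly linear prediction gives R = 1, and identical series give MSE = 0.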
Figure 5 Prediction results of Dst index with GA-BEL: (a) Mean squared error; (b) Linear correlation analysis of training samples; (c) Linear correlation analysis of testing samples

Figure 6 Prediction results of Dst index with BEL: (a) Mean squared error; (b) Linear correlation analysis of training samples; (c) Linear correlation analysis of testing samples

The linear relationship between target and output is also given on the left side. As illustrated in Figures 5(b) and (c), the values of R are 0.95897 and 0.97128 for the training and testing samples, respectively, which indicates that the linear correlation between the output and the desired value is very good on the testing samples.
For comparison, we applied the original BEL to the same Dst index dataset. Figure 6 presents the prediction results in the steady state. As illustrated in Figure 6(a), the best MSE is obtained at epoch 2, after which the process is in the steady state. Figure 6(b) illustrates the predicted versus desired output of the Dst index: BEL obtained R=0.94462 on the testing samples, which is much less than the R=0.97128 of GA-BEL.
We also compared GA-BEL with the traditional multilayer perceptron (MLP) [26]. Table 1 presents the detailed MSE and R obtained from GA-BEL, BEL and LM-BP on the testing samples.
It is obvious that fast training is the main feature of the BEL-based algorithms compared with the LM-BP algorithm [26]. Although the original BEL needs fewer epochs to reach the steady state, GA-BEL achieves higher correlation and lower mean squared error, which implies that GA-BEL is more accurate than BEL in prediction.

Table 1 Comparisons of GA-BEL, BEL and LM-BP on Dst prediction

Model  Learning  Epoch  MSE     R
ENN    BEL       2      0.2086  0.94462
ENN    GA-BEL    100    0.0014  0.97128
MLP    LM-BP     783    4.1075  0.92183

4.2 Case 2: Experimental results on classification
4.2.1 Datasets description
UCI [27] datasets are usually used to evaluate the performance of an algorithm. In this study, we chose eight benchmark datasets for testing, including binary and multiclass datasets of relatively high or low dimensions and large or small sizes; the details are summarized in Table 2.

Table 2 Datasets description

Attribute              Dataset        Sample  Feature  Class
Low dims, small size   Iris           150     4        3
                       Breast Cancer  699     9        2
Low dims, large size   Banana         5300    2        2
                       SVMguide1      7089    4        2
High dims, small size  Heart          270     13       2
                       Wine           178     13       3
High dims, large size  Satimage       6435    36       6
                       Segment        2310    19       7

The fMRI data contain numerous spatiotemporal observations, which can reflect changes of the functional networks in the brain. fMRI data are often used for investigating healthy brain function and detecting abnormalities. Attention deficit hyperactivity disorder (ADHD) is one of the most common diseases in young children. In this study, the fMRI dataset called "ADHD-200" is used for the fMRI classification problem; it can be downloaded from the competition website [28]. The ADHD-200 data are time series of 3D images of size 49×58×47. The dataset consists of resting-state fMRI data as well as different phenotypic information for each subject, including age, gender, verbal IQ, performance IQ, etc. There are several imaging sites: NeuroImage (NI), New York University Medical Center (NYU), Peking University (Peking), etc. We adopted the widely used ADHD-200 dataset provided by Peking University; its description is shown in Table 3.

Table 3 ADHD-200 dataset description (194 images in total)

Group            Female  Male  Average age
Control subject  46      68    11
ADHD subject     10      70    12

We preprocessed the ADHD-200 dataset using the SPM8 toolbox [29] and chose 125 samples (68 controls, 57 ADHD) for this study. After feature extraction based on wavelet transform and reduction [30], the feature vectors are sent to the proposed GA-BEL for classification. By combining all ADHD subtypes into one category, the classification on the ADHD-200 dataset is posed as a two-class problem, that is, ADHD subjects versus normal controls.
4.2.2 Performance evaluation
Classification performance can be evaluated by the confusion matrix described in Figure 7, in which measures such as accuracy, precision and recall are commonly used to assess the performance of classification systems. Among them, the classification accuracy (row 3, column 3, blue area) is the main evaluation index, which is calculated as follows [30, 31]:

Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \times 100\%   (22)

where TP is the number of true positives; TN is the number of true negatives; FP is the number of false positives; FN is the number of false negatives [30, 31].

Figure 7 Confusion matrix
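From the four counts of the confusion matrix, the accuracy of Eq. (22) — together with precision and recall, which the text mentions but does not spell out, so the standard definitions are assumed here — can be computed as:

```python
def accuracy(tp, tn, fp, fn):
    """Eq. (22): overall classification accuracy in percent."""
    return (tp + tn) / (tp + fp + tn + fn) * 100.0

def precision(tp, fp):
    """Fraction of predicted positives that really are positive, in percent."""
    return tp / (tp + fp) * 100.0

def recall(tp, fn):
    """Fraction of actual positives that were detected, in percent."""
    return tp / (tp + fn) * 100.0

# e.g. 50 true positives, 40 true negatives, 5 of each error type
acc = accuracy(50, 40, 5, 5)   # 90.0
```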
Taking the classification on the Breast Cancer dataset as an example, the detailed explanations of TP, TN, FP and FN are given as follows:
TP: some breast cancer cases (positives) are correctly classified as patients with breast cancer.
TN: some healthy persons (negatives) are correctly classified as healthy persons.
FP: some healthy persons are incorrectly classified as patients with breast cancer.
FN: some cases with breast cancer are incorrectly classified as healthy persons.
The performance in execution speed can be evaluated by the computing time of the training and testing process.
4.2.3 Classification results
1) Classification on Breast Cancer dataset
As mentioned above, the input patterns determine the number of neurons in the sensory cortex units, and the classes of samples determine the number of output units. Therefore, the learning parameters vary according to the samples. For the Breast Cancer dataset, the parameter settings are listed separately for BEL and GA, as shown in Table 4. We used 70% of the samples as training data, and the remaining 30% served the validation and test purposes. The simulation results are shown in Figure 8.

Table 4 Parameter settings of the GA-BEL algorithm (Breast Cancer dataset)

Algorithm  Parameter               Setting
BEL        Training sample         489
           Testing sample          210
           Input node              9
           Hidden node             5
           Output node             2
           Random initial weights  [–1, 1]
GA         Population group        200
           Maximal generation      100
           Population size         21
           Crossover probability   0.8
           Mutation probability    0.01

Figure 8 Classification results of GA-BEL (Breast Cancer dataset): (a) Best and mean fitness during evolution; (b) Confusion matrix of training samples; (c) Confusion matrix of testing samples

As shown in Figure 8(a), GA-BEL converges at iteration 100, where the mean fitness is nearly equal to the best fitness value and GA-BEL obtains the best chromosome. The confusion matrices in Figures 8(b, c) show that the classification accuracies (row 3, column 3, blue area) are 96.1% and 97.6% in the training and testing samples, respectively.
1094 J. Cent. South Univ. (2018) 25: 1084–1098
samples, respectively. The precision and recall are
also given in the two confusion matrices. The whole
process finished after 50 trials have been conducted,
and finally the average results were recorded.
2) Classification on Heart dataset
To evaluate GA-BEL on a relatively small, high-dimensional dataset, we chose the Heart dataset. It has 270 samples with 13 features; accordingly, the numbers of input, hidden and output nodes are set to 13, 6 and 2, respectively. We used 70% of the samples as training data, while the rest served validation and classification purposes. In GA, the number of population groups is set to 850 and each population size is set to 29; the crossover and mutation rates are 0.7 and 0.03, respectively. To find the best iteration number, different numbers of iterations from 100 to 900 were tried, and GA-BEL achieves its best performance when the iteration number equals 700. Figures 9 and 10 show the results when the maximal generations are set to 500 and 700, respectively.
Comparing Figures 9(a) and 10(a), the best fitness value is improved. According to the confusion matrices in Figures 9(b, c) and 10(b, c), GA-BEL achieves testing accuracies (row 3, column 3, blue area) of 86.4% and 88.9% in 500 and 700 generations, respectively. The average accuracy and computing time were obtained after 50 trials had been conducted.
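The GA settings above (population size 29, crossover rate 0.7, mutation rate 0.03) can be sketched as follows. This is a simplified illustration only: `bel_forward` is a stand-in for the real BEL forward pass (the actual amygdala and orbitofrontal update rules are defined earlier in the paper), and the real-coded GA operators are generic choices, not necessarily those used by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)

def bel_forward(w, X):
    """Stand-in for the BEL forward pass: the chromosome w packs an
    amygdala-like and an orbitofrontal-like weight vector, and the
    output is thresholded on their difference (E = A - O)."""
    d = X.shape[1]
    A = X @ w[:d]          # amygdala pathway
    O = X @ w[d:2 * d]     # orbitofrontal pathway
    return (A - O > 0).astype(int)

def fitness(w, X, y):
    return np.mean(bel_forward(w, X) == y)   # classification accuracy

def ga_optimize(X, y, pop_size=29, generations=700, pc=0.7, pm=0.03):
    """Real-coded GA with the Heart-dataset settings from the text."""
    n_genes = 2 * X.shape[1]
    pop = rng.uniform(-1, 1, (pop_size, n_genes))
    for _ in range(generations):
        fit = np.array([fitness(w, X, y) for w in pop])
        # binary tournament selection
        idx = rng.integers(0, pop_size, (pop_size, 2))
        parents = pop[np.where(fit[idx[:, 0]] > fit[idx[:, 1]],
                               idx[:, 0], idx[:, 1])]
        # arithmetic crossover with probability pc
        mates = parents[rng.permutation(pop_size)]
        alpha = rng.uniform(size=(pop_size, 1))
        cross = rng.uniform(size=pop_size) < pc
        children = np.where(cross[:, None],
                            alpha * parents + (1 - alpha) * mates, parents)
        # Gaussian mutation with probability pm per gene
        mut = rng.uniform(size=children.shape) < pm
        children = children + mut * rng.normal(0, 0.1, children.shape)
        # elitism: keep the best chromosome of the old population
        children[0] = pop[np.argmax(fit)]
        pop = children
    fit = np.array([fitness(w, X, y) for w in pop])
    return pop[np.argmax(fit)], fit.max()

# Toy linearly separable data with 13 features, like the Heart dataset
X = rng.normal(size=(100, 13))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
w_best, acc = ga_optimize(X, y, generations=50)
print(acc)
```

Elitism makes the best fitness monotonically non-decreasing across generations, which is what the "best fitness" curves in Figures 9(a) and 10(a) track.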
Next, GA-BEL was tested on Banana, Iris, SVMguide1, ADHD-200, etc. in turn. For each problem, 50 trials were conducted, and the average accuracies and computing times were recorded.
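The 50-trial averaging protocol can be sketched as follows. `train_and_score` is an assumed interface and `majority_baseline` is a toy stand-in for the classifiers actually being compared; both names are illustrative.

```python
import time
import numpy as np

rng = np.random.default_rng(42)

def evaluate(train_and_score, X, y, n_trials=50, train_frac=0.7):
    """Repeat random 70/30 splits, record accuracy and wall-clock time
    per trial, and report the averages, as done for each dataset."""
    accs, times = [], []
    n_train = int(train_frac * len(X))
    for _ in range(n_trials):
        idx = rng.permutation(len(X))
        tr, te = idx[:n_train], idx[n_train:]
        t0 = time.perf_counter()
        acc = train_and_score(X[tr], y[tr], X[te], y[te])
        times.append(time.perf_counter() - t0)
        accs.append(acc)
    return float(np.mean(accs)), float(np.mean(times))

# Toy stand-in classifier: predict the majority class of the training set
def majority_baseline(Xtr, ytr, Xte, yte):
    pred = np.bincount(ytr).argmax()
    return float(np.mean(yte == pred))

X = rng.normal(size=(200, 13))
y = (rng.uniform(size=200) < 0.7).astype(int)
mean_acc, mean_time = evaluate(majority_baseline, X, y)
print(mean_acc, mean_time)
```

Averaging over repeated random splits, rather than a single split, is what makes the accuracy and time figures in Tables 5 and 6 comparable across algorithms.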
4.2.4 Comparisons and discussions
The performance of GA-BEL is compared with that of SVM and the original BEL on the same datasets. SVM is run using the LIBSVM toolbox [32] with a Gaussian kernel. For each algorithm, 50 trials were conducted, and the average accuracy and computing time were recorded. Detailed comparison results are given in Table 5; for each dataset, the best accuracies and the shortest training times are highlighted.

Figure 9 Classification results of GA-BEL (Heart dataset, 500 generations): (a) Best and mean fitness during evolution; (b) Confusion matrix of training samples; (c) Confusion matrix of testing samples

As observed from Table 5, compared with SVM, the BEL-based methods have the superior feature of fast training, because they mimic the high speed of emotional processing in the emotional brain and their computational complexity is low. The training of SVM, by contrast, involves a quadratic programming problem, so its computational complexity is usually high and its execution speed is lower than that of the BEL-based methods, although SVM achieves higher accuracies in a few cases.

Compared with the original BEL algorithm, GA-BEL shows a significant improvement in classification accuracy. GA-BEL employs GA to optimize the initial weights of the amygdala and orbitofrontal cortex in the BEL neural network, and the BEL network can be substantially improved after optimization; thus GA-BEL is more accurate than BEL. Moreover, because GA-BEL may encourage a grouping effect, it requires less training time for large sample sizes, so it is more efficient when dealing with large-scale classification problems.

For further comparison, we list the performance of previous methods that investigated the classification problem. Seven datasets are specially chosen, covering high and low dimensions and large and small sample sizes. For each dataset, we report the testing accuracy and average computing time of each method. As shown in Table 6, GA-BEL achieves better classification accuracy and faster execution speed than the previous methods for most datasets.

Figure 10 Classification results of GA-BEL (Heart dataset, 700 generations): (a) Best and mean fitness during evolution; (b) Confusion matrix of training samples; (c) Confusion matrix of testing samples

5 Conclusions

An improved brain emotional learning (BEL) algorithm based on GA is proposed. The proposed GA-BEL algorithm mimics the high-speed emotional learning mechanism in the brain, which gives it the superior features of fast learning and low computational complexity. In addition, GA is applied to optimize the weights and biases of the amygdala and orbitofrontal cortex in the BEL neural network, which makes the BEL model more accurate.

A modified version of the BEL neural network based on the amygdala–orbitofrontal model is developed for prediction, together with an extended architecture for classification. Instead of using reward-based reinforcement learning to adjust the learning rules, in this study GA-BEL is employed to learn the pattern–target relationship of an application; it is model-free and can be generalized to classification, prediction and pattern recognition applications.

Two case studies were carried out on benchmark prediction and classification problems: a chaotic time series of the geomagnetic index for a real-world prediction application, and eight benchmark UCI datasets plus an fMRI dataset for classification applications. The results demonstrate that the proposed GA-BEL achieves better generalization performance at faster execution speed than the other algorithms for most datasets.

Table 5 Performance comparison of different algorithms (accuracy in %, time in s)

| Dataset | SVM train | SVM test | SVM time | BEL train | BEL test | BEL time | GA-BEL train | GA-BEL test | GA-BEL time |
|---|---|---|---|---|---|---|---|---|---|
| Breast cancer | 98.1 | 97.3 | 5.41×10−2 | 93.6 | 95.8 | 9.45×10−3 | 96.3 | 97.5 | 7.51×10−3 |
| Iris | 99.2 | 94.9 | 2.64×10−1 | 98.6 | 98.5 | 2.17×10−2 | 98.7 | 98.3 | 2.91×10−2 |
| Banana | 89.8 | 88.3 | 9.37×10−1 | 92.6 | 90.3 | 7.17×10−2 | 91.7 | 91.5 | 6.31×10−2 |
| SVMguide1 | 96.6 | 96.3 | 5.37 | 97.1 | 97.3 | 1.15 | 97.8 | 97.5 | 4.85×10−1 |
| Heart | 85.3 | 83.2 | 3.71×10−2 | 84.6 | 85.2 | 2.52×10−3 | 86.1 | 88.3 | 2.93×10−3 |
| Wine | 98.5 | 98.3 | 3.02×10−2 | 96.6 | 95.1 | 2.83×10−2 | 97.4 | 97.2 | 3.69×10−2 |
| Satimage | 92.1 | 90.3 | 12.50 | 92.5 | 91.1 | 3.51 | 93.6 | 91.7 | 2.27 |
| Segment | 97.8 | 91.3 | 5.15 | 97.2 | 95.6 | 3.67×10−1 | 97.9 | 96.9 | 2.12×10−1 |
| ADHD-200 | 66.2 | 64.7 | 4.13 | 76.5 | 74.1 | 3.95×10−2 | 80.7 | 79.3 | 2.73×10−2 |

Table 6 Classification results obtained with the proposed method and previous methods

| Dataset | Samples | Study | Algorithm | Accuracy/% | Time/s |
|---|---|---|---|---|---|
| Breast Cancer | 699 | ZHANG et al [33] | o-GSCCA | 62.9 | 1.64×10−2 |
| Breast Cancer | 699 | This study | GA-BEL | 97.5 | 7.51×10−3 |
| Iris | 150 | LUO et al [34] | L1-L2-ELM | 98.0 | 9.72×10−1 |
| Iris | 150 | This study | GA-BEL | 98.3 | 2.91×10−2 |
| Banana | 5300 | HUANG et al [35] | ELM (Gaussian) | 89.8 | 4.69×10−2 |
| Banana | 5300 | This study | GA-BEL | 91.5 | 6.31×10−2 |
| Heart | 270 | BAI et al [36] | SELM (Gaussian) | 84.4 | 4.20×10−3 |
| Heart | 270 | This study | GA-BEL | 88.3 | 2.93×10−3 |
| Wine | 178 | LUO et al [34] | L1-L2-ELM | 98.3 | 1.89 |
| Wine | 178 | This study | GA-BEL | 97.2 | 3.69×10−2 |
| Satimage | 6435 | BAI et al [36] | SELM (Gaussian) | 90.1 | 2.41 |
| Satimage | 6435 | This study | GA-BEL | 91.7 | 2.27 |
| ADHD-200 | 125 | RIAZ et al [37] | SVM | 64.7 | 4.13 |
| ADHD-200 | 125 | This study | GA-BEL | 79.3 | 2.73×10−2 |

This study introduces emotional intelligence into artificial intelligence, which opens many research gates for bio-inspired research. In future work, other optimization methods can be employed, e.g., particle swarm optimization (PSO), which uses different strategies and computational effort to find a solution. Moreover, it would be appropriate to combine GA with PSO to further improve the performance of the BEL network, and to apply them in real-time applications.

References

[1] LARROZA A, MORATAL D, PAREDES-SÁNCHEZ A. Support vector machine classification of brain metastasis and radiation necrosis based on texture analysis in MRI [J]. Journal of Magnetic Resonance Imaging, 2015, 42(5): 1362–1368.
[2] YAMASHITA Y, WAKAHARA T. Affine-transformation and 2D-projection invariant k-NN classification of handwritten characters via a new matching measure [J]. Pattern Recognition, 2016, 52(C): 459–470.
[3] SHI Tian, KONG Jian-yi, WANG Xing-dong, LIU Zhao, ZHENG Guo. Improved Sobel algorithm for defect detection of rail surfaces with enhanced efficiency and accuracy [J]. Journal of Central South University, 2016, 23(11): 2867–2875.
[4] KHOOBAN M H, JAVIDAN R. A novel control strategy for DVR: Optimal bi-objective structure emotional learning [J]. International Journal of Electrical Power & Energy Systems, 2016, 83: 259–269.
[5] SHARMA M K, KUMAR A. Performance comparison of brain emotional learning-based intelligent controller (BELBIC) and PI controller for continually stirred tank heater (CSTH) [J]. Lecture Notes in Electrical Engineering, 2015, 335: 293–301.
[6] MORÉN J, BALKENIUS C. A computational model of emotional learning in the amygdala [C]// Proceedings of the 6th International Conference on the Simulation of Adaptive Behaviour. MIT Press, 2000: 115–124.
[7] LEDOUX J E. Emotion circuits in the brain [J]. Annual Review of Neuroscience, 2000, 23: 155–184.
[8] SHARAFI Y, SETAYESHI S, FALAHIAZAR A. An improved model of brain emotional learning algorithm based on interval knowledge [J]. Journal of Mathematics and Computer Science, 2015, 14: 42–53.
[9] LOTFI E. Wind power forecasting using emotional neural networks [C]// Proceedings of the IEEE International Conference on Systems, Man and Cybernetics. San Diego, USA, 2014: 311–316.
[10] SHARBAFI M A, LUCAS C, DANESHVAR R. Motion control of omni-directional three-wheel robots by brain-emotional-learning-based intelligent controller [J]. IEEE Transactions on Systems, Man & Cybernetics, Part C, 2010, 40(6): 630–638.
[11] LUCAS C, DANIAL S, NIMA S. Introducing Belbic: Brain emotional learning based intelligent controller [J]. Intelligent Automation & Soft Computing, 2004, 10(1): 11–21.
[12] ABDI J, MOSHIRI B, ABDULHAI B, SEDIGH A K. Forecasting of short-term traffic-flow based on improved neuro-fuzzy models via emotional temporal difference learning algorithm [J]. Engineering Applications of Artificial Intelligence, 2012, 25(5): 1022–1042.
[13] CHEN Jian-ping, WANG Jian-bin, YANG Yi-min. Velocity compensation control for a four-wheel drive robot based on brain emotional learning [J]. CAAI Transactions on Intelligent Systems, 2013, 8(4): 361–366.
[14] LOTFI E, AKBARZADEH-T M R. Brain emotional learning-based pattern recognizer [J]. Cybernetics & Systems, 2013, 44(5): 402–421.
[15] CUI Lai-zhong, LI Geng-hui, LIN Qiu-zhen, DU Zhi-hua, GAO Wei-feng, CHEN Jian-yong, LU Nan. A novel artificial bee colony algorithm with depth-first search framework and elite-guided search equation [J]. Information Sciences, 2016, 367–368: 1012–1044.
[16] CUI Lai-zhong, LI Geng-hui, LIN Qiu-zhen, CHEN Jian-yong, LU Nan. Adaptive differential evolution algorithm with novel mutation strategies in multiple sub-populations [J]. Computers & Operations Research, 2016, 67: 155–173.
[17] HOLLAND J H. Adaptation in natural and artificial systems [M]. Cambridge, UK: MIT Press, 1992.
[18] DAS S, ABRAHAM A, KONAR A. Automatic clustering using an improved differential evolution algorithm [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 2008, 38(1): 218–237.
[19] COOK D F, RAGSDALE C T, MAJOR R L. Combining a neural network with a genetic algorithm for process parameter optimization [J]. Engineering Applications of Artificial Intelligence, 2000, 13(4): 391–396.
[20] SHEN Z Q, KONG F S. Optimizing weights by genetic algorithm for neural network ensemble [J]. Lecture Notes in Computer Science, 2004, 3173: 323–331.
[21] WU Jian-shen, LONG Jin, LIU Ming-zhe. Evolving RBF neural networks for rainfall prediction using hybrid particle swarm optimization and genetic algorithm [J]. Neurocomputing, 2015, 148(2): 136–142.
[22] HOSSEINI Z, NAKHAEI M. Estimation of groundwater level using a hybrid genetic algorithm-neural network [J]. Pollution, 2015, 1(1): 9–21.
[23] LEDOUX J E. Emotion and the limbic system concept [J]. Concepts in Neuroscience, 1991, 2: 169–199.
[24] SRINIVAS M, PATTANAIK L M. Genetic algorithms: A survey [J]. Computer, 1994, 27(6): 17–27.
[25] LOTFI E, AKBARZADEH-T M R. Adaptive brain emotional decayed learning for online prediction of geomagnetic activity indices [J]. Neurocomputing, 2014, 126(3): 188–196.
[26] HAGAN M T, DEMUTH H B, BEALE M. Neural network design [M]. Beijing: China Machine Press, 2002: 357.
[27] UCI machine learning repository [EB/OL]. [2017–03–02]. http://archive.ics.uci.edu/ml.
[28] ADHD-200 database [EB/OL]. [2017–03–02]. http://fcon_1000.projects.nitrc.org/indi/adhd200/.
[29] SPM toolbox [EB/OL]. [2017–03–02]. http://www.fil.ion.ucl.ac.uk/spm/.
[30] TAN Ying, ZHANG Tao, TAN Rui, SHEN Xiao-tao, XIAO Jing-zhong. Classification based Wavelet Translate and SVM in the ADHD [J]. Journal of University of Electronic Science and Technology of China, 2015, 44(5): 789–794.
[31] ZUO Wan-li, WANG Zhi-yan, LIU Tong, CHEN Hui-ting. Effective detection of Parkinson's disease using an adaptive fuzzy k-nearest neighbor approach [J]. Biomedical Signal Processing & Control, 2013, 8(4): 364–373.
[32] LIBSVM: A library for support vector machines [EB/OL]. [2017–03–05]. http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[33] ZHANG Zhao, ZHAO Ming-bo. Binary- and multi-class group sparse canonical correlation analysis for feature extraction and classification [J]. IEEE Transactions on Knowledge & Data Engineering, 2013, 25(10): 2192–2205.
[34] LUO Xiong, CHANG Xiao-hui, BAN Xiao-juan. Regression and classification using extreme learning machine based on L1-norm and L2-norm [J]. Neurocomputing, 2016, 174: 179–186.
[35] HUANG Guang-bin, ZHOU Hong-ming, DING Xiao-jian, ZHANG Rui. Extreme learning machine for regression and multiclass classification [J]. IEEE Transactions on Systems, Man & Cybernetics: Part B, Cybernetics, 2012, 42(2): 513–529.
[36] BAI Zuo, HUANG Guang-bin, WANG Dan-wei. Sparse extreme learning machine for classification [J]. IEEE Transactions on Cybernetics, 2014, 44(10): 1858–1870.
[37] RIAZ A, ALONSO E, SLABAUGH G. Phenotypic integrated framework for classification of ADHD using fMRI [C]// Proceedings of the 13th International Conference on Image Analysis and Recognition. Springer International Publishing, 2016: 217–225.

(Edited by YANG Hua)
[20] SHEN Z Q, KONG F S. Optimizing weights by genetic

Chinese Digest

Effective data classification based on an improved brain emotional learning algorithm

Abstract: A method that uses a genetic algorithm to optimize the brain emotional learning (BEL) model is proposed. The BEL model is a computational model proposed by Morén et al in 2000 on the basis of neurophysiological findings. It is built on the emotional learning mechanism between the amygdala and the orbitofrontal cortex in the brain, and partially imitates the information processing of emotional stimuli along the reflex pathways of the brain. The BEL model features a simple structure, low computational complexity and high running speed. To further improve the model's accuracy, a genetic algorithm is adopted to optimize and tune the weights of the BEL model, and a BEL data analysis model with strong generalization ability is constructed and applied to both data prediction and data classification. For prediction, the typical magnetic-storm ring current index (Dst) time series is used as test data. The experimental results show that, in terms of mean square error (MSE) and linear correlation (R), the GA-BEL algorithm has small error and high correlation, demonstrating its effectiveness for prediction. For classification, eight typical UCI datasets and a typical head magnetic resonance imaging (fMRI) dataset are used as test data. The classification results show that GA-BEL achieves high classification accuracy and runs faster than traditional algorithms, demonstrating its effectiveness for classification.

Key words: prediction; classification; brain emotional learning; genetic algorithm
