Article
ReliefF-Based EEG Sensor Selection Methods for
Emotion Recognition
Jianhai Zhang 1 , Ming Chen 1 , Shaokai Zhao 1 , Sanqing Hu 1, *, Zhiguo Shi 2 and Yu Cao 3
1 College of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China;
jhzhang@hdu.edu.cn (J.Z.); cming163@163.com (M.C.); lnkzsk@126.com (S.Z.)
2 Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310012,
China; shizg@zju.edu.cn
3 Department of Computer Science, The University of Massachusetts Lowell, Lowell, MA 01854, USA;
ycao@cs.uml.edu
* Correspondence: sqhu@hdu.edu.cn; Tel.: +86-571-8687-8578

Academic Editor: Rongxing Lu


Received: 29 April 2016; Accepted: 14 September 2016; Published: 22 September 2016

Abstract: Electroencephalogram (EEG) signals recorded from sensor electrodes on the scalp can
directly detect the brain dynamics in response to different emotional states. Emotion recognition
from EEG signals has attracted broad attention, partly due to the rapid development of wearable
computing and the needs of a more immersive human-computer interface (HCI) environment.
To improve recognition performance, multi-channel EEG signals are usually used, but a large set of EEG sensor channels increases computational complexity and causes inconvenience for users. In this work, ReliefF-based channel selection methods were systematically investigated for EEG-based emotion recognition on the database for emotion analysis using physiological signals (DEAP). Three strategies were employed to select the best channels for classifying four emotional states (joy, fear, sadness and relaxation). Furthermore, a support vector machine (SVM) was used as the classifier to validate the performance of the channel selection results. The experimental results showed the effectiveness of our methods, and a comparison with similar strategies based on the F-score was given. Strategies that evaluate a channel as a unit gave better performance in channel reduction with an acceptable loss of accuracy. In the third strategy, after adjusting the channels’ weights according to their contribution to classification accuracy, the number of channels was reduced to eight with only a slight loss of accuracy (58.51% ± 10.05% versus the best classification accuracy of 59.13% ± 11.00% using 19 channels). In addition, the selection of subject-independent channels related to emotion processing was also investigated. The sensors selected subject-independently, mostly from the frontal and parietal lobes, were identified to provide more discriminative information associated with emotion processing and are distributed symmetrically over the scalp, which is consistent with the existing literature. The results will contribute to the realization of a practical EEG-based emotion recognition system.

Keywords: ReliefF; EEG; sensor selection; emotion recognition

1. Introduction
The effort to integrate emotions into a human-computer interaction (HCI) system has attracted
broad attention for a long time and has resulted in a new research field of affective computing [1].
Affective computing aims to give computers the ability to perceive human emotions and then make
proper decisions in response. Affective computing has become a worldwide research hotspot in recent years, partly due to the rapid development of wearable computing and the urgent need for a more immersive HCI environment. The key element in affective computing is to detect human emotions accurately in real time by employing pattern recognition and machine learning techniques.
Although numerous studies have been performed, emotion recognition is still an extremely challenging
task, especially for the purpose of practical usage.
A variety of methods for detecting human emotion have been proposed using physical or
physiological measurements in the past few decades, such as facial expression, speech, body gesture,
electrocardiograph (ECG), skin conductance (SC), electromyogram (EMG), respiration, pulse, etc. [2–4].
Although some encouraging results have been achieved, these measurements are an indirect mapping of human emotions. Recently, electroencephalogram (EEG)-based emotion recognition has received increasing attention. In comparison with the aforementioned measurements, the EEG technique can directly detect brain dynamics in response to different emotional states, so it is expected to provide more objective and comprehensive information for emotion recognition. Furthermore, with the advantages of non-invasiveness, low cost, portability and high temporal resolution, it is feasible to use an EEG-based emotion recognition system in real-world applications.
Although the excellent temporal resolution makes EEG a primary option for a real-time emotion recognition system, its poor spatial resolution presents a big challenge for the task. To overcome this problem, multi-channel EEG sensor signals are used, often acquired from 32, 64 or even more sensor electrodes distributed over the whole scalp according to the International 10–20 system. It is assumed that this rich set of sensors can provide more discriminative information about different emotional states and improve recognition performance. Unfortunately, the rule of “the more the better” does not always hold in this situation. In fact, additional EEG channels usually include noisy and redundant channels, which can instead deteriorate recognition performance [5–7]. Moreover, a large set of sensor channels increases computational complexity and causes inconvenience for users. Two solutions have been used to tackle this problem: feature selection and EEG sensor channel selection. So far, related studies have mainly focused on selecting the smallest set of the most useful features to achieve the best performance. Even though a reduction of features usually leads to a reduction of the channels involved, it cannot guarantee that the selected channels are optimal. In addition, from a practical point of view, channel reduction is more important than feature reduction. Fewer channels not only reduce computational complexity and improve performance, but also facilitate practical usage. It is therefore necessary to investigate the problem of sensor channel selection directly.
Various strategies have been used for sensor selection in research on EEG-based emotion recognition. In some studies, sensors were selected according to experts’ experience. Bos [8], Zhang
and Lee [9] and Schmidt et al. [10] selected sensors for classifying two emotional states according to
the well-known frontal asymmetry of the alpha power for valence. A set of symmetric sensor electrode
pairs were used to recognize more emotional states by Valenzi et al. [11] and Lin et al. [12], employing
hemispheric asymmetry characteristics of emotion processing in the human brain. These works mainly focused on a few specific features at specific scalp locations. However, the state-of-the-art
findings in neuroscience recommend investigating the correspondence between emotional states and
whole brain regions [13]. Therefore, finding the optimal subset of sensor channels from the full set
automatically has attracted increasing attention recently. Two main approaches have been used in
sensor selection algorithms for EEG signal processing [14]. One of them was directly based on a feature
selection algorithm. Firstly, a specific feature selection method was used to find the top-N features
according to some criteria, and then the electrodes containing these selected features were chosen.
Wang et al. [15] applied the minimum redundancy maximum relevance (mRMR) feature selection method to find the best electrode positions for the top-30 features among all 64 electrodes. On average, 25 sensors were selected across the five subjects. Employing a feature selection method based on the F-score
index, Lin et al. [12] reduced the number of required electrodes from 24 to 18 at the expense of a slight
decrease in classification accuracy. The results of this kind of approach showed its effectiveness, but the selected channels were not optimal in most cases. Another kind of approach for sensor channel selection in the literature adopted a classification algorithm to evaluate the candidate sensor subsets generated by a search algorithm. In Jatupaiboon et al. [16], a support vector machine (SVM) was used to evaluate different electrode montages from seven pairs (14 channels) for classifying positive and negative
emotions. Finally, the number of electrodes was reduced from seven pairs to five pairs almost without
any loss of accuracy. With sufficient searching, optimal results could be obtained with this kind of
approach, but it depends on the specific classification algorithm and is computationally expensive.
In the present study, we systematically investigated a channel selection scheme based on
ReliefF [17] in EEG-based emotion recognition. The ReliefF algorithm is a widely used feature selection
method. Different to the aforementioned feature-selection-based channel selection method, after
obtaining the weights of all features by ReliefF, two strategies were used to quantify the importance
of every sensor channel as a unity by considering the weights of all features belonging to this
channel. Then the channels were directly selected according to their quantified importance in emotion
recognition. SVM as classifier was used to evaluate the effectiveness and performance of our methods.
Furthermore, similar strategies using the F-score as a feature selection method were also investigated
and compared.
The rest of this paper is structured as follows. Section 2 presents the material and methods,
including the description of the database for emotion analysis using physiological signals (DEAP),
feature extraction, channel selection based on the ReliefF algorithm, and the SVM classifier. Section 3
is dedicated to the obtained results. The proposed methods are assessed in the task of classifying
four emotional states with SVM and compared with the F-score methods. The problem of selecting
subject-independent channels for emotion recognition is also investigated. Related discussion is given
in Section 4. Finally, Section 5 concludes the paper.

2. Materials and Methods

2.1. DEAP Database


This study was performed on the publicly available database DEAP [18], which consists of a
multimodal dataset for the analysis of human emotional states. A total of 32 EEG channels and eight
peripheral physiological signals of 32 subjects (aged between 19 and 37) were recorded whilst watching
music videos. The 40 one-minute long videos were carefully selected to elicit different emotional
states according to the dimensional, valence-arousal, emotion model. The valence-arousal emotion
model, first proposed by Russell [19], places each emotional state on a two-dimensional scale. The first
dimension represents valence, ranged from negative to positive, and the second dimension is arousal,
ranged from calm to exciting. In DEAP, each video clip is rated from 1 to 9 for arousal and valence
by each subject after the viewing, and the discrete rating value can be used as a classification label in
emotion recognition [20–22].
In this study, for simplicity, the preprocessed DEAP dataset in MATLAB format was used to test
our channel selection algorithms for classifying four emotional states (joy: valence ≥5, arousal ≥5; fear:
valence <5, arousal ≥5; sadness: valence <5, arousal <5; relaxation: valence ≥5, arousal <5). In the
preprocessing procedure, the EEG signal was downsampled from 512 Hz to
128 Hz and a band pass frequency filter from 4.0–45.0 Hz was applied. In addition, electrooculography
(EOG) artifacts have been removed from the EEG signal. For each subject, there are 40 one-minute trials
which were divided into four categories labeled as joy, fear, sadness or relaxation, separately. In our
experiment, about half of the trials for each category were randomly selected for channel selection
(channel-selection dataset), and the rest were used to test the performance of the channel selection
results (performance-validation dataset). To ensure that there were relatively enough data in every category for both channel selection and testing, only subjects with no fewer than five trials in every category were considered. Thus, sixteen subjects (1, 2, 5, 7–11,
14–19, 22, 25) were chosen. Candra et al. [23] reported that the effective window size for arousal and valence recognition using the DEAP database was between 3–10 s and 3–12 s, respectively. Therefore, each 60-s trial was segmented into fifteen non-overlapping 4-s samples to increase the number of samples. Finally, we obtained a total of 600 (40 trials × 15 samples) samples for each subject.
All the samples derived from the same trial share the same category label.
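For concreteness, the quadrant labeling described above can be written compactly. The following is a minimal sketch, assuming the preprocessed DEAP .mat files are used; the function name and loading code are our own, not part of the DEAP distribution:

```python
import numpy as np
import scipy.io

def quadrant_labels(ratings):
    """ratings: (n_trials, 2) array of [valence, arousal] ratings in 1-9.
    Returns integer labels: 0 = joy, 1 = fear, 2 = sadness, 3 = relaxation."""
    valence, arousal = ratings[:, 0], ratings[:, 1]
    return np.where(valence >= 5,
                    np.where(arousal >= 5, 0, 3),   # joy / relaxation
                    np.where(arousal >= 5, 1, 2))   # fear / sadness

# Hypothetical usage for one subject's preprocessed file:
# mat = scipy.io.loadmat("s01.mat")            # contains 'data' and 'labels'
# y = quadrant_labels(mat["labels"][:, :2])    # valence and arousal are the first two columns
```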

2.2. Feature Extraction


Different types of features have been used in EEG-based emotion recognition, including time,
frequency or time-frequency domain features [24]. However, power features from different frequency
bands are still the most popular in the context of emotion recognition. In this work, a series of band
pass filters were used to decompose the raw EEG data from the 32 channels of each sample into the theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz) and gamma (30–45 Hz) bands. Then, the power of every specific frequency band was calculated using a 512-point fast Fourier transform (FFT), so 128 (32 channels × 4 bands)
features were obtained for each sample.
The power of a specific frequency band corresponding to channel T in sample R is calculated as

$$bp_{RT} = \frac{1}{N}\sum_{k=1}^{N}\left|X_N^{RT}(k)\right|^2,\qquad(1)$$

where $X_N^{RT}(k)$ is the FFT of the EEG signal for channel T in sample R, and N is the length of the FFT, which equals the sample length 512 (4 s). Z-score normalization was adopted for each feature [25]. For a feature $f_R$ belonging to sample R, the normalized value was computed by

$$f_R^{norm} = \frac{f_R - \mu_f}{\sigma_f},\qquad(2)$$

where $\mu_f$ and $\sigma_f$ are, respectively, the mean and the standard deviation of feature f over all samples.
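As an illustration, the band-power features of Equation (1) can be computed along the following lines. This is a minimal sketch, assuming the preprocessed 128 Hz signals; taking the band power as the sum of FFT power over the band’s frequency bins is our reading of the procedure, and all names are our own:

```python
import numpy as np

BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30), "gamma": (30, 45)}
FS = 128      # sampling rate of the preprocessed DEAP signals (Hz)
N = 512       # FFT length = one 4-s sample at 128 Hz

def band_power_features(sample):
    """sample: (32, 512) array, one 4-s epoch of 32 EEG channels.
    Returns a (32, 4) array of band powers (Equation (1)), one row per channel."""
    spectrum = np.fft.rfft(sample, n=N, axis=1)     # X_N^{RT}(k)
    power = (np.abs(spectrum) ** 2) / N             # per-bin power
    freqs = np.fft.rfftfreq(N, d=1.0 / FS)
    feats = np.empty((sample.shape[0], len(BANDS)))
    for j, (lo, hi) in enumerate(BANDS.values()):
        mask = (freqs >= lo) & (freqs < hi)
        feats[:, j] = power[:, mask].sum(axis=1)    # total power in the band
    return feats
```

The z-score normalization of Equation (2) then amounts to standardizing each of the 128 feature columns over all samples of a subject.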

2.3. Channel Selection Based on ReliefF


ReliefF is a widely used feature selection method in classification problems, due to its effectiveness
and simplicity of computation. A key idea of ReliefF is to evaluate the quality of features according to
their abilities to discriminate among samples that are near to each other. The ability is quantified as a
weight of every feature. Channel selection can be performed based on the results of ReliefF feature
selection, as in the typical feature-selection-based channel selection method mentioned above.
In our work, 128 features were used to discriminate four emotional states (joy, fear, sadness,
relaxation) with the channel-selection dataset—about 300 samples of every subject. Firstly, a ReliefF
algorithm was used to rank all the 128 features according to their weights. At the beginning, all the
weights for 128 features were set to zero. For each sample Ri , the k nearest neighbors from the same
class of Ri (nearest hits Hj ) and k nearest neighbors from each of the different classes (nearest misses
Mj (C)) were found in terms of the features’ Euclidean distance between two samples. Then, the weights
W(F) for all features were updated according to Equation (3). The above process was repeated until all
the samples in the channel-selection dataset had been chosen, and the final W(F) was our estimations
of the qualities of 128 features. The whole process is implemented with the following steps:

1. set all the weights of the 128 features to zero, W(F) = W[f1, f2, ..., f128] := 0.0;
2. for i = 1 to the number of samples in the channel-selection dataset do
3. select a sample Ri ;
4. find k nearest hits Hj ( j = 1, 2, . . . , k );
5. for each class C ≠ class(Ri) do
6. find k nearest misses Mj (C)(j = 1, 2, . . . , k) from class C;
7. end;
8. for l = 1 to 128 do

$$W(f_l) = W(f_l) - \sum_{j=1}^{k}\frac{diff(f_l,R_i,H_j)}{m\times k} + \sum_{C\neq class(R_i)}\left[\frac{P(C)}{1-P(class(R_i))}\sum_{j=1}^{k}diff(f_l,R_i,M_j(C))\right]\Big/(m\times k);\qquad(3)$$

9. end;
10. end;
where the number of nearest neighbors k is 10, which is safe for most purposes [17], P(C) is the prior probability of class C (estimated from the channel-selection set), and m is the number of samples in the channel-selection dataset. The function diff(f, R1, R2) calculates the difference between the values of feature f for two samples R1 and R2, and is defined as:

$$diff(f,R_1,R_2) = \frac{|value(f,R_1)-value(f,R_2)|}{max(f)-min(f)},\qquad(4)$$

where, value(f,R) is the value of feature f in sample R, max(f ) and min(f ) are respectively the maximum
and minimum of the feature f over all samples.
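A minimal sketch of this weight-update loop (Equations (3) and (4)) might look as follows; the function and variable names are our own, and the feature matrix is assumed to hold the 128 z-scored features of the channel-selection samples:

```python
import numpy as np
from scipy.spatial.distance import cdist

def relieff_weights(X, y, k=10):
    """X: (n_samples, 128) feature matrix; y: emotion labels.
    Returns one ReliefF weight per feature (Equations (3) and (4))."""
    n, d = X.shape
    rng = X.max(axis=0) - X.min(axis=0)             # diff() denominator
    rng[rng == 0] = 1.0
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    dist = cdist(X, X)                              # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)                  # never pick a sample as its own hit
    W = np.zeros(d)
    for i in range(n):                              # every sample is used once, so m = n
        same = np.where(y == y[i])[0]
        hits = same[np.argsort(dist[i, same])[:k]]  # k nearest hits
        W -= (np.abs(X[hits] - X[i]) / rng).sum(axis=0) / (n * k)
        for c in classes:
            if c == y[i]:
                continue
            other = np.where(y == c)[0]
            miss = other[np.argsort(dist[i, other])[:k]]   # k nearest misses of class c
            scale = prior[c] / (1.0 - prior[y[i]])
            W += scale * (np.abs(X[miss] - X[i]) / rng).sum(axis=0) / (n * k)
    return W
```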
Through the above described process, it can be easily understood that a large weight means the
feature is important to discriminate between samples and a small one means that it is less important.
Therefore, all features can be ranked in terms of their weights. While selecting channels, the top-N
features were chosen according to the rank of every feature, and then the channels containing these
features were selected. Although the reduction of features usually led to a reduction of channels
involved, the actual effect was not obvious [12].
However, the ultimate goal of sensor channel selection is to achieve the best accuracy for
classification by using the least number of sensors. From this perspective, a natural strategy is
evaluating the importance of channels directly for classification, treating a channel rather than a feature
as a unit. If we accept that the weight of a feature obtained from ReliefF reflects its capability to
discriminate different classes, we can use the weights of all features belonging to a channel to estimate
the channel’s contribution to class discriminability.
We defined the mean of the weights of all features belonging to a channel as this channel’s weight.
In our work, 32 channels were used and each channel contained four features. The weight of channel
T is computed as
$$W(T) = \frac{1}{N}\sum_{i=1}^{N} W(f_i),\qquad(5)$$

where, W(fi ) is the weight of the i-th feature (fi ) belonging to channel T, and N is the number
of features of channel T. Then, channels can be ranked according to their weights, and the
best channels for the classification task can be easily selected. This method was denoted as
mean-ReliefF-channel-selection (MRCS).
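In code, the MRCS weighting of Equation (5) is a one-liner over the ReliefF feature weights; the layout below (features ordered channel by channel) is an assumption of ours:

```python
# W: 128 ReliefF feature weights (e.g., from relieff_weights above),
# assumed ordered as 32 channels x 4 bands
channel_weights = W.reshape(32, 4).mean(axis=1)      # Equation (5)
channel_ranking = np.argsort(channel_weights)[::-1]  # best channels first
```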
Now, we can select channels, independent of a classifier, for a further classification task. It is
perhaps effective for different kinds of classifiers, but it cannot be guaranteed to be optimal for a specific
classifier. As mentioned in Section 1, a search algorithm can find the optimal channel set for a specific
classifier, but it is computationally expensive. We propose a strategy to iteratively adjust the weights
of channels according to their contribution to classification accuracy for a specific classifier, as in
Figure 1. At the beginning, the weight of every channel is initialized to the mean of the ReliefF weights
of all features belonging to this channel as W 0 (T). Then, on each subject’s channel-selection dataset,
the average accuracy over a varying number of channels is obtained using a specific classifier by
adding the channels one by one according to their weights, labeled as S0 (n) (the average classification
accuracy, when top-n channels are used, n = 1 to 32). Then the contribution of the top n-th channel in
the i-th iteration can be computed as

$$C_i(n) = \frac{S_i(n) - S_i(n-1)}{S_i(n-1)},\qquad(6)$$
this contribution value of a channel could be positive or negative. Then, the weight of channel T is updated as

$$W_i(T) = W_{i-1}(T)\times\left(1 + C_{i-1}(n_T)\right),\qquad(7)$$

where n_T is the rank of channel T among all 32 channels. This means that the weight of a channel increases when its contribution is positive and decreases in the reverse case. The above process is repeated until the absolute value of the largest negative contribution of all channels is less than ε, or until 50 iterations have been performed. In this paper, ε equals 0.01. Then, the importance of the channels is evaluated, and channels are selected according to the final weights W. The purpose of this method is to further optimize the channel selection performance against a specific classifier based on MRCS, denoted as X-MRCS (X can be replaced with the classifier name, such as SVM-MRCS).

Figure 1. The update process of channels’ weights considering their contribution to classification accuracy.
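The iterative adjustment of Figure 1 and Equations (6) and (7) could be sketched as below. This is our own reading of the procedure, not the authors’ code: `eval_topn_acc` is a hypothetical callback returning the cross-validated accuracies S(n) for the top-n channel sets of a given ranking, and the contribution of the top-1 channel, which Equation (6) leaves undefined, is set to zero:

```python
import numpy as np

def adjust_channel_weights(w0, eval_topn_acc, eps=0.01, max_iter=50):
    """w0: initial 32 MRCS channel weights.
    eval_topn_acc(order): array S with S[n-1] = accuracy using the top-n
    channels of `order`, from cross-validation on the channel-selection set."""
    w = w0.copy()
    for _ in range(max_iter):
        order = np.argsort(w)[::-1]                  # rank channels by weight
        S = eval_topn_acc(order)
        C = np.zeros_like(w)                         # C of the top-1 channel stays 0 (our assumption)
        for n in range(2, len(order) + 1):
            C[order[n - 1]] = (S[n - 1] - S[n - 2]) / S[n - 2]   # Equation (6)
        w = w * (1.0 + C)                            # Equation (7)
        worst = min(C.min(), 0.0)                    # largest negative contribution
        if abs(worst) < eps:
            break
    return w
```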

2.4. Classifiers to Evaluate the Performance

To validate and compare the effectiveness of the channel selection results of our methods, SVM was employed as the classifier to recognize the four emotional states using the selected sensors on the performance-validation dataset. The central idea of SVM is to separate data from two classes by finding a hyperplane with the largest possible margin. The SVM classifier used in our work was implemented with LIBSVM (a library for support vector machines developed by Chang and Lin [26]) and the radial basis function (RBF) kernel. The one-versus-one strategy for multi-class classification problems was applied. To reduce the computational cost, the penalty parameter C and the kernel parameter γ were kept at their default values. For each subject, the average classification accuracy was computed using a scheme of five 10-fold cross-validations to increase the reliability of our results. In a 10-fold cross-validation, all the validation samples are divided into 10 subsets; nine subsets are used for training, and the remaining one is used for testing. This process was repeated five times with different subset splits. The classification accuracy was evaluated as the ratio of the number of correctly classified samples to the total number of samples. Then, the average classification accuracy curve over a varying number of selected channels across all 16 subjects was given to illustrate and compare the performance of the channel selection methods. Furthermore, the related results were statistically evaluated with a paired t-test to further support and validate our conclusions.
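A rough equivalent of this evaluation scheme, using scikit-learn’s SVC (which wraps LIBSVM, uses the RBF kernel, and handles multi-class problems one-versus-one internally), might look as follows; the function name is ours and C and γ stay at the library defaults, as in the paper:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def five_by_tenfold_accuracy(X, y):
    """Average accuracy over five 10-fold cross-validations with an RBF SVM."""
    scores = []
    for seed in range(5):                            # five different subset splits
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
        scores.extend(cross_val_score(SVC(kernel="rbf"), X, y, cv=cv))
    return np.mean(scores)
```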
3. Results

In this section, we evaluate our ReliefF-based sensor selection algorithms in the task of classifying four emotional states on the DEAP dataset, with SVM as the classifier. As mentioned in Section 2, 16 subjects (1, 2, 5, 7–11, 14–19, 22, 25) among the 32 were chosen for our experiment. In addition, all the trials of each subject were divided into two subsets according to the aforementioned rules. About half of the trials were used as the first subset for channel selection, and the rest were used to validate the performance of the channel selection results. Three ReliefF-based sensor selection strategies were employed. In the first strategy, after obtaining the weights of all features, the sensor channels containing the top-N features were selected. In the second method, MRCS, the mean of the weights of all features belonging to a channel was used as this channel’s weight, and all channels were ranked according to their weights for selection. In SVM-MRCS, channels were selected considering both their weights and their contribution to classification accuracy against the SVM classifier. After obtaining every channel’s weight for each subject on the channel-selection dataset, the classification accuracy of the selected top-N channels was obtained with five 10-fold cross-validations on the performance-validation dataset. Then, the average classification accuracy across 16 subjects over a varying number of channels was computed to show the effectiveness of our methods. For comparison, similar strategies were also applied to the F-score algorithm. Finally, the subject-independent channel selection problem was investigated.

3.1. Subject-Dependent Channel Selection

Firstly, the subject-dependent ReliefF weights of the 128 features were computed on the channel-selection dataset. Then SVM was used to validate the feature selection results on the performance-validation dataset. Figure 2 shows the average classification accuracy using SVM over a varying number of features across 16 subjects. The classification accuracy curve was obtained by adding the features one by one according to their ReliefF weights; features with larger weights were added first. Figure 2 shows that the best classification performance was achieved at a certain feature number (point A, 62.59% ± 12.81% accuracy using 60 features, with 30 channels involved) and then degraded as the feature number increased, just as we had expected. The results show the effectiveness of ReliefF as a feature selection algorithm in emotion recognition. However, although the number of features was reduced dramatically, the reduction of the number of channels involved was not obvious (at point C, 20 features still need 15 channels). Table 1 lists the average number of channels involved and the average classification accuracy when using the top-N features across 16 subjects. In addition, Table 2 presents the average proportion of each class of features among the top-N features. It can be seen that the high frequency bands (beta, gamma) play a more important role in emotion processing. Similar results have also been reported in other literature [10,16,27], which supports the significance of the ReliefF weights to some extent.

Figure 2. Average classification accuracy over a varying number of features across 16 subjects with SVM. Point A achieves the best classification accuracy. (Marked points: A = 60 features, 62.59% ± 12.81%, 30 channels; B = 30 features, 59.26% ± 12.46%, 21 channels; C = 20 features, 56.99% ± 11.71%, 15 channels.)
Table 1. Average results (standard deviation) using top-N features and classifier SVM across 16 subjects.

Top-N Features | Average Number of Channels Involved | Average Accuracy (%)
4  | 3.25 (0.77)  | 51.04 (8.32)
8  | 6.56 (1.36)  | 53.47 (11.21)
12 | 9.44 (2.06)  | 55.13 (10.77)
16 | 12.13 (2.58) | 55.49 (11.31)
20 | 14.88 (2.99) | 56.99 (11.71)
24 | 17.50 (3.39) | 57.58 (12.14)
28 | 19.50 (3.65) | 58.93 (12.83)
32 | 21.44 (4.21) | 59.70 (12.38)
36 | 23.25 (4.09) | 60.69 (12.74)
40 | 24.50 (3.72) | 61.09 (12.54)
50 | 27.25 (2.96) | 61.86 (12.80)
60 | 29.63 (2.60) | 62.59 (12.81)
80 | 31.13 (1.15) | 60.76 (11.54)

Table 2. The average proportion of each class of features among the top-N features across 16 subjects.

Top-N Features | Theta | Alpha | Beta | Gamma
4   | 0.03 | 0.02 | 0.27 | 0.68
8   | 0.04 | 0.09 | 0.18 | 0.69
16  | 0.07 | 0.11 | 0.25 | 0.57
32  | 0.10 | 0.12 | 0.29 | 0.49
64  | 0.13 | 0.13 | 0.35 | 0.39
96  | 0.20 | 0.21 | 0.29 | 0.30
128 | 0.25 | 0.25 | 0.25 | 0.25

With all the features’ weights computed above, we could evaluate the importance of channels in the emotion classification task directly by means of the weights of all features belonging to a channel, according to the MRCS method. Figure 3 shows the average classification accuracy using SVM over a varying number of channels across 16 subjects. Channels were added one by one according to their weights (Equation (5)). Different from the first feature-selection-based channel selection method, all the features of the selected channels were fed to the classifier.

Figure 3. Average classification accuracy over a varying number of channels using SVM based on the MRCS method. Point A achieves the best classification accuracy. (Marked points: A = 29 channels, 59.98% ± 10.50%; B = 16 channels, 58.05% ± 9.83%; C = 10 channels, 56.56% ± 8.75%.)
In Figure 3, employing the MRCS method, the best classification performance was achieved by using the top 29 channels. It is worth noting that, when the number of channels was reduced from 29 (point A) to 16 (point B), the average accuracy was only slightly reduced. This shows good performance in channel reduction. Comparing point C in Figures 2 and 3, there was a similar average classification accuracy (56.99% ± 11.71% vs. 56.56% ± 8.75%), but the number of channels involved was reduced from 15 to 10.

The above two channel selection methods are independent of the classifier. Can we improve the performance by considering the channels’ contribution to classification accuracy when using a specific classifier, following the process of Figure 1? Figure 4 takes subject 7 as an example to illustrate the result of the adjustment with SVM as the classifier (SVM-MRCS method). All the adjustment was done on the channel-selection dataset (Figure 4a) according to the procedure in Figure 1, and the adjustment result was evaluated on the performance-validation dataset with the same classifier (Figure 4b). All the average classification accuracies were computed using a scheme of five 10-fold cross-validations. It can be seen from Figure 4b that the performance was improved by adjusting the channels’ weights according to their contribution to classification accuracy (paired t-test, significant difference p < 0.05 when the number of selected top-N channels is less than 20).

Figure 4. SVM average classification accuracy over a varying number of channels for subject 7 based on the MRCS (blue solid line) and SVM-MRCS (red dashed line) methods. (a) adjustment on the channel-selection dataset; (b) evaluation on the performance-validation dataset.

The average classification accuracy over a varying number of channels based on the SVM-MRCS method across 16 subjects is shown in Figure 5. It shows that the best classification accuracy can be achieved by using the top 19 channels (point A). More importantly, the accuracy curve becomes almost stable once the top eight channels (point B) are used (with a slight reduction of 1% compared to point A using 19 channels). Figure 6 and Table 3 show the comparison of the average performance across 16 subjects between MRCS and SVM-MRCS using SVM. From the comparison, SVM-MRCS demonstrates better performance than MRCS in channel selection (paired t-test, with significant difference p < 0.05 when the number of selected channels is between 6 and 14). Comparing these three channel selection methods, although the best classification performance of the latter two methods is slightly lower (by about 3%), the number of channels was reduced much more dramatically than in the first one. Especially when a low number of channels (less than 10, for example) is required, MRCS and SVM-MRCS have an obvious advantage. However, because a specific classifier was employed in the SVM-MRCS method, the computational complexity increased, and the selection result is perhaps not optimal for other classifiers.
Figure 5. Average classification accuracy over a varying number of channels using SVM based on the SVM-MRCS method. Point A achieves the best classification accuracy. (Marked points: A = 19 channels, 59.13% ± 11.00%; B = 8 channels, 58.51% ± 10.05%.)
Figure 6. Average performance comparison across 16 subjects between MRCS and SVM-MRCS using SVM.
Table 3. The comparison of MRCS and SVM-MRCS methods using top-N channels. Values are the average accuracy (standard deviation) using the top-N channels.

Top-N | MRCS | SVM-MRCS
3  | 51.95 (8.23) | 52.86 (8.41)
4  | 52.94 (7.49) | 55.08 (8.98)
5  | 54.51 (8.30) | 56.39 (8.95)
6  | 54.66 (8.89) | 57.39 (9.52) **
7  | 55.34 (8.90) | 57.71 (9.53) **
8  | 56.55 (8.92) | 58.16 (9.47) *
9  | 56.62 (8.98) | 58.51 (10.05) *
10 | 56.56 (8.75) | 58.43 (9.90) *
11 | 56.53 (8.72) | 58.31 (10.05) *
12 | 56.93 (8.95) | 58.40 (10.20) *
13 | 57.06 (9.01) | 58.54 (10.08) *
14 | 57.38 (9.38) | 58.78 (10.27) *
15 | 57.79 (9.34) | 58.73 (10.40) *
16 | 58.05 (9.83) | 58.97 (10.75)
17 | 58.34 (9.63) | 59.08 (10.92)
18 | 58.21 (9.61) | 59.10 (11.05)
19 | 58.22 (9.80) | 59.13 (11.00)
20 | 58.09 (9.79) | 58.93 (10.95)

The significant difference was tested under the same number of selected channels (paired t-test, * p < 0.05, ** p < 0.01).

3.2. Comparison with F-Score as Feature Selection Method

F-score is another widely used, simple and effective feature selection method, which can also measure the ability of features to discriminate between different classes. For a set $X_k \in R^m$, k = 1, 2, ..., n, let l (l ≥ 2) be the number of classes and $n_j$ the number of samples belonging to class j (j = 1, 2, ..., l). The F-score value of the i-th feature is defined as

$$F_i = \frac{\sum_{j=1}^{l} n_j\,(\bar{X}_i^{\,j} - \bar{X}_i)^2}{\sum_{j=1}^{l}\sum_{k=1}^{n_j}(X_{ki}^{\,j} - \bar{X}_i^{\,j})^2},\qquad(8)$$

where $\bar{X}_i^{\,j}$ and $\bar{X}_i$ are the average values of the i-th feature on the class j dataset and on the whole dataset, respectively, and $X_{ki}^{\,j}$ is the i-th feature of the k-th sample of class j.
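A direct transcription of Equation (8), as we read it, is short; the array names are our own:

```python
import numpy as np

def f_scores(X, y):
    """F-score of each feature (Equation (8)). X: (n_samples, n_features)."""
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        num += len(Xc) * (class_mean - overall_mean) ** 2   # between-class term
        den += ((Xc - class_mean) ** 2).sum(axis=0)         # within-class term
    return num / den
```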
We replaced ReliefF with the F-score to compute the features’ weights, and implemented the above three channel selection strategies with the new F-score weights. The results and the comparison are shown in Figure 7. It can be seen from the comparison that our ReliefF-based channel selection methods gave better performance (paired t-test, p < 0.05 when the number of selected features is less than 77 in the first method (Figure 7a), and p < 0.05 for all 32 cases in the other two methods (Figure 7b,c)).

Figure 7. Comparison between ReliefF and F-score as feature selection methods with SVM. (a) the feature-selection-based channel selection method; (b) the channel selection method based on channel weights obtained as the mean of the ReliefF or F-score weights of all features belonging to the channel; (c) the channel selection method based on channel weights adjusted according to the channels’ contribution to classification accuracy. (Best points, ReliefF vs. F-score: (a) A = 60 features, 62.59% ± 12.81% vs. A' = 68 features, 59.55% ± 12.08%; (b) A = 29 channels, 59.98% ± 10.50% vs. A' = 30 channels, 59.01% ± 10.45%; (c) A = 19 channels, 59.13% ± 11.00% vs. A' = 28 channels, 58.87% ± 10.72%.)

3.3. Subject-Independent Channel Selection
The above results demonstrate the effectiveness of our methods, but all the experiments were performed subject-dependently: channels were selected individually and differed among subjects. How to find common characteristics among different individuals is a natural question. The study of selecting subject-independent channels related to emotion processing is crucial in the design of an on-line practical emotion recognition system. Furthermore, the results can help us to understand the relationship between brain regions and emotional processing.
In this study, to evaluate the importance of channels in emotion recognition subject-independently, every channel was weighted by accumulating the weights of the corresponding channel over the 16 subjects, and the channels were then ranked. For the purpose of generality, a channel’s weight for a specific subject was obtained from the mean of the ReliefF weights of all the features belonging to this channel. The subject-independent weight of the kth channel was computed as

$$W(T_k) = \sum_{s=1}^{SN} W(T_{sk}),\qquad(9)$$

where SN is the number of subjects (16), and $W(T_{sk})$ is the weight of the kth channel of subject s.
Then, the average accuracy over a varying number of the weighted common channels in classifying the four emotional states across 16 subjects was computed with the SVM classifier, again on the performance-validation dataset with five 10-fold cross-validations. The result is shown in Figure 8. Table 4 lists the top 15 common channels, and Figure 9 shows the locations of the top 5, 10 and 15 common channels in the 10–20 system.

Figure 8. The subject-independent channel selection result. (Marked points: A = 32 channels, 58.75% ± 10.78%; B = 12 channels, 57.67% ± 10.02%; C = 5 channels, 52.32% ± 7.25%.)

Table 4. The top 15 common channels across 16 subjects.

Channel Rank | Electrode | Brain Lobe
1  | Fp1 | Frontal
2  | T7  | Temporal
3  | PO4 | Parietal
4  | Pz  | Parietal
5  | Fp2 | Frontal
6  | F8  | Frontal
7  | Oz  | Occipital
8  | T8  | Temporal
9  | P4  | Parietal
10 | O1  | Occipital
11 | AF4 | Frontal
12 | FC5 | Frontal
13 | C3  | Central
14 | FC2 | Frontal
15 | P3  | Parietal
Figure 9. Top 5 (a); 10 (b); 15 (c) subject-independent channels according to the 10–20 system.

Figure 8 shows that the average accuracy curve tends to become stable when the top-12 channels are selected, where an average accuracy of 57.67% ± 10.02% was obtained, compared with an accuracy of 58.75% ± 10.78% using the top-32 channels. This result shows the possibility of designing a practical subject-independent emotion recognition system using fewer EEG sensors. In addition, as Table 4 shows, most of the top channels were located in the frontal and parietal lobes and were distributed symmetrically. This result is consistent with previous reports [12,18].
4. Discussion
4. Discussion
Now, we will explain and discuss several important issues using the results presented above.
Now, we will explain and discuss several important issues using the results presented above.
The necessity of channel selection in EEG-based emotion recognition has been vividly verified by
The necessity of channel selection in EEG-based emotion recognition has been vividly verified by
the results. Using any method proposed in this paper, similar trends were derived which showed
the results. Using any method proposed in this paper, similar trends were derived which showed
that the best classification performance was achieved at a certain point, and then degraded with the
that the best classification performance was achieved at a certain point, and then degraded with the
increase of the channel number involved, just as we had expected. However, it should be noted that
increase of the channel number involved, just as we had expected. However, it should be noted
the decrease after achieving the best performance was slight, if any, so the significance of EEG
that the decrease after achieving the best performance was slight, if any, so the significance of EEG
channel reduction mainly lies in reducing computational complexity and facilitating user
channel reduction mainly lies in reducing computational complexity and facilitating user convenience,
convenience, rather than improving the classification performance. Furthermore, most of the
rather than improving the classification performance. Furthermore, most of the accuracy curves
accuracy curves become almost stable before arriving at the best point. For the perspective of
become almost stable before arriving at the best point. For the perspective of practical usage, the point
practical usage, the point achieving the best classification accuracy may not be the reasonable choice.
achieving the best classification accuracy may not be the reasonable choice. We hope to find a point as
We hope to find a point as a tradeoff between the classification accuracy and the number of features
a tradeoff between the classification accuracy and the number of features (channels) involved. In this
In this respect, the MRCS and SVM-MRCS methods give better performance in channel reduction with an acceptable loss of accuracy. The better performance of these two methods in channel selection is easy to understand: they directly evaluate a channel as a unity and use all of the information contained in that channel while performing the classification task (a schematic of this channel-level scoring follows this paragraph). The last method achieved the best performance among the three, but it should be noted that a classifier was used to adjust the channels' weights during selection. Although the result derived from the third method is therefore classifier-dependent, it is still useful in practical applications, where the classifier is usually fixed in advance.
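To make the idea of scoring a channel "as a unity" concrete, the sketch below shows a channel-level ReliefF variant in which the relevance update for a channel uses the distance between whole per-channel feature blocks, so all of a channel's information enters its score at once. This is a schematic of the principle only: the multiclass handling is simplified (all other classes are pooled as misses, without class-prior weighting), and the details of our MRCS implementation may differ.

import numpy as np

def channel_relieff(X, y, n_channels, feats_per_channel, n_iter=100, k=10):
    """Assign one ReliefF-style weight per channel (higher = more discriminative)."""
    rng = np.random.default_rng(0)
    w = np.zeros(n_channels)
    # View each sample as (n_channels, feats_per_channel) blocks.
    blocks = X.reshape(len(X), n_channels, feats_per_channel)
    for _ in range(n_iter):
        i = rng.integers(len(X))
        same = (y == y[i])
        # k nearest hits (same class) and misses (other classes) by full distance.
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf  # exclude the sample itself
        hits = np.argsort(np.where(same, d, np.inf))[:k]
        miss = np.argsort(np.where(~same, d, np.inf))[:k]
        for c in range(n_channels):
            # Block-level distances: the whole channel is treated as a unity.
            hit_d = np.linalg.norm(blocks[hits, c] - blocks[i, c], axis=1).mean()
            miss_d = np.linalg.norm(blocks[miss, c] - blocks[i, c], axis=1).mean()
            w[c] += (miss_d - hit_d) / n_iter  # reward class-separating channels
    return w

Sorting channels by descending weight then gives the ranking from which top-k channel subsets can be built.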
From the results of subject-independent channel selection (Table 4 and Figure 9), it can be seen that, in addition to the electrodes over the frontal and parietal lobes, electrodes located over the temporal and occipital lobes also play an important role. This phenomenon is probably related to the stimulus type: in the DEAP dataset, emotional states were elicited with music videos, i.e., combined acoustic and visual stimuli, which are mainly processed in the temporal and occipital regions, respectively. This is an interesting problem for further research.
5. Conclusions
We have systematically investigated ReliefF-based channel selection algorithms for EEG-based emotion recognition. Three methods were employed to quantify channels' importance in classifying four emotional states (joy, fear, sadness and relaxation). The experimental results showed that, using our methods, the channels involved in the classification task could be reduced dramatically with an acceptable loss of accuracy. The strategy of evaluating a channel as a unity gave better performance in channel reduction. In addition, the subject-independent channel selection problem was also investigated and achieved good performance. The channels selected subject-independently were mostly located over the frontal and parietal lobes; these regions have been identified as providing discriminative information associated with emotion processing, and the selected channels are distributed almost symmetrically over the scalp. These results are an important step toward practical EEG-based emotion recognition systems.

Acknowledgments: This work was supported in part by the National Natural Science Foundation of China under Grants 61100102 and 61473110, in part by the International Science and Technology Cooperation Program of China under Grant 2014DFG12570, and in part by the U.S. National Science Foundation under Grants 1156639 and 1229213.
Author Contributions: J.H. Zhang and S.Q. Hu conceived and designed the overall system and algorithms; M. Chen and S.K. Zhao helped design and implement the algorithms; M. Chen analyzed the data; Z.G. Shi contributed useful ideas to the algorithm design; Y. Cao contributed substantially to manuscript revision and gave useful advice; J.H. Zhang and M. Chen wrote the paper.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Picard, R.W. Affective computing. IGI Glob. 1995, 1, 71–73.
2. Petrushin, V.A. Emotion in Speech: Recognition and Application to Call Centers. Available online:
http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.42.3157&rep=rep1&type=pdf (accessed on
18 September 2016).
3. Picard, R.W.; Vyzas, E.; Healey, J. Toward machine emotional intelligence: Analysis of affective physiological
state. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 1175–1191. [CrossRef]
4. Anderson, K.; McOwan, P.W. A real-time automated system for the recognition of human facial expressions.
IEEE Trans. Syst. Man Cybern. Part B Cybern. A Publ. IEEE Syst. Man Cybern. Soc. 2006, 36, 96–105. [CrossRef]
5. Lal, T.N.; Schröder, M.; Hinterberger, T.; Weston, J.; Bogdan, M.; Birbaumer, N.; Schölkopf, B. Support vector channel selection in BCI. IEEE Trans. Biomed. Eng. 2004, 51, 1003–1010. [CrossRef] [PubMed]
6. Sannelli, C.; Dickhaus, T.; Halder, S.; Hammer, E.M.; Müller, K.R.; Blankertz, B. On optimal channel configurations for SMR-based brain-computer interfaces. Brain Topogr. 2010, 23, 186–193. [CrossRef] [PubMed]
7. Arvaneh, M.; Guan, C.; Ang, K.K.; Quek, C. Optimizing the channel selection and classification accuracy in EEG-based BCI. IEEE Trans. Biomed. Eng. 2011, 58, 1865–1873. [CrossRef] [PubMed]
8. Bos, D.O. EEG-based emotion recognition: The influence of visual and auditory stimuli. Emotion 2006, 1359, 667–670.
9. Zhang, Q.; Lee, M. Letters: Analysis of positive and negative emotions in natural scene using brain activity
and gist. Neurocomputing 2009, 72, 1302–1306. [CrossRef]
10. Schmidt, L.A.; Trainor, L.J. Frontal brain electrical activity (EEG) distinguishes valence and intensity of
musical emotions. Cogn. Emot. 2001, 15, 487–500. [CrossRef]
11. Valenzi, S.; Islam, T.; Jurica, P.; Cichocki, A. Individual classification of emotions using EEG. J. Biomed.
Sci. Eng. 2014, 7, 604–620. [CrossRef]
12. Lin, Y.P.; Wang, C.H.; Jung, T.P.; Wu, T.L.; Jeng, S.K.; Duann, J.R.; Chen, J.H. EEG-based emotion recognition
in music listening. IEEE Trans. Biomed. Eng. 2010, 57, 1798–1806. [PubMed]
13. Hamann, S. Mapping discrete and dimensional emotions onto the brain: Controversies and consensus.
Trends Cogn. Sci. 2012, 16, 458–466. [CrossRef] [PubMed]
14. Alotaiby, T.; El-Samie, F.E.A.; Alshebeili, S.A.; Ahmad, I. A review of channel selection algorithms for EEG
signal processing. EURASIP J. Adv. Signal Process. 2015, 2015, B172. [CrossRef]
15. Wang, X.-W.; Nie, D.; Lu, B.-L. EEG-Based Emotion Recognition Using Frequency Domain Features and
Support Vector Machines. In Proceedings of the International Conference on Neural Information Processing
(ICONIP 2011), Shanghai, China, 13–17 November 2011; pp. 734–743.
16. Jatupaiboon, N.; Pan-Ngum, S.; Israsena, P. Emotion classification using minimal EEG channels and frequency
bands. In Proceedings of the International Joint Conference on Computer Science and Software Engineering,
Khon Kaen, Thailand, 29–31 May 2013; pp. 21–24.
17. Robnik-Šikonja, M.; Kononenko, I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23–69. [CrossRef]
18. Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. DEAP: A database for emotion analysis using physiological signals. IEEE Trans. Affect. Comput. 2011, 3, 18–31. [CrossRef]
19. Russell, J.A. A circumplex model of affect. J. Pers. Soc. Psychol. 1980, 39, 1161–1178. [CrossRef]
20. Frydenlund, A.; Rudzicz, F. Emotional affect estimation using video and EEG data in deep neural networks.
Lect. Notes Comput. Sci. 2015, 9091, 273–280.
21. Pham, T.; Ma, W.; Tran, D.; Tran, D.S. A Study on the Stability of EEG Signals for User Authentication.
In Proceedings of the International IEEE/EMBS Conference on Neural Engineering, Montpellier, France,
22–24 April 2015; pp. 122–125.
22. Liu, Y.; Sourina, O. Real-Time Subject-Dependent EEG-Based Emotion Recognition Algorithm; Springer:
Heidelberg, Germany, 2014; pp. 199–223.
23. Candra, H.; Yuwono, M.; Chai, R.; Handojoseno, A.; Elamvazuthi, I.; Nguyen, H.T.; Su, S. Investigation
of window size in classification of EEG-emotion signal with wavelet entropy and support vector machine.
In Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society,
Milan, Italy, 25–29 August 2015; pp. 7250–7253.
24. Jenke, R.; Peer, A.; Buss, M. Feature extraction and selection for emotion recognition from EEG. IEEE Trans.
Affect. Comput. 2014, 5, 327–339. [CrossRef]
25. Schaaff, K.; Schultz, T. Towards emotion recognition from electroencephalographic signals. In Proceedings
of the Affective Computing and Intelligent Interaction and Workshops, Amsterdam, The Netherlands,
10–12 September 2009; pp. 1–6.
26. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011,
2, 1–27. [CrossRef]
27. Zheng, W.L.; Lu, B.L. Investigating critical frequency bands and channels for EEG-based emotion recognition
with deep neural networks. IEEE Trans. Auton. Ment. Dev. 2015, 7, 162–175. [CrossRef]

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC-BY) license (http://creativecommons.org/licenses/by/4.0/).
