
The 2017 4th International Conference on Systems and Informatics (ICSAI 2017)

EEG-based emotion recognition using empirical wavelet transform

Dan Huang, Suhua Zhang
School of Computer and Information Technology
Liaoning Normal University
Dalian, China

Yong Zhang*
School of Computer and Information Technology
Liaoning Normal University
Dalian, China

*Corresponding author. E-mail: zhyong@lnnu.edu.cn.

Abstract—Emotion recognition has a prominent status in the applications of brain-machine interfaces. This paper presents an approach to recognizing emotion from electroencephalography (EEG) signals using the empirical wavelet transform (EWT) and an autoregressive (AR) model. The proposed method selects two channels in a certain time segment to perform feature extraction. The EWT is first used to decompose EEG emotion data into several empirical modes, and AR coefficients are then calculated from the selected empirical modes. These coefficients constitute a feature vector that is input into a classifier to perform emotion recognition. We conducted multiple experiments on the DEAP dataset to verify the performance of the proposed approach. The best recognition rate achieves 67.3% for the arousal dimension and 64.3% for the valence dimension. The obtained results show that our proposed approach is superior to some existing methods.

Keywords—empirical wavelet transform; autoregressive model; EEG; emotion recognition; support vector classifier

I. INTRODUCTION

Emotional activities of human beings play a significant and prominent part in their social interaction, making it convenient for people to communicate with each other. Therefore, a core problem in constructing human-machine interaction systems is to understand and further identify emotion, which has attracted wide attention from researchers in many areas, especially psychology, biomedical engineering, machine learning and cognitive science [1].

The primary task of emotion recognition is to depict the different affective states of humanity. Researchers have used two types of representation models to measure emotional states: the discrete model and the dimensional model. The discrete model assumes that emotional states are composed of a limited set of discrete emotions, usually containing the six basic emotional states introduced by Ekman [2] (anger, disgust, fear, happiness, sadness and surprise), and also mixed emotions such as motivation and self-awareness [3]. The dimensional model claims that emotional states can be represented by two emotion dimensions: valence and arousal [4]. The values of valence and arousal range from low (negative) to high (positive), so a (valence, arousal) pair can represent an emotional state in the two-dimensional space. Currently, many research works on the DEAP dataset [5] are carried out on the dimensional model. Accordingly, this paper also utilizes the dimensional model to represent emotions.

Many methods have been developed for the recognition of human emotion according to the dimensional representation. Facial expression, speech analysis and EEG signals are employed to classify human emotion [5, 6, 7]. Due to its high resolution and low cost, EEG has extensive applications in identifying emotion [4, 8]. Many studies have suggested that EEG can afford beneficial feedback on human behavior, which helps to classify emotions [9, 10]. Therefore, many investigators have applied different feature extraction approaches to EEG and then identified emotion categories. These approaches mainly include the wavelet transform [11, 12], the autoregressive (AR) model [8, 13], sample entropy [14], and empirical mode decomposition (EMD) [15, 16]. The EMD approach, based on the Hilbert-Huang transform, can be used for processing nonlinear and non-stationary signals. EMD decomposes complex signals into intrinsic mode functions (IMFs), and each IMF component contains the local characteristics of the original signal at a different time scale. However, the EMD approach lacks the support of a mathematical theory [17]. On the basis of EMD, Gilles presented the empirical wavelet transform (EWT) to build adaptive wavelets [17]. Since EWT is established in a wavelet frame, it is superior to the traditional EMD method.

The support vector machine (SVM) [14] is usually selected as the classifier to recognize emotional states. In addition, other classification methods are also used in emotion recognition [12, 18, 19, 20], such as the k-nearest neighbor (KNN) classifier, the Naïve Bayes classifier, and linear discriminant analysis.

We first employ the EWT method to decompose the raw EEG signal and calculate AR coefficients from the modes obtained by EWT to form a feature vector. Finally, we use these features to recognize emotions. Extensive experiments on the DEAP emotion data show that the proposed approach achieves a stable recognition performance for emotional states.

The rest of this paper is organized as follows. In the next section, we illustrate the proposed method, including data acquisition, feature extraction and the classification approach used. Section III gives the validated classification results. Section IV concludes this paper.

978-1-5386-1107-4/17/$31.00 ©2017 IEEE 1444


II. PROPOSED APPROACH

We first give the proposed emotion recognition method, which employs the EWT technique for feature extraction from the raw EEG. The proposed method consists of three major tasks: data acquisition, feature extraction using the EWT method and AR model, and emotion classification based on an SVM classifier. The framework of our proposed approach is described in Fig. 1.

Figure 1. Framework of our proposed approach.

A. Data acquisition

In this paper, we use the DEAP dataset provided by Koelstra et al. [5]. The DEAP dataset contains recordings of 32 subjects. For each subject, a total of 32 EEG signals and 8 peripheral physiological signals were collected while the subject watched 40 preselected music videos, each 60 seconds long.

When a subject watches a music video, he or she quantifies the emotional response along five different dimensions: valence, arousal, dominance, liking and familiarity. The values in these dimensions range from 1 to 9. In this paper, we mainly consider an emotion representation model consisting of the arousal and valence dimensions.

The DEAP dataset contains 32-channel EEG signals sampled at 128 Hz. A 63 s signal is recorded for each trial, in which the first 3 s is a pre-trial baseline and the remaining 60 s is trial data. In total, the DEAP dataset includes 1280 data samples: 32 participants with 40 trials each.

Existing research indicates that the emotional activity of human beings is mainly concentrated in the frontal, temporal and central areas, such as the F3, F4, C3, C4, T3 and T4 channels [21]. Feature extraction from EEG signals requires a large computational cost, so we select only the two channels F3 and F4 for our studies in this paper.

Because each music video lasts 60 seconds, there are 7680 (60*128) data points for each channel. In fact, it is not necessary to use all the data to identify emotions; doing so can even lead to a decline in classification performance. Therefore, we choose a section of the data to replace the full recording according to the following rule. In our work, we use only 9 seconds of data to reduce the amount of data.

For the 63 s EEG signals, we first remove the first 3 seconds of baseline signal, and then employ an overlapping sliding window with a window size of 9 seconds and an overlap of 6 seconds to divide the remaining 60 seconds into 18 time segments. We calculate the relative energy of the β band for each time segment. The higher the relative energy, the more excited the brain is in this time segment, and the better the evaluation performance in this segment.

For each time segment, we use the fast Fourier transform (FFT) to extract the β, α, θ and δ bands and calculate the energy of each band. The energy of a finite-length discrete signal is computed as follows:

E = Σ_{n=1}^{N} |x_n|². (1)

In Equation (1), x_n represents the amplitude of the nth point of the time-domain discrete signal and N represents the number of discrete points. According to Equation (1), we can calculate the absolute energies E_β, E_α, E_θ, E_δ of the β, α, θ, δ bands for each time segment; the relative energy of the β band can then be expressed as

E_b = E_β / (E_β + E_α + E_θ + E_δ). (2)

Finally, the relative energy of the β band for each time segment is calculated as shown in Table I. The highest relative energy appears in the 19s-27s time segment, so we choose seconds 19-27 to carry out our experiments.

TABLE I. THE RELATIVE ENERGY OF BETA BAND FOR EACH TIME SEGMENT

  segment   relative energy   segment   relative energy
            of beta band                of beta band
  1s-9s        0.2623         28s-36s      0.3644
  4s-12s       0.3148         31s-39s      0.3695
  7s-15s       0.3463         34s-42s      0.3608
  10s-18s      0.3616         37s-45s      0.3537
  13s-21s      0.3649         40s-48s      0.3601
  16s-24s      0.3727         43s-51s      0.3646
  19s-27s      0.3766         46s-54s      0.3675
  22s-30s      0.3709         49s-57s      0.3581
  25s-33s      0.3673         52s-60s      0.3613
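The segment-selection rule above, Eq. (1) and Eq. (2) applied to a 9 s window sliding in 3 s steps, can be sketched in Python. This is a minimal illustration rather than the authors' code; the band edges in Hz are not specified in the paper, so the cut-offs and helper names below are assumptions.

```python
import numpy as np

FS = 128  # DEAP sampling rate (Hz)

# Assumed band edges in Hz; the paper does not state its exact cut-offs.
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_energies(segment):
    """Absolute band energies: Eq. (1) applied to each FFT-masked band."""
    n = len(segment)
    spectrum = np.fft.rfft(segment)
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    energies = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(spectrum * mask, n)      # keep only this band
        energies[name] = np.sum(np.abs(band) ** 2)   # E = sum |x_n|^2
    return energies

def relative_beta(segment):
    """Eq. (2): E_beta / (E_beta + E_alpha + E_theta + E_delta)."""
    e = band_energies(segment)
    return e["beta"] / (e["beta"] + e["alpha"] + e["theta"] + e["delta"])

def best_segment(trial, win=9 * FS, step=3 * FS):
    """Slide a 9 s window in 3 s steps (6 s overlap) over one channel and
    return the start sample of the most beta-active window."""
    starts = range(0, len(trial) - win + 1, step)
    return max(starts, key=lambda s: relative_beta(trial[s:s + win]))
```

Applied to a 60 s trial, `best_segment` picks the window with the largest relative β energy; on DEAP the paper reports this to be the 19s-27s segment.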
B. Feature extraction

1) Empirical wavelet transform

Due to its good performance in signal separation, EMD has received extensive attention in signal analysis research over the past ten years. EMD essentially decomposes a signal into a series of IMF components. Nevertheless, the EMD method lacks the support of a mathematical theory [17]. In order to overcome this limitation, Gilles [17] presented the empirical wavelet transform on the basis of both EMD and the wavelet transform.

In this paper, the EWT method is used to decompose the original EEG signal. For a given original signal f(t), the EWT method consists of the following steps [17]:

Step 1. Partition the Fourier spectrum of the original signal on [0, π] into N contiguous segments Λ_n = [ω_{n−1}, ω_n], where ω_n denotes the limit between adjacent segments and n changes from 1 to N. The values of ω_0 and ω_N are 0 and π, respectively.

Step 2. Define the empirical wavelets as bandpass filters on each Λ_n. Following the construction of the Littlewood-Paley and Meyer wavelets, the empirical scaling function φ̂_n(ω) and the empirical wavelets ψ̂_n(ω) are defined as follows, respectively:

φ̂_n(ω) = 1,                                                     if |ω| ≤ ω_n − τ_n;
          cos[(π/2) β((1/(2τ_n))(|ω| − ω_n + τ_n))],             if ω_n − τ_n ≤ |ω| ≤ ω_n + τ_n;   (3)
          0,                                                     otherwise,

ψ̂_n(ω) = 1,                                                     if ω_n + τ_n ≤ |ω| ≤ ω_{n+1} − τ_{n+1};
          cos[(π/2) β((1/(2τ_{n+1}))(|ω| − ω_{n+1} + τ_{n+1}))], if ω_{n+1} − τ_{n+1} ≤ |ω| ≤ ω_{n+1} + τ_{n+1};   (4)
          sin[(π/2) β((1/(2τ_n))(|ω| − ω_n + τ_n))],             if ω_n − τ_n ≤ |ω| ≤ ω_n + τ_n;
          0,                                                     otherwise.

In equations (3) and (4), τ_n = γω_n. The parameter γ ensures that φ̂_n(ω) and ψ̂_n(ω) constitute a tight frame, and γ < min_n((ω_{n+1} − ω_n)/(ω_{n+1} + ω_n)). β(x) in equations (3) and (4) is an arbitrary function.

Step 3. Define the detail coefficients W_f^ε(n, t) using the traditional wavelet transform method:

W_f^ε(n, t) = ⟨f(t), ψ_n(t)⟩ = ∫ f(τ) ψ_n(τ − t) dτ. (5)

The approximation coefficients are obtained from the following equation:

W_f^ε(0, t) = ⟨f(t), φ_1(t)⟩ = ∫ f(τ) φ_1(τ − t) dτ. (6)

The empirical modes f_k(t) are then given by

f_0(t) = W_f^ε(0, t) ∗ φ_1(t), (7)

f_k(t) = W_f^ε(k, t) ∗ ψ_k(t). (8)

2) AR model

We then calculate the AR coefficients of each empirical mode obtained from EWT. The AR model is a statistical method for processing a time series which uses the first t−1 values of a variable y, namely y_1 to y_{t−1}, to predict the current value y_t. AR models are widely used for prediction in economics, informatics and natural phenomena. The AR model can be represented by the equation

y(t) = Σ_{i=1}^{p} φ_i y(t − i) + ε_t, (9)

where p is the order of the model. It shows that y(t) is the sum of a linear combination of the previous p values and an error term. The φ_i are the model parameters, and ε_t is white noise with mean 0 and variance σ².

The prediction accuracy of the AR model depends on the choice of the order p and the accuracy of the model parameters. The calculation of an AR model is thus mainly a matter of solving for the corresponding p coefficients after determining the order p. The AR model can be solved by many approaches [22]; in this paper, we use the Burg method to calculate the AR coefficients.

In this paper, we choose the 9 s EEG signals of the two channels for the EWT algorithm, so there are 2304 data points for each produced mode at the 128 Hz sampling frequency. An empirical mode f_k(t) is split into N segments according to the size of the sliding window. Supposing that each segment produces p coefficients through the AR model, 4*p*N features in total are generated and form a feature vector for a trial.
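To make the decomposition step concrete, here is a toy Fourier-domain sketch of the EWT idea: partition the spectrum into contiguous segments Λ_n and reconstruct one mode per segment. It deliberately uses ideal rectangular filters and a crude largest-peaks boundary rule in place of the Meyer-type functions of Eqs. (3) and (4), so it illustrates the principle rather than reproducing Gilles' algorithm [17].

```python
import numpy as np

def ewt_modes(signal, n_modes=4):
    """Toy EWT sketch: partition the Fourier spectrum into contiguous
    segments and reconstruct one mode per segment.

    Simplifications vs. the paper: rectangular (ideal) filters replace the
    Meyer-type scaling/wavelet functions, and segment boundaries are placed
    at the lowest spectral point between the n_modes largest peaks.
    """
    spec = np.fft.rfft(signal)
    mag = np.abs(spec)
    # pick the n_modes largest spectral peaks (crude segmentation rule)
    peak_bins = np.sort(np.argsort(mag)[-n_modes:])
    # boundary between consecutive peaks = argmin of |spectrum| between them
    bounds = [0]
    for a, b in zip(peak_bins[:-1], peak_bins[1:]):
        bounds.append(a + np.argmin(mag[a:b + 1]))
    bounds.append(len(mag))
    modes = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        mask = np.zeros_like(mag)
        mask[lo:hi] = 1.0              # ideal bandpass over this segment
        modes.append(np.fft.irfft(spec * mask, len(signal)))
    return modes
```

Because the masks partition the spectrum, the modes always sum back to the input; for well-separated narrowband components each mode recovers one component, mirroring the role of Eqs. (5)-(8).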
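The Burg estimation can be sketched as a standard lattice recursion; this is a textbook implementation written for illustration (the paper does not list its exact routine), with signs arranged so the returned values are the φ_i of Eq. (9). The `ar_features` helper, mimicking the per-segment feature assembly, is hypothetical.

```python
import numpy as np

def burg_ar(x, order):
    """Estimate the phi_i of Eq. (9) by Burg's method: at each lattice
    stage choose the reflection coefficient k minimizing the summed
    forward/backward prediction error power, then update via Levinson."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    f = x.copy()                 # forward prediction errors
    b = x.copy()                 # backward prediction errors
    a = np.zeros(0)              # polynomial coefficients (leading 1 implicit)
    for m in range(1, order + 1):
        fm = f[m:]               # valid forward errors at stage m
        bm = b[m - 1:n - 1]      # backward errors, lagged by one sample
        k = -2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm))
        a = np.concatenate([a + k * a[::-1], [k]])   # Levinson recursion
        f_new, b_new = f.copy(), b.copy()
        f_new[m:] = fm + k * bm
        b_new[m:] = bm + k * fm
        f, b = f_new, b_new
    return -a                    # so that y(t) ~ sum_i a[i] * y(t - i)

def ar_features(mode, seg_len, order):
    """Split one empirical mode into consecutive segments and concatenate
    the Burg AR coefficients of every segment (illustrative helper)."""
    starts = range(0, len(mode) - seg_len + 1, seg_len)
    return np.concatenate([burg_ar(mode[s:s + seg_len], order) for s in starts])
```

A quick sanity check is the AR(2) identity of a pure sinusoid, x(t) = 2cos(ω)x(t−1) − x(t−2), which Burg recovers almost exactly on noiseless data.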
C. Support vector machine

The SVM, developed by Vapnik [23], has attracted the attention of many researchers due to its good classification performance. As a supervised machine learning method, the SVM discriminates different classes using linear or nonlinear hyperplanes.

Let there be l data points {(x_i, y_i)}_{i=1}^{l} for a binary classification problem, where x_i is the ith n-dimensional training tuple and y_i ∈ {+1, −1} is the class label corresponding to x_i, for i = 1, …, l. To seek an optimal hyperplane in the nonlinear case, all training tuples are projected into a high-dimensional space with a projection ϕ. A possible separating hyperplane is described as [23]

w · ϕ(x) + b = 0. (10)

The above classification problem can be formalized as the optimization problem

min_{w,b,ξ} (1/2)||w||² + C Σ_{i=1}^{l} ξ_i, (11)

subject to

y_i(w · ϕ(x_i) + b) + ξ_i − 1 ≥ 0, ξ_i ≥ 0, i = 1, …, l, (12)

where the parameter C represents the degree of punishment for classification errors and the ξ_i ≥ 0 are slack variables.

The corresponding decision function is represented as

f(x) = sign(Σ_{i=1}^{l} α_i y_i K(x, x_i) + b), (13)

where α_i is a Lagrange multiplier and K(x_i, x_j) represents the kernel function. K(x_i, x_j) is a symmetric function and satisfies the Mercer condition.

We use the Gaussian kernel as the kernel function of the SVM in this paper. Its mathematical expression is

K(x_i, x_j) = exp(−γ ||x_i − x_j||²). (14)

III. EXPERIMENTAL RESULTS

The main task of this section is to validate and test our proposed emotion recognition method. We evaluate the recognition rate of the arousal and valence dimensions.

This paper mainly concerns a two-level emotional state model in the valence-arousal space, in which valence is divided into high valence (HV) and low valence (LV), while arousal is divided into high arousal (HA) and low arousal (LA). Values below 5 are considered low, while the others are high. The sizes of the two emotional states for the valence dimension are 724 and 566, while the sizes of the two emotional states for the arousal dimension are 754 and 526, respectively.

Therefore, we construct 2 classification problems to evaluate the recognition performance, each considering only a binary-class case. Table II lists the categories and their corresponding sizes.

TABLE II. DESCRIPTION OF DIFFERENT CLASSIFICATION TASKS

  Classes   Sizes of samples   Classes   Sizes of samples
  HV        724                HA        754
  LV        566                LA        526

To build the emotion recognition model, we used the raw EEG signals of the two selected channels (F3 and F4) in the 19s-27s time segment in the experiments. These signals are decomposed into several empirical modes f_k(t) by the EWT technique, and features are extracted from the empirical modes using the AR model. The SVM is employed as the classification method for emotional states.

We choose the Gaussian kernel as the kernel function of the SVM in this paper, so the two parameters C and γ need to be optimized by the grid search approach. LIBSVM [24] is employed to verify the proposed method.

Our proposed method selects the first 4 empirical modes f_k(t) (k=1, …, 4) to calculate AR coefficients. For the sake of accurately evaluating the classification performance, the order p of the AR model ranges from 2 to 11.

In our experiments, each empirical mode is divided into several segments according to the size of the sliding window, and the AR coefficients of each segment are then calculated separately. To evaluate the effect of different sliding-window sizes on the classification performance, we give the division results of the segments in the different cases, as listed in Table III.

TABLE III. DIFFERENT DIVISIONS OF SEGMENTS

  Size of sliding window (s)   Number of segments   Number of samples per segment
  9                            8                    1152
  3                            24                   384

As shown in Table III, we evaluated our proposed method in the cases of s=9s and s=3s, where s is the size of the sliding window. For s=9s, the total 18 s of data is divided into 8 segments over the 4 empirical modes (18/9*4=8). Correspondingly, the

number of samples is 1152 for the sampling rate of 128 Hz (128*9=1152). For s=3s, there are 24 segments and each segment contains 384 samples.

For each segment, the computed AR coefficients are taken as features. All the features for all segments of each trial form one input feature vector, which is then fed into the SVM. Each experiment was repeated ten times and the average accuracy (%) calculated, as shown in Fig. 2 and Fig. 3. The symbols s and p in Fig. 2 and Fig. 3 represent the size of the sliding window and the order of the AR model, respectively.

Figure 2. Experimental results on valence dimension.

For the valence-based emotion recognition, as shown in Fig. 2, the best recognition accuracy obtained was 66.06%, in the case of s=3 and p=2. The accuracies of s=9 and s=3 show a downward trend as the order p of the AR model increases, while the classification performance vibrates slightly in the case of s=9 when p is larger than 7. Further observation shows that the recognition accuracy in the case of s=9 outperformed that of s=3 when parameter p ranges from 2 to 7; however, the recognition accuracy in the case of s=3 is slightly superior to that of s=9 when parameter p is no less than 7.

Figure 3. Experimental results on arousal dimension.

Fig. 3 presents the recognition performance for the arousal-based dimension. The proposed method achieved more than 60% recognition accuracy for every order p in the case of s=3, performing better than the case of s=9. Similar to the valence-based dimension, the recognition performance also showed a downward trend in the case of s=3, even though it fluctuated slightly at p=4. The best recognition accuracy was 67.93% in the case of s=3, while that of s=9 was only 61.21%.

In summary, as can be seen in Fig. 2 and Fig. 3, there is a clear trend in the evolution of the classification performance. Overall, the classification accuracy of s=3 outperforms that of s=9 for both valence-based and arousal-based emotion recognition. The experimental results indicate that the selection of p has a great influence on the recognition accuracy; a bigger p value does not improve the recognition performance, especially in the case of s=3.

Table IV lists the accuracy comparison of several methods on the DEAP dataset. The recognition accuracy of our proposed approach in Table IV is the average over p=2 and p=3 in the case of s=3, which is 64.3% for the valence dimension (HV/LV) and 67.3% for the arousal dimension (HA/LA). Koelstra et al. [5] reported accuracies of 57.6% and 62.0% for valence-based and arousal-based emotion recognition, respectively. The accuracies reported by Kumar et al. [25] are 62.5% for HV, 59.4% for LV, 68.8% for HA, and 60.9% for LA; the corresponding average accuracies for valence and arousal are 60.95% and 64.85%, respectively.

TABLE IV. ACCURACY COMPARISON OF SEVERAL METHODS

  Methods               Emotion states   Accuracy (%)
  Koelstra et al. [5]   HV/LV            57.6
                        HA/LA            62.0
  Kumar et al. [25]     HV               62.5
                        LV               59.4
                        HA               68.8
                        LA               60.9
  Our method            HV/LV            64.3
                        HA/LA            67.3

From Table IV we see that the recognition accuracy of our proposed approach is superior to that of the comparison methods, which confirms the effectiveness of the proposed approach using the EWT method and AR model.

IV. CONCLUSIONS

Recently, the popularity of wearable devices has led many researchers to pay attention to emotion recognition based on EEG signals. This paper employs EWT and an AR model to propose an emotion recognition approach. The proposed approach contains three key steps, namely, data acquisition, feature extraction using EWT and the AR model, and emotion classification based on an SVM classifier. Extensive experiments are performed on the multimodal emotion dataset DEAP. The obtained results show that the proposed approach has a stable emotion recognition performance, which is superior to several comparison methods.

ACKNOWLEDGMENTS

This work was partly supported by the National Natural Science Foundation of China (No. 61373127 and No. 61772252).

REFERENCES
[1] S. M. Alarcao and M. J. Fonseca, "Emotions recognition using EEG signals: A survey," IEEE Transactions on Affective Computing, 2017, doi: 10.1109/TAFFC.2017.2714671, in press.
[2] P. Ekman, "An argument for basic emotions," Cognition and Emotion, vol. 6, no. 3, pp. 169–200, 1992.
[3] R. Calvo and S. D'Mello, "Affect detection: an interdisciplinary review of models, methods, and their applications," IEEE Transactions on Affective Computing, vol. 1, no. 1, pp. 18–37, 2010.
[4] L. Yisi, O. Sourina, and N. Minh, "Real-time EEG-based human emotion recognition and visualization," in Proceedings of the International Conference on Cyber Worlds (CW), 2010, pp. 262–269.
[5] S. Koelstra, C. Muhl, M. Soleymani, J.-S. Lee, A. Yazdani, T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras, "DEAP: A database for emotion analysis using physiological signals," IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 18–31, 2012.
[6] K. B. Meehan, C. D. Panfilis, N. M. Cain, et al., "Facial emotion recognition and borderline personality pathology," Psychiatry Research, vol. 255, pp. 347–354, 2017.
[7] C. K. Yogesh, M. Hariharan, R. Ngadiran, et al., "A new hybrid PSO assisted biogeography-based optimization for emotion and stress recognition from speech signal," Expert Systems With Applications, vol. 69, pp. 149–158, 2017.
[8] T. D. Pham, D. Tran, W. Ma, and N. T. Tran, "Enhancing performance of EEG-based emotion recognition systems using feature smoothing," in ICONIP 2015, Part IV, LNCS 9492, 2015, pp. 95–102.
[9] P. Petrantonakis and L. Hadjileontiadis, "A novel emotion elicitation index using frontal brain asymmetry for enhanced EEG-based emotion recognition," IEEE Transactions on Information Technology in Biomedicine, vol. 15, pp. 737–746, 2011.
[10] Y. P. Lin, C. H. Wang, T. P. Jung, T. L. Wu, S. K. Jeng, J. R. Duann, and J. H. Chen, "EEG-based emotion recognition in music listening," IEEE Transactions on Biomedical Engineering, vol. 57, no. 7, pp. 1798–1806, 2010.
[11] M. Murugappan, N. Ramachandran, and Y. Sazali, "Classification of human emotion from EEG using discrete wavelet transform," Journal of Biomedical Science and Engineering, vol. 3, no. 4, pp. 390–396, 2010.
[12] Z. Mohammadi, J. Frounchi, and M. Amiri, "Wavelet-based emotion recognition system using EEG signal," Neural Computing and Applications, vol. 28, no. 8, pp. 1985–1990, 2017.
[13] S. Hatamikia, K. Maghooli, and A. M. Nasrabadi, "The emotion recognition system based on autoregressive model and sequential forward feature selection of electroencephalogram signals," Journal of Medical Signals & Sensors, vol. 4, no. 3, pp. 194–201, 2014.
[14] X. Jie, C. Rui, and L. Li, "Emotion recognition based on the sample entropy of EEG," Bio-Medical Materials and Engineering, vol. 24, pp. 1185–1192, 2014.
[15] N. E. Huang, Z. Shen, S. R. Long, M. L. Wu, H. H. Shih, and Q. Zheng, "The empirical mode decomposition and Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, 1998.
[16] V. Bajaj and R. B. Pachori, "Classification of seizure and nonseizure EEG signals using empirical mode decomposition," IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 6, pp. 1135–1142, 2012.
[17] J. Gilles, "Empirical wavelet transform," IEEE Transactions on Signal Processing, vol. 61, no. 16, pp. 3999–4010, 2013.
[18] H. J. Yoon and S. Y. Chung, "EEG-based emotion estimation using Bayesian weighted-log-posterior function and perceptron convergence algorithm," Computers in Biology and Medicine, vol. 43, no. 12, pp. 2230–2237, 2013.
[19] E. Kroupi, J. M. Vesin, and T. Ebrahimi, "Subject-independent odor pleasantness classification using brain and peripheral signals," IEEE Transactions on Affective Computing, vol. 7, no. 4, pp. 422–434, 2016.
[20] Z. Guendil, Z. Lachiri, and C. Maaoui, "Computational framework for emotional VAD prediction using regularized extreme learning machine," International Journal of Multimedia Information Retrieval, vol. 6, pp. 251–261, 2017.
[21] B. Güntekin and E. Başar, "Event-related beta oscillations are affected by emotional eliciting stimuli," Neuroscience Letters, vol. 483, no. 3, pp. 173–178, 2010.
[22] M. B. Priestley, Spectral Analysis and Time Series. Academic Press, London, 1994.
[23] V. Vapnik, The Nature of Statistical Learning Theory. Springer, New York, 1995.
[24] C. C. Chang and C. J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 1–27, 2011.
[25] N. Kumar, K. Khaund, and S. M. Hazarika, "Bispectral analysis of EEG for emotion recognition," Procedia Computer Science, vol. 84, pp. 31–35, 2016.
