A Three-Stage Decision Framework For Multi-Subject Emotion Recognition Using Physiological Signals


2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Jing Chen, Bin Hu*, Yue Wang, Yongqiang Dai, Yuan Yao, and Shengjie Zhao
*Corresponding author
The School of Information Science and Engineering
Lanzhou University
Lanzhou, China
{chenj12, bh, wangyue15, daiyq14, yaoy2015, zhsj15}@lzu.edu.cn

Abstract—This paper investigates the potential of physiological signals as reliable channels for multi-subject emotion recognition. A three-stage decision framework is proposed for recognizing four emotions of multiple subjects. The decision framework consists of three stages: (1) in the initial stage, identifying a subject group that a test subject can be mapped to; (2) in the second stage, identifying an emotion pool that an instance of the test subject can be assigned to; and (3) in the final stage, generating the predicted emotion from the given emotion pool for the test instance. In comparison with a series of alternative methods, the high accuracy of 70.04% achieved by our proposed method clearly demonstrates the potential of the three-stage decision method in multi-subject emotion recognition.

Keywords—affective computing; emotion recognition; multimodal physiological signals; subject-independent

I. INTRODUCTION

Among emotion recognition studies, one particular issue has provoked extensive discussion: 'individual differences'. If the physiological patterns of different subjects display large variances for a specific emotion, classifiers cannot make explicit decisions in the multi-subject context. Therefore, subjects were limited to a single person in many studies [1-3]. But the specific models and recognition results derived from a single subject may fail to match those derived from other subjects. Accordingly, research on emotion recognition has focused on subject-independent approaches (one recognition model constructed from multiple subjects) [3-5]. It has been demonstrated that subject-independent approaches generally exhibit inferior performance compared with subject-dependent approaches (one recognition model constructed for a single subject) due to the impact of 'individual differences'. Hence, a number of investigations [5-7] have suggested that the recognition rate could be improved by transforming subject-independent cases to be analogous to subject-dependent cases. Kim and André [5], however, did not experimentally elaborate on this issue but merely offered it as a suggestion. In the studies of Yuan et al. [6] and Gu et al. [7], a test subject is assigned to a subject class in the first stage, and the data of the test subject are then passed to a subject-specific recognition model in the second stage for recognizing his emotions. Their method can be viewed as a subject-dependent approach and may only be feasible for a limited number of subjects. If there were one thousand subjects, there would be one thousand subject-specific recognition models, resulting in a burdensome amount of computation.

In this paper, we propose a novel three-stage decision framework for multi-subject emotion recognition. The basic idea is to transform traditional subject-independent recognition tasks into group-dependent recognition tasks.

II. DATA

A. Experimental Data

The Database for Emotion Analysis Using Physiological Signals (DEAP)¹ is used in this study. The database contains both electroencephalogram (EEG) and peripheral physiological signals of 32 subjects (c=32). EEG was recorded from 32 active electrodes (32 channels). Peripheral physiological signals (8 channels) include galvanic skin response (GSR), skin temperature (TMP), blood volume pulse (BVP), respiration (RSP), electromyograms (EMG) collected from the zygomaticus major and trapezius muscles, and horizontal and vertical electrooculograms (EOG). Physiological signals were recorded while 40 carefully selected one-minute music video clips were played in random order to each subject, so 40 trials per subject were generated. Of the 40 videos included in the DEAP dataset, 17 were selected from Last.fm² affective tags added by web users, while 23 were selected manually without affective tags. The DEAP dataset also contains self-reports on five dimensions of emotion (valence, arousal, dominance, liking, and familiarity) for the 32 subjects. The first four scales range from 1 to 9, and the fifth ranges from 1 to 5. Among these dimensions, two represent basic facets of emotion: valence, ranging from negative (or unpleasant) to positive (or pleasant), and arousal, ranging from calm (or bored) to active (or excited).

We focus on these two basic dimensions, valence and arousal. In the DEAP dataset, the arousal and valence of the 40 videos were self-rated. By calculating the agreement on the affective contents of the videos, we found that some videos showed low agreement among the 32 subjects. If a video has no affective tags and low agreement on its affective content, the emotion elicited by the video will be hard to label in the multi-subject context. Therefore, the 17 videos with affective tags are used in this study, and the average values of the 32 subjects' arousal and valence self-ratings for each video are used as the ground truth. Specifically, this study aims to classify four emotions (EQ1, EQ2, EQ3, and EQ4) corresponding to the four quadrants (v=4) shown in Fig. 1, to which the discrete emotional states of the 17 trials can be mapped. EOG signals are often regarded as artifacts in EEG and used as noise references for selecting raw EEG segments; therefore, we do not use the EOG channels and exploit the remaining signals (38 channels).

Fig. 1. Valence-arousal space. EQ1: valence rating > 5 and arousal rating > 5; EQ2: valence rating > 5 and arousal rating ≤ 5; EQ3: valence rating ≤ 5 and arousal rating ≤ 5; EQ4: valence rating ≤ 5 and arousal rating > 5.

¹ http://www.eecs.qmul.ac.uk/mmv/datasets/deap/
² http://www.last.fm
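For illustration, the ground-truth labelling described above can be sketched as follows: the 32 subjects' valence and arousal self-ratings for each tagged video are averaged, and each mean rating pair is mapped to one of the four quadrants of Fig. 1 using the midpoint threshold of 5. The array names and shapes below are illustrative assumptions, not part of the DEAP file format.

```python
import numpy as np

def quadrant_label(valence, arousal, threshold=5.0):
    """Map a (valence, arousal) pair to one of the four quadrants EQ1-EQ4 (Fig. 1)."""
    if valence > threshold and arousal > threshold:
        return "EQ1"          # high valence, high arousal
    if valence > threshold:
        return "EQ2"          # high valence, low arousal
    if arousal <= threshold:
        return "EQ3"          # low valence, low arousal
    return "EQ4"              # low valence, high arousal

# Hypothetical rating arrays: shape (32 subjects, 17 tagged videos), scale 1-9.
valence_ratings = np.random.uniform(1, 9, size=(32, 17))
arousal_ratings = np.random.uniform(1, 9, size=(32, 17))

# Ground truth per video = quadrant of the mean self-rating over all 32 subjects.
mean_val = valence_ratings.mean(axis=0)
mean_aro = arousal_ratings.mean(axis=0)
ground_truth = [quadrant_label(v, a) for v, a in zip(mean_val, mean_aro)]
print(ground_truth)
```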


B. Data Preprocessing and Feature Extraction

Nine feature types are applied in this study; Table I provides a complete overview of the extracted features. The total number of features extracted from the physiological channels is 342 (= 38 channels × 9 feature types). Each trial lasts one minute. The features were extracted with 4-second sliding time windows with a 2-second overlap. Accordingly, 29 samples (t=29) are generated from each one-minute trial, and the number of samples per subject is 493 (= 29 samples per trial × 17 trials per subject). In order to obtain an emotion decision for an instance (i.e., a trial with t samples), the t samples are fed to a classifier simultaneously.

TABLE I. THE FEATURES EXTRACTED FROM EEG AND PERIPHERAL PHYSIOLOGICAL SIGNALS

  Name               Description
  Eα, Eβ             Absolute spectral power of the alpha and beta frequency bands, respectively
  Eβ/Eθ              Energy ratio of the beta band to the theta band
  Comp, Acti, Mobi   Three Hjorth parameters (complexity, activity, and mobility)
  C0                 C0 complexity
  Var                Variance
  SpEn               Spectral entropy

Each feature was normalized to the range [0, 1]. These normalized features are the input of the whole decision approach. In each stage of the proposed approach, the Fisher Criterion Score (FCS) [8] was used to select significant features. FCS attempts to find a feature subset in which samples from the same class are assembled, whereas samples from different classes are separated to the maximum degree:

    F_l = (m_{1,l} − m_{2,l})² / (σ²_{1,l} + σ²_{2,l})        (1)

where the means and standard deviations of the samples belonging to the two classes Class1 and Class2 are denoted by m_{1,l}, m_{2,l}, σ_{1,l}, and σ_{2,l} respectively for the l-th feature (l = 1, 2, …, 342), and F_l denotes the degree of separation achieved by the l-th feature for the two classes. A feature list sorted by all F_l values in descending order is thus obtained: the first feature is the most relevant for separating the two classes, while the last one is the least relevant. For feature selection, we employed a filter method rather than a wrapper method to avoid the strong adaptation of a classifier to the training data, generally termed over-fitting.
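A minimal sketch of the Fisher Criterion Score of Eq. (1) is given below, assuming a two-class labelling and a feature matrix with 342 columns as described above; the function and variable names are ours and do not come from the original implementation.

```python
import numpy as np

def fisher_criterion_scores(X, y):
    """Compute the FCS of Eq. (1) for every column of X given binary labels y (0/1)."""
    X1, X2 = X[y == 0], X[y == 1]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    s1, s2 = X1.var(axis=0), X2.var(axis=0)          # variances = squared std deviations
    return (m1 - m2) ** 2 / (s1 + s2 + 1e-12)        # small constant avoids division by zero

# Hypothetical data: 493 windows for each of two classes x 342 features.
X = np.random.rand(986, 342)
y = np.array([0] * 493 + [1] * 493)

scores = fisher_criterion_scores(X, y)
ranking = np.argsort(scores)[::-1]                   # descending: most to least discriminative
top_features = ranking[:50]                          # e.g. keep the 50 highest-scoring features
```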

III. METHODOLOGY

The basic idea of the proposed decision framework is to transform a general subject-independent recognition task into a group-dependent recognition task by classifying a test instance into a particular group model prior to the emotion classification procedure. The methods described in the following subsections assemble an overall decision framework (shown in Fig. 2). During the learning phase, separate group models are built in the first stage by categorizing multiple subjects. In the second stage, emotion pools are generated, each including a set of emotions. During the testing phase, a test subject is initially classified to the group with which the test subject shows the greatest similarity. A further decision procedure is then applied to assign an instance of the test subject to a particular emotion pool. In the third stage, an emotion is decided for the instance of the test subject.

A test instance is classified to the group model Mn in the first stage, or to the emotion pool Poq in the second stage, with the highest probability. All three stages are detailed in the following subsections.

Fig. 2. Diagram of the three-stage decision framework for multi-subject emotion recognition.

A. The First Stage

During the learning phase, the objective of the first stage is to build group models M1, M2, …, Mm from subjects S1, S2, …, Sc (m << c). Each group model is trained on physiological data from a group of subjects, and subjects in different groups can be regarded as having large individual differences.

Since not all of the extracted features are relevant to the subject partitioning, festep1 features (festep1: the number of features used in the first stage) are selected to cluster the c subjects into m groups. Features used in this first stage should be selected in the context of both the subject partitioning and the emotion classification. Here, we divide the dataset used in the training phase into two parts: a training set and a validation set. The feature set of size festep1 is determined as follows (a code sketch of this procedure is given after this subsection):

a) We put together the training data originally labeled with an emotion s (s = EQ1, EQ2, …, EQv; v is the number of emotions) and replace the emotion labels with subject IDs. Fig. 3 graphically illustrates the combination of every subject's feature matrix labeled with EQ1. We label the generated feature matrix of the emotion EQ1 with subject IDs ('S1', 'S2', …, 'Sc'; c represents the number of subjects used in the learning phase) (Fig. 3(b)). When the generated feature matrix and the target vector of subject IDs are fed into the FCS feature selection method, significant features which can classify subjects are found, and a ranked feature list is determined for each new feature matrix derived from an emotion s. The first feature in the list derived from an emotion s is viewed as the most significant feature for the subject partitioning under the condition of emotion s.

Fig. 3. The process of feature selection in stage one. 'Sk' (k = 1, 2, …, c) is a subject ID.

b) In total, v ranked feature lists are obtained, one for each of the v emotions.

c) We intersect the first fe1 features from each of the v ranked feature lists. The variable fe1 can be varied within a range. The intersection, for a given value of fe1, is fed into the k-means clustering algorithm to cluster the c subjects into m groups. As the samples of one subject may not be mapped entirely to one cluster, the subject is assigned to the group to which most of the subject's samples belong. The number of group models (m) is identical to the number of subject groups. When we talk about a 'subject', we say the subject will be classified to a 'subject group'; when we talk about an 'instance', we say the instance will be classified to a 'group model'. Then, we use the m group models as m class labels and the feature set made up of the intersection of the first fe1 features as the input of k-NN to build a classification model. When fe1 is set to different values, there are many classification models. To determine the optimal value of fe1, each classification model is applied to the validation set; the optimal value of fe1 is the one for which the correct classification rate is highest. At this point, the intersection contains festep1 features (festep1 ≤ fe1).

During the testing phase, a test instance X (containing t samples) is assigned to a particular group model. The decision process is performed as follows:

a) Calculating the probability Pj that X is assigned to the j-th group model Mj (j = 1, …, m) using the festep1 features and the k-NN algorithm;

b) Identifying the n-th group model Mn that generates the maximum probability for X;

c) Assigning X to the corresponding group model Mn.

Given that the probabilities of assigning one sample to all group models sum to one, the probabilities of assigning the test instance X (with its t samples) to all group models sum to t. Thus, the summed probability of assigning a test instance to any single group model ranges in [0, t].
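The following sketch compresses the learning phase of stage one into a single function, assuming the features are already extracted into a matrix X with per-sample subject IDs and emotion labels (NumPy arrays). Since the paper does not spell out how the two-class FCS of Eq. (1) is extended to the c-class subject-ID labelling, scikit-learn's ANOVA F-score (f_classif) is substituted here as the ranking criterion; fe1, m, and k are free illustrative parameters rather than the tuned values from the paper, and fe1 must be large enough for the intersection to be non-empty.

```python
import numpy as np
from sklearn.feature_selection import f_classif
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def rank_features(X, labels):
    """Rank features by how well they separate the given labels (descending)."""
    f_scores, _ = f_classif(X, labels)
    return np.argsort(np.nan_to_num(f_scores))[::-1]

def build_group_models(X, subject_ids, emotions, fe1=60, m=2, k=5):
    """Stage 1 (learning): cluster subjects into m groups and fit a k-NN group classifier."""
    # One ranked feature list per emotion, with subject IDs as the class labels.
    feature_sets = []
    for emo in np.unique(emotions):
        idx = emotions == emo
        ranking = rank_features(X[idx], subject_ids[idx])
        feature_sets.append(set(ranking[:fe1]))
    selected = sorted(set.intersection(*feature_sets))     # intersection of the v lists

    # Cluster all samples on the selected features, then assign each subject
    # to the cluster that holds the majority of its samples.
    clusters = KMeans(n_clusters=m, n_init=10, random_state=0).fit_predict(X[:, selected])
    group_of = {s: np.bincount(clusters[subject_ids == s]).argmax()
                for s in np.unique(subject_ids)}
    group_labels = np.array([group_of[s] for s in subject_ids])

    # k-NN model that maps a physiological sample to one of the m group models.
    knn = KNeighborsClassifier(n_neighbors=k).fit(X[:, selected], group_labels)
    return knn, selected, group_of
```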

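Continuing the same sketch, the testing-phase assignment of stage one reuses the hypothetical knn model and selected feature indices from above and assumes the test instance is given as a (t × 342) array; the per-sample class probabilities are summed over the t samples, so each group's score lies in [0, t].

```python
import numpy as np

def assign_to_group(knn, selected, instance):
    """Stage 1 (testing): map a test instance (t samples x all features) to a group model."""
    proba = knn.predict_proba(instance[:, selected])   # shape (t, m)
    group_scores = proba.sum(axis=0)                   # summed P_j over the t samples
    return int(np.argmax(group_scores)), group_scores  # index n of group model Mn, scores
```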
B. The Second Stage

This stage aims to transform the multi-emotion recognition task into the recognition of broader emotion types (pools of emotions). More than one emotion pool is built for each group model. Given a group model, the emotions in one emotion pool are completely different from the emotions in the other emotion pools.

During the learning process, FCS is first used on the training set to generate a ranked feature list. Then, the first fe2 features and the class labels (emotion pools) are fed to C4.5 to build a classification model. In this stage, fe2 is also varied within a range. When fe2 is set to different values, there are many classification models. Each classification model is applied to the validation set; when the classification performance is best, the optimal value of fe2 is identified and denoted festep2.

During the testing phase, we assign the test instance X to an emotion pool, and in the third stage an explicit emotion is identified for X. The test instance X is assigned to an emotion pool Poq in the following steps:

a) Calculating the probability Pi that X is assigned to the i-th emotion pool Poi (i = 1, …, p) using the selected festep2 features and the C4.5 decision tree algorithm;

b) Identifying the q-th emotion pool Poq that generates the maximum probability for X;

c) Assigning X to the corresponding emotion pool Poq.

C. The Third Stage

In this stage, a particular emotion is finally identified for the test instance X after the instance has been assigned to the group model Mn and the emotion pool Poq. The classification process is as follows (illustrative sketches of stages two and three are given after this subsection):

a) During the learning phase, identifying the festep3 features used in the third stage. The selection process is identical to that described in the second stage except for the classifier and the class labels: Random Forest (RF) is used as the classifier, and the emotions in the emotion pool Poq serve as the class labels.

b) Counting the number of samples of X classified to each of the emotions inside the emotion pool Poq using the RF algorithm;

c) Assigning X to the emotion to which the majority of the samples in X are classified.
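The pool-level decision of the second stage can be sketched in the same style. The paper uses C4.5; scikit-learn ships the CART-based DecisionTreeClassifier, which is substituted here only as an approximation. The ranking argument stands in for an FCS-style ranked feature list computed inside the group model, and fe2 is a free parameter rather than the tuned festep2.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Emotion-pool partition used by TS_14: Po1 = {EQ1, EQ4}, Po2 = {EQ2, EQ3}.
POOL_OF = {"EQ1": 0, "EQ4": 0, "EQ2": 1, "EQ3": 1}

def train_pool_classifier(X_group, emotions_group, ranking, fe2=40):
    """Stage 2 (learning): fit a tree on emotion-pool labels inside one group model."""
    pools = np.array([POOL_OF[e] for e in emotions_group])
    feats = ranking[:fe2]                         # top-fe2 features of the ranked list
    tree = DecisionTreeClassifier(random_state=0).fit(X_group[:, feats], pools)
    return tree, feats

def assign_to_pool(tree, feats, instance):
    """Stage 2 (testing): pick the emotion pool with the largest summed probability."""
    proba = tree.predict_proba(instance[:, feats])   # shape (t, number of pools)
    return int(np.argmax(proba.sum(axis=0)))
```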

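A corresponding sketch of the third stage follows, again with illustrative parameter names (fe3 standing in for festep3, n_trees for the RF size) and assuming a ranked feature list computed over the emotions inside the selected pool.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_stage3_classifier(X_pool, emotions_pool, ranking, fe3=40, n_trees=100):
    """Stage 3 (learning): RF over the emotions inside one pool (e.g. EQ1 vs EQ4)."""
    feats = ranking[:fe3]
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    rf.fit(X_pool[:, feats], emotions_pool)
    return rf, feats

def predict_emotion(rf, feats, instance):
    """Stage 3 (testing): label each of the t samples and return the majority emotion."""
    sample_preds = rf.predict(instance[:, feats])
    labels, counts = np.unique(sample_preds, return_counts=True)
    return labels[np.argmax(counts)]
```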
IV. RESULTS

In this study, we used leave-one-subject-out cross-validation, in which a single subject taken from the whole dataset is used as the test subject while the remaining subjects are used for training. This process is repeated so that each subject is used as the test subject once. The dataset used in the training process is further divided into a training set and a validation set. Since the validation set is used in all three stages, it should include all 31 training subjects and at least one instance per emotion for each of these subjects.

During the learning phase of each run of the cross-validation, the classifier parameters, such as K in k-NN and I in RF, and the number of features used in each stage were automatically searched and configured. When a classification model built on the training set was tested on the validation set and achieved the best classification performance, the corresponding classifier parameters and feature-set size were taken as optimal. When a classifier cannot accept literal input, subject IDs or emotion labels are represented as numbers.

In stage one, we decided to build two group models (m=2) during the learning phase based on a comparison of different clustering results, and assigned each training subject to the group to which the majority of the subject's samples were clustered. In stage two, we put EQ1 and EQ4 into one emotion pool Po1 and EQ2 and EQ3 into the second pool Po2 (this configuration is named TS_14). Other strategies, such as gathering EQ1 and EQ2 into one emotion pool and EQ3 and EQ4 into the other, are discussed in Section V.

Table II presents the confusion matrix of classification using the three-stage decision framework. Specifically, TS_14 is applied in the decision framework to partition the two emotion pools. An average correct recognition rate (CRR) of 70.04% is achieved in classifying the 4 emotions of the 32 subjects.

TABLE II. THE CONFUSION MATRIX OF CLASSIFICATION AND THE CRR USING TS_14 (Po1: EQ1 and EQ4; Po2: EQ2 and EQ3). Rows: ground truth; columns: predicted labels.

            EQ1   EQ2   EQ3   EQ4   Total instances
  EQ1       129     9    21    19   178
  EQ2        18    96    17    14   145
  EQ3        10    16    88     6   120
  EQ4        18     9     6    68   101
  CRR: 70.04%       Agreement: 75.00%

V. DISCUSSION

In addition to the three-stage decision framework, we investigated a number of alternative strategies widely used for recognizing multiple emotions: a) using a single classifier to perform the four-emotion recognition task, for which we tested four classifiers including k-NN, C4.5, RF, and SVM; b) the One-against-Rest scheme, in which, with each of the four emotions as the single class, we trained C(4,1) = 4 classifiers in total; and c) the One-against-One scheme, in which, based on the different pairings of the four emotions, C(4,2) = 6 classifiers were trained from the training data.

In Fig. 4, the first plot (a) describes the direct four-emotion classification, and the other plots describe the binary classification strategies. We applied leave-one-subject-out cross-validation to all of these strategies, and the optimal classifier parameters were automatically searched. In the One-against-Rest and One-against-One schemes, a final decision was obtained by combining the outputs of all the classifiers. Two typical decision fusion approaches, majority voting and the sum rule [9-10], were used to generate the final decision: the majority voting approach sums up the decisions of all classifiers, while the sum rule approach sums up the support values for each class generated by the classifiers. The four classifiers stated above were tested in both the One-against-Rest and One-against-One schemes as well.

The recognition results of the three strategies as well as our proposed approach are shown in Table III. When performing multiclass classification, SVM outperforms k-NN, with recognition accuracies of 51.10% and 43.93% respectively; C4.5 and RF have similar recognition performance. However, k-NN outperforms the other classifiers when used in both the One-against-Rest and One-against-One schemes, so we list the best recognition results derived using k-NN in these two schemes. The highest CRRs are 44.30% and 50.69% respectively, obtained with the majority-voting decision fusion approach. For the One-against-Rest scheme, each mixed class involves three emotions plus individual differences, meaning that it broadly contains physiological data which may cover the data from the single class. This may be one reason why a low CRR of 44.30% is achieved by the One-against-Rest scheme. As for the One-against-One scheme, consider, for example, a classifier C trained on both EQ2 and EQ3 instances while the ground truth of a test instance is EQ1; when the classifier C is applied to classify the test instance, the decision will be completely wrong no matter which class the instance is assigned to. This kind of incorrect decision leads to the low CRR of 50.69% for the One-against-One scheme.
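For reference, the binary decomposition baselines in (b) and (c) can be reproduced approximately with scikit-learn's One-vs-Rest and One-vs-One wrappers under a leave-one-subject-out split. Note that these wrappers combine the binary outputs with their own built-in decision rules and score individual windows, so this is only a rough stand-in for the paper's explicit majority-voting and sum-rule fusion over trial instances; the data arrays are random placeholders with illustrative sizes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Placeholder data: one row per 4-second window, grouped by subject
# (8 subjects x 50 windows here, to keep the sketch light).
X = np.random.rand(8 * 50, 342)
y = np.random.choice(["EQ1", "EQ2", "EQ3", "EQ4"], size=8 * 50)
subjects = np.repeat(np.arange(8), 50)

logo = LeaveOneGroupOut()                      # leave-one-subject-out splits
base = KNeighborsClassifier(n_neighbors=5)

for name, clf in [("One-against-Rest", OneVsRestClassifier(base)),
                  ("One-against-One", OneVsOneClassifier(base))]:
    scores = cross_val_score(clf, X, y, cv=logo, groups=subjects)
    print(f"{name}: mean window-level accuracy = {scores.mean():.3f}")
```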

Fig. 4. Three typical ways of handling multiclass classification problems: (a) multiclass classification, (b) One-against-Rest classification, and (c) One-against-One classification. In each scheme, the multimodal physiological data are fed to one or more classifiers whose outputs are combined into a final decision over EQ1-EQ4.

The proposal of the three-stage decision framework aims to improve emotion recognition performance by partly dealing with the problems encountered in the above methods. Specifically, to deal with the misclassification between EQ1 and EQ4 and between EQ2 and EQ3, we decompose the multiclass classification problem into binary classifications to eliminate the suboptimal performance that occurs in the multiclass problem. Moreover, we transform the subject-independent emotion recognition task into a group-dependent task; that is, a test subject is assigned to a group before decomposing and recognizing his four emotions.

Comparing the results shown in Tables II and III, we find that TS_14 (using the three-stage decision framework) outperforms the other methods, and that the best classification performance in each stage can be achieved using only one classifier. This may be owing to the stage-divided strategy used in the three-stage decision framework, with each stage dealing with a specific sub-problem.

TABLE III. THE RECOGNITION PERFORMANCE OF COMPARATIVE METHODS

  Method                                         Best CRR
  a) Multiclass: k-NN                            43.93%
  a) Multiclass: SVM                             51.10%
  a) Multiclass: C4.5                            48.71%
  a) Multiclass: RF                              48.16%
  b) One-against-Rest (using k-NN classifier)    44.30%
  c) One-against-One (using k-NN classifier)     50.69%
  Three-stage decision framework                 70.04%

VI. CONCLUSIONS

In this paper, we proposed a three-stage decision framework for multi-subject emotion recognition based on physiological signals. To deal with the poor recognition outcomes caused by 'individual differences', we initially assign a test instance to a particular group model and then perform emotion classification in a group-dependent manner. A CRR of 70.04% was reported for recognizing four emotions of 32 subjects. Based on a series of comparisons and discussion, the proposed three-stage decision framework holds promise for robust multi-subject emotion recognition and for future affective HCI applications.

ACKNOWLEDGMENT

This work is supported by the National Basic Research Program of China (2014CB744600), the International Cooperation Project of the Ministry of Science and Technology (2013DFAlI140), and the National Natural Science Foundation of China (61210010). The authors acknowledge the European Community's Seventh Framework Program (FP7/2007-2011) for the DEAP database.

REFERENCES

[1] A. Haag, S. Goronzy, P. Schaich, and J. Williams, "Emotion recognition using bio-sensors: First steps towards an automatic system," in Affective Dialogue Systems, vol. 3068, Kloster Irsee, Germany: Springer-Verlag, Jun. 2004, pp. 36-48.
[2] R. W. Picard, E. Vyzas, and J. Healey, "Toward machine emotional intelligence: Analysis of affective physiological state," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, pp. 1175-1191, Oct. 2001.
[3] Y. Gu, S. L. Tan, K.-J. Wong, M.-H. R. Ho, and L. Qu, "Emotion-aware technologies for consumer electronics," in Proc. IEEE Int. Symp. Consumer Electronics (ISCE), Vilamoura, Apr. 2008, pp. 1-4.
[4] J. Wagner, J. Kim, and E. André, "From physiological signals to emotions: Implementing and comparing selected methods for feature extraction and classification," in Proc. IEEE Int. Conf. Multimedia and Expo (ICME), Amsterdam, Jul. 2005, pp. 940-943.
[5] J. Kim and E. André, "Emotion recognition based on physiological changes in music listening," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, pp. 2067-2083, Dec. 2008.
[6] G. Yuan, T. S. Lim, W. K. Juan, H. M.-H. Ringo, and Q. Li, "A GMM based 2-stage architecture for multi-subject emotion recognition using physiological responses," in Proc. 1st Augmented Human Int. Conf. (AH 2010), Megève, France, 2010.
[7] Y. Gu, S.-L. Tan, K.-J. Wong, M.-H. R. Ho, and L. Qu, "A biometric signature based system for improved emotion recognition using physiological responses from multiple subjects," in Proc. 8th IEEE Int. Conf. Industrial Informatics (INDIN), Osaka, 2010, pp. 61-66.
[8] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley-Interscience, 1973.
[9] J. Wagner, F. Lingenfelser, E. André, and J. Kim, "Exploring fusion methods for multimodal emotion recognition with missing data," IEEE Trans. Affect. Comput., vol. 2, pp. 206-218, Oct. 2011.
[10] S. Koelstra and I. Patras, "Fusion of facial expressions and EEG for implicit affective tagging," Image Vision Comput., vol. 31, pp. 164-174, Feb. 2013.

