
EEG-based Emotion Recognition

Diploma thesis at the Institut für Algorithmen und Kognitive Systeme


Cognitive Systems Laboratory
Prof. Dr. Tanja Schultz
Fakultät für Informatik
Universität Karlsruhe (TH)

by
cand. inform.
Kristina Schaaff
Matriculation no.: 1323079

Advisors:
Prof. Dr. Tanja Schultz
Dipl.-Math. Michael Wand

Date of registration: March 3, 2008


Date of submission: September 3, 2008

Institut für Algorithmen und Kognitive Systeme, Cognitive Systems Laboratory


I hereby declare that I have written this thesis independently and that I have used
no sources or aids other than those indicated.
Karlsruhe, September 3, 2008

Abstract

In the area of human-computer interaction, information about the emotional state of
a user is becoming more and more important. For instance, this information could be
used to make communication with computers more human-like or to make computer
learning environments more effective.
This thesis proposes an emotion recognition system based on electroencephalographic
(EEG) signals. Emotional states were induced by pictures from the International
Affective Picture System (IAPS). EEG data was recorded either from 16 electrodes with a
standard EEG cap according to the 10-20 system or by using a headband which covers
only four frontal electrodes but is much more comfortable to wear. For emotion
recognition, two types of classifiers were investigated: support vector machines (SVMs)
and hidden Markov models (HMMs). In experiments discriminating the three emo-
tional states pleasant, neutral, and unpleasant, the HMM-based system achieved a
mean recognition rate of 46.15 percent, whereas the SVM-based system achieved a
mean recognition rate of 62.07 percent.
This study shows that electroencephalographic signals are suitable for emotion re-
cognition and that SVMs seem to be better suited for this task than a
sequence-based approach with HMMs.
Acknowledgements

First of all, I would like to thank my advisor, Prof. Dr. Tanja Schultz, for giving
me the opportunity to work in this interesting field of research and for her constant
guidance and all the valuable discussions about my work. I also want to express my
gratitude to Michael Wand and Matthias Honal, who repeatedly provided helpful
advice about the technical details of this work.
Thanks a lot to all the people who volunteered for the experiments. Without the
data collected from these experiments, this work would not have been possible.
Special thanks to my family and my boyfriend for putting up with my frustration and
cheering me up when I needed it.
Contents

List of Figures xiii

List of Tables xiv

1 Introduction 1
1.1 Goals of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Structure of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Emotion and Affective Computing 3


2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1.1 Relevance of Emotions for Human-Computer Interaction . . . 3
2.1.2 Affective Applications . . . . . . . . . . . . . . . . . . . . . . 4
2.1.2.1 Human-like Communication . . . . . . . . . . . . . . 4
2.1.2.2 Learning . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.2.3 Gaming and Entertainment . . . . . . . . . . . . . . 5
2.1.2.4 Providing Help or Feedback . . . . . . . . . . . . . . 6
2.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.1 Affective Computing . . . . . . . . . . . . . . . . . . . . . . . 6
2.2.2 Emotions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.3 Emotions versus Moods . . . . . . . . . . . . . . . . . . . . . 8
2.2.4 Component Definition . . . . . . . . . . . . . . . . . . . . . . 8
2.2.5 Concepts of Emotion . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.5.1 Categorical Models . . . . . . . . . . . . . . . . . . . 9
2.2.5.2 Dimensional Models . . . . . . . . . . . . . . . . . . 10
2.3 Psychophysiology of Emotions . . . . . . . . . . . . . . . . . . . . . . 11
2.3.1 The Human Nervous System . . . . . . . . . . . . . . . . . . . 11
2.3.2 How do Emotions Show? . . . . . . . . . . . . . . . . . . . . . 13
2.3.3 Elicitation of Emotions . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Registering Emotions from Physiological Signals . . . . . . . . . . . . 18
2.4.1 Electrical Brain Activity . . . . . . . . . . . . . . . . . . . . . 18
2.4.1.1 Electrode Placement . . . . . . . . . . . . . . . . . . 19
2.4.1.2 Electrophysical Origin of EEG Signals . . . . . . . . 19
2.4.1.3 Artifacts . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.1.4 Emotions and EEG . . . . . . . . . . . . . . . . . . . 21
2.4.2 Muscle Activity . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.3 Skin conductance . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.4 Skin Temperature . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.5 Heart Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4.6 Respiration Rate . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.4.7 Combined measures for affect recognition . . . . . . . . . . . . 24

3 Data Collection 27
3.1 Preceding Considerations . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.1 Selection of Emotions . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.2 Emotion Induction . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.1 Hardware Design . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.2 Software Design . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.2.1 Recording Software . . . . . . . . . . . . . . . . . . . 29
3.2.2.2 Recognition Software . . . . . . . . . . . . . . . . . . 30
3.2.3 Electrode Placement . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.4 Subjects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.5 Stimulus Material for Emotion Induction . . . . . . . . . . . . 30
3.2.6 Picture Presentation . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.7 Experimental Procedure . . . . . . . . . . . . . . . . . . . . . 32
3.3 Ethical Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.4 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 Summary of Collected Data . . . . . . . . . . . . . . . . . . . . . . . 34

4 Methods 35
4.1 Classification with Support Vector Machines . . . . . . . . . . . . . . 35
4.1.1 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . 35
4.1.1.1 Feature Extraction . . . . . . . . . . . . . . . . . . . 35
4.1.1.2 Obtaining Feature Vectors . . . . . . . . . . . . . . . 36
4.1.1.3 Dimensionality Reduction . . . . . . . . . . . . . . . 37
4.1.2 Training and Classification . . . . . . . . . . . . . . . . . . . . 37
4.2 Sequential Recognition System . . . . . . . . . . . . . . . . . . . . . . 39
4.2.1 Data Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2.2 Training and Classification . . . . . . . . . . . . . . . . . . . . 40

5 Experiments and Results 41


5.1 IAPS Ratings versus Subject Ratings . . . . . . . . . . . . . . . . . . 41
5.2 SVM-based Emotion Recognition . . . . . . . . . . . . . . . . . . . . 42
5.2.1 Description of the Baseline System . . . . . . . . . . . . . . . 42
5.2.2 Data Corpus Selection . . . . . . . . . . . . . . . . . . . . . . 43
5.2.3 Optimization of Data Preprocessing . . . . . . . . . . . . . . . 44
5.2.3.1 Window Size and Average over Feature Vectors . . . 44
5.2.3.2 Filter Properties . . . . . . . . . . . . . . . . . . . . 45
5.2.3.3 Normalization . . . . . . . . . . . . . . . . . . . . . . 45
5.2.3.4 Dimensionality Reduction . . . . . . . . . . . . . . . 46
5.2.4 Optimization of Training and Classification . . . . . . . . . . . 48
5.2.5 Analysis of Results . . . . . . . . . . . . . . . . . . . . . . . . 49
5.3 Comparison to Sequence Modeling with Hidden Markov Models . . . 50
5.3.1 Description of the Baseline System . . . . . . . . . . . . . . . 50
5.3.2 Optimization of Data Preprocessing . . . . . . . . . . . . . . . 51
5.3.2.1 Bandpass Filter Properties . . . . . . . . . . . . . . . 51
5.3.2.2 Dimensionality Reduction . . . . . . . . . . . . . . . 51
5.3.3 Optimization of Training and Classification . . . . . . . . . . . 51

5.3.4 Analysis of Results . . . . . . . . . . . . . . . . . . . . . . . . 54


5.4 Electrode Positioning . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.4.1 Using Frontal Electrodes Only . . . . . . . . . . . . . . . . . . 54
5.4.2 Using Midline Electrodes Only . . . . . . . . . . . . . . . . . . 56
5.4.3 Comparison of Different Electrode Positions . . . . . . . . . . 56
5.5 Analysis of Temporal Progression . . . . . . . . . . . . . . . . . . . . 58

6 Conclusion and Future Work 61


6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

A Prototype Model 65

B IAPS-picturesets 67

C Experimental Instructions 71

D Data 73
D.1 Data for section 5.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
D.2 Data for section 5.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
D.3 Data for section 5.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
D.4 Data for section 5.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

Bibliography 81
List of Figures

2.1 The Yerkes-Dodson principle (Stickel et al., 2007) . . . . . . . . . . . 5


2.2 Circumplex model of affect (Russel, 1980) . . . . . . . . . . . . . . . 11
2.3 The major components of the limbic system. (Carlson, 2007) . . . . . 12
2.4 Parasympathetic and sympathetic nervous system (Carlson, 2007) . . 13
2.5 Electrode positions in the 10-20-system . . . . . . . . . . . . . . . . . 19
2.6 Lobes of the cortex (Andreassi, 2000) . . . . . . . . . . . . . . . . . . 19
2.7 Main components of a single neuron (Carlson, 2007) . . . . . . . . . . 20
2.8 Excitatory and inhibitory activity at the synapse. (Andreassi, 2000) . 21
2.9 EMG electrode placements for surface differential recording over ma-
jor facial mimetic muscles. (Andreassi, 2000) . . . . . . . . . . . . . . 22

3.1 Pictures of the IAPS in a 2-dimensional space based on mean pleasure


and arousal ratings . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 Left side shows screen as it is visible to the supervisor, right side
shows the subject’s screen where pictures for emotion elicitation are
presented . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Process of picture presentation . . . . . . . . . . . . . . . . . . . . . . 32

4.1 Comparison between classification with linear and RBF-kernel. The


left figure shows the original classification problem, while the figure in
the middle shows classification results achieved with a linear kernel.
Classification in the right figure was done with an RBF kernel (param-
eter C was set to 1000 for both kernels). . . . . . . . . . . . . . . . . 38

5.1 Mean recognition rates subject to window size and average size . . . . 44
5.2 Mean recognition rate subject to frequency band (whiskers indicate
standard deviation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.3 Mean recognition rate subject to average size (whiskers indicate stan-
dard deviation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4 Mean recognition rate subject to dimensionality (whiskers indicate
standard deviation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

5.5 Mean recognition rates for different frequency bands (whiskers indi-
cate standard deviation) . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.6 Recognition rate depending on number of dimensions after LDA . . . 52
5.7 Recognition rate subject to number of HMM states (whiskers indicate
standard deviation). Bars show the relative portion of subjects whose
recognition rate was best at this number of HMM states. . . . . . . . 52
5.8 Recognition rate depending on number of dimensions after feature
reduction using the frontal electrodes. (whiskers indicate standard
deviation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.9 Recognition rate depending on number of dimensions after feature
reduction using only midline electrodes. (whiskers indicate standard
deviation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.10 Recognition rate for each time segment (whiskers indicate standard
deviation) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

A.1 Results of a hierarchical cluster analysis of 135 emotion names (A =


joy, B = cheerfulness, C and D = Sadness). The scale at the left
indicates the cluster strength, asterisks indicate empirically selected
subcluster names. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
List of Tables

2.1 Four categories of affective computing, focusing on expression and


recognition (Picard, 1995) . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Comparison between moods and emotions . . . . . . . . . . . . . . . 8
2.3 Elements of the response package and their functions (Levenson, 1999) 14

3.1 Technical specifications of the Varioport™ EEG amplifier . . . . . . . 29
3.2 Overview about the subjects . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Characteristics of IAPS pictures used for emotion induction . . . . . 32

5.1 Cumulated subject ratings of IAPS pictures. Columns show subject


ratings (S), rows contain ratings according to the IAPS (I). . . . . . . 42
5.2 Confusion matrix of IR corpus (absolute values) . . . . . . . . . . . . 43
5.3 Confusion matrix of SR corpus (absolute values) . . . . . . . . . . . . 43
5.4 Normalized confusion matrix for SR corpus (relative values) . . . . . 43
5.5 Influence of normalization mode on recognition accuracy . . . . . . . 46
5.6 Mean recognition rate with different averaging conditions . . . . . . 47
5.7 Optimal values of C and γ for RBF and linear kernel . . . . . . . . . 49
5.8 Summary of consecutive improvements due to parameter optimization 50
5.9 Best combination of number of HMM-states and number of GMMs
for each subject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.10 Optimal values for C and γ for frontal electrodes for RBF and linear
kernel. Recordings for subjects with subject IDs 42 - 46 were done
with the headband. . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.11 Optimal values for C and γ for midline electrodes for RBF and linear
kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.12 Summary of recognition rates subject to electrode selection . . . . . . 57

B.1 Pleasant pictures for all subjects . . . . . . . . . . . . . . . . . . . . . 67


B.2 Pleasant pictures for female subjects . . . . . . . . . . . . . . . . . . 68

B.3 Pleasant pictures for male subjects . . . . . . . . . . . . . . . . . . . 68


B.4 Neutral pictures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
B.5 Unpleasant pictures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

D.1 Mean recognition rates for variation of window size and averaging
over adjacent feature vectors . . . . . . . . . . . . . . . . . . . . . . . 74
D.2 Mean recognition rates for variation of frequency band . . . . . . . . 75
D.3 Mean recognition rates for variation of average size . . . . . . . . . . 75
D.4 Mean recognition rate subject to number of dimensions after correlation-
based feature reduction . . . . . . . . . . . . . . . . . . . . . . . . . . 76
D.5 Mean recognition rates for variation of frequency band . . . . . . . . 77
D.6 Mean recognition rate subject to number of dimensions after LDA . . 77
D.7 Mean recognition rate subject to number of HMM-states . . . . . . . 78
D.8 Optimal number of HMM states . . . . . . . . . . . . . . . . . . . . . 78
D.9 Mean recognition rate depending on number of dimensions after fea-
ture reduction for frontal electrodes . . . . . . . . . . . . . . . . . . . 79
D.10 Mean recognition rate depending on number of dimensions after fea-
ture reduction for midline electrodes . . . . . . . . . . . . . . . . . . 79
D.11 Mean recognition rate for each time segment . . . . . . . . . . . . . . 80
1. Introduction

”Look Dave, I can see you’re really upset about this. I honestly think you
ought to sit down calmly, take a stress pill, and think things over.”

When Stanley Kubrick made his film '2001: A Space Odyssey' in 1968, he tried to
draw a picture of the year 2001 that was as close to reality as possible. In his film,
one of the protagonists is the board computer HAL. Besides many other abilities,
this computer is able to recognize emotions.
Contrary to Kubrick's expectations, research on computers that are able to recognize
emotions is still in its infancy. One reason for this limited activity in emotion research
lies in the problems of defining and measuring emotions. It was not until the last
two decades that scientists started to realize the importance of emotion in human-
computer interaction. Since then, research in affective computing - which means
computing that "relates to, arises from, or deliberately influences emotions" (Picard
and Healey, 1997) - has become more and more important.
Many people have problems with the logical and rational way in which computers
react. As the importance of computers in everyday life is constantly increasing, it is
necessary to improve human-computer interaction. One crucial step is to develop a
more natural way of communication. Therefore, computers have to learn to recognize
and react to human emotions.
Emotions are expressed through posture and facial expression as well as through
internal processes, such as heart rate, blood pressure, or brain activity. Moreover,
emotions can also be expressed in speech, e.g. by raising the voice. There are many
different ways to recognize emotions. Typical communication channels that indicate
emotions are the voice and facial expressions. One problem in using face and speech
recognition is that an emotion can only be detected while a person is speaking or
looking in the direction of the camera. Moreover, speech recognition returns only
a string of spoken words, which often does not say anything about the emotional
state of the speaker. In addition, facial expression and speech can be intentionally
influenced to 'fake' an emotion. For this reason, it can be helpful to use physiological

data like brain waves, skin conductance, or heart rate. Using physiological signals
for emotion recognition provides several advantages:

• Physiological signals are constantly emitted

• As the sensors are attached directly to the body, a person cannot move out of
their range, as would be possible with a camera or a microphone placed in a room

• Biosignals are controlled by the central nervous system and therefore cannot
be influenced intentionally

In addition to that, physiological data could also be used as a complement to emo-
tional data collected from voice or facial expressions to improve recognition rates.

1.1 Goals of this Thesis


Although emotions have often been disregarded in studies of human-computer in-
teraction, they are becoming more and more important. Therefore, this thesis in-
vestigates a method to facilitate emotion recognition from physiological signals.
The subject of investigation is the electrical brain activity that occurs when a person is
in a certain emotional state. This activity is measured by electroencephalography
(EEG). As it is impossible to influence electrical brain activity voluntarily, and more-
over, the brain can be seen as the origin of emotion, we evaluate these signals for our
investigation. Another major advantage of EEG signals is that - compared to other
biosignals, such as heart rate or skin conductance - the sensors are applied directly at the
head. In contrast, sensors that are used to measure e.g. blood volume pressure, skin
conductance, or heart rate are generally applied to body areas which are usually
covered with clothes. Moreover, a person wearing sensors for EEG measurement is
not restricted in the use of his or her hands.

1.2 Structure of this Thesis


This thesis is organized as follows: chapter 2 gives an introduction to the theoretical
background of emotion theories and to the area of what is called affective computing.
Additionally, we present the influence of emotions on physiological signals and give
an introduction on how emotions can be elicited. Beyond that, different methods
for registering emotions from physiological signals are illustrated.
Next, the process of data collection is specified in chapter 3. Starting from preceding
considerations, we describe the experimental design and the problems which accom-
pany this design. Chapter 4 gives an overview of the methods used to analyze
the raw EEG signals which were collected according to the experimental procedure
described in chapter 3. Experiments and results of the data analysis are presented in
chapter 5. Finally, we summarize the results of this study and give an outlook on
future work in chapter 6.
2. Emotion and Affective
Computing

2.1 Motivation
In the following section we try to demonstrate the importance of emotions for
human-computer interaction. Section 2.1.1 first outlines the advantages of emo-
tional human-computer interaction. Next, section 2.1.2 provides some examples
for affective applications (i.e. applications that recognize and respond to a user’s
emotion).

2.1.1 Relevance of Emotions for Human-Computer Interaction
In 1995, Reeves and Nass proposed the so-called media equation, which states that
human-computer interaction follows the same principles as human-human interac-
tion. This is also supported by the CASA (Computers Are Social Actors) paradigm
(Nass et al., 1994) which states that human interaction with computers is funda-
mentally social.
For instance, a study conducted by Nass et al. (1996) showed that people who feel
they form a team with the computer think better of the computer and cooperate
and agree more with it than people who do not feel part of such a team. In
addition, the same rules of politeness apply to human-computer interaction and
human-human interaction, as people tend to give more positive feedback about a
computer when asked by that computer itself than when being asked by a different
computer (Reeves and Nass, 1995). Moreover, Nass and Moon (2000) found that
gender differences also apply to communication with computers. For example, a
computer with a male voice was considered to be more competent than a computer
with a female voice, although the spoken text was the same.
Based on the findings that human-computer interaction is comparable to the inter-
action between humans, a computer that is able to recognize and respond to a user’s
emotion should be able to improve human-computer interaction significantly. Dryer

and Horowitz (1997) showed that people interacting with partners that are similar
or complementary to themselves perform better when working in the same team.
Therefore, a computer that adapts to a user’s emotional needs should also help to
increase the user’s performance.
As an example, in a study about the effectiveness of an agent that responds to
user frustration, Klein et al. (2002) found that computers are able to reduce strong
negative emotions - even if they are the source of these emotions - by responding in
an understanding way to the user’s emotions. For this purpose, computers do not
even need to be personified characters or use advanced interaction techniques such
as speech. In addition, they found that applications that simply let people vent
their anger do not help users recover from negative emotional states.
Axelrod (2004) showed that there are also significant differences in human behavior
when people interact with an affective system compared to a normal computer, as
people tend to act more emotionally when they believe they are interacting with an
affective system.

2.1.2 Affective Applications


There are many applications for computers that are able to recognize our emotions.
Before designing an application that involves an affective computer, there are three
important issues that have to be clarified (Picard, 1995):

1. ”What is the relevant set of emotions for this application?

2. How can these best be recognized / expressed / modeled?

3. What is an intelligent strategy for responding to or using them?”

How these questions are answered in a specific case strongly depends on the domain
of application.
In the following we illustrate various applications for affective computers as suggested
by Picard (1997).

2.1.2.1 Human-like Communication


Nowadays, an increasing number of people spend more time interacting with a com-
puter than with other humans. In addition, more and more people communicate
with each other through computers (Picard and Healey, 1997). The problem when
communicating through or with computers is that computers usually are not able
to show or to convey any emotional reactions. Making this communication more
human-like could improve collaboration when computers are involved.
One of the most affect-limited forms of communication is communication via email.
The only way to express emotions in an email is through emoticons like :-) or ;-),
which can easily be misinterpreted and lead to misunderstandings. To expand the
emotional bandwidth of an email, tools that are able to recognize and express emotions
could be used. For instance, typing rhythm and the pressure applied to the keyboard
could be measured to identify the emotional state of the sender and communicate it
to the recipient.

Another field for making communication more human-like is text-to-speech synthesis,
which is usually done with a monotone, flat intonation. For people who have to use a
synthesized voice, their own emotions could be added to the synthesized speech by
recognizing emotional signals from the speaker. Beyond that, Badzinski (1991) found
that a story read with an emotional voice does not only increase interest in
and enjoyment of the story but also improves children's speed of understanding.
Moreover, emotional text-to-speech synthesis could enhance the experience of written
text for blind people.
Computers usually do not ask before they interrupt somebody with a notification.
This is another opportunity where communication could become more emotional. If
a person is in a state of high cognitive load or emotional pressure, a notification by
the computer can be quite disturbing. Therefore, it would be a great improvement
if the computer could derive from the user's emotional state whether it is a good
moment to interrupt a person or not.
Finally, computers could be enabled to make small talk. For example, a friendly
spoken 'good morning' could replace the traditional 'login name' prompt. This might
have a positive influence on the user's attitude towards the computer.
2.1.2.2 Learning
In autodidactic learning environments, learning can be improved significantly if com-
puters are able to recognize the user's emotions and respond to them. According to
the Yerkes-Dodson principle (Yerkes and Dodson, 1908), arousal can be helpful for
cognitive performance until a certain level of arousal is reached. Beyond this level,
performance decreases. Accordingly, the best learning performance can
be achieved if a person is kept at an optimal level of arousal. Furthermore, the
optimum level of arousal is inversely related to the difficulty of a task.

Figure 2.1: The Yerkes-Dodson principle (Stickel et al., 2007)

For instance, frustration (and therefore a higher arousal) can be avoided by making
a quiz easier if the computer notices that the user is overstrained. Similarly, if the
computer notices a decrease in arousal from the optimum, a quiz could be made
more challenging.
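
As a minimal sketch of this adaptation logic, the following Python fragment models the inverted-U relation between arousal and performance and derives a difficulty adjustment from the distance to an assumed optimal arousal level. The quadratic model, the function names, and all numeric values are illustrative assumptions and are not part of the system developed in this thesis.

def predicted_performance(arousal, optimum=0.5, width=0.4):
    # Inverted-U (Yerkes-Dodson-like) model: performance is highest at the
    # optimal arousal level and falls off quadratically on both sides.
    # According to the principle, 'optimum' would be lower for difficult tasks.
    return max(0.0, 1.0 - ((arousal - optimum) / width) ** 2)

def adjust_difficulty(arousal, optimum=0.5, tolerance=0.1):
    # Adaptation rule sketched in the text: if the learner is over-aroused
    # (e.g. frustrated), make the quiz easier; if under-aroused, make it harder.
    if arousal > optimum + tolerance:
        return "easier"
    if arousal < optimum - tolerance:
        return "harder"
    return "keep"

for a in (0.2, 0.5, 0.8):
    print(a, round(predicted_performance(a), 2), adjust_difficulty(a))
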
2.1.2.3 Gaming and Entertainment
Gaming and entertainment offer a wide range of applications for computers which
are able to deal with emotions.

Healey et al. (1998) proposed the concept of an affective DJ. For this purpose, they
designed a wearable computer that gets to know a person's preferences by recog-
nizing and responding to his or her emotional signals. This information is used to
help the person with the music selection according to the present mood.
In most computer games, success is based only on the actions that are taken, not
on how they are performed. Physiological signals could be used to adapt the game
to the current emotional state of a user. For instance, calmer behavior might
be rewarded by introducing a new companion, or a brave action performed while
somebody is highly aroused might lead to some extra points.

2.1.2.4 Providing Help or Feedback


There are various applications where a computer could provide help to a user if it
is aware of the user's emotions. One important application is to react to a user's
preferences. To learn a user's preferences, the computer could collect information
about what a person likes and dislikes from his or her emotional responses. Moreover,
content could be adapted to the current emotional situation. For instance, if a
person is stressed, only the most important icons are shown on the desktop.
Another possible application is to develop an agent that could help people to see how
they appear to other people in various situations. This 'affective mirror' would offer
people the possibility to practice, e.g. for an important talk, and receive feedback on
their performance.
Besides that, computers could also be used to help autistic people, as most of them
have problems with emotional understanding, especially with finding suitable responses
in an emotional situation. The lack of emotional intelligence can be decreased by
helping them to understand and respond to emotional situations. Computers could
help autistic people by giving them feedback as they try to learn skills like empathy.
As an example, this could be done in a gaming environment which requires repetitive
emotional interaction.

2.2 Definitions
In the following section, we give an introduction to the concept of affective computing.
Furthermore, we present different approaches to defining emotions.

2.2.1 Affective Computing


In human-human interaction, emotions play an important role. The ability to rec-
ognize the emotional state of the people surrounding us is an important part of natural
communication. The goal of affective computing is to make this information available
to computers and thereby make human-computer interaction more natural. After a
short introduction to affective computing, this section gives an overview of the most
commonly used affective measures. Finally, some applications of affective computers
are described.
The concept of affective computing was mainly influenced by Picard, who defines
affective computing as ’computing that relates to, arises from, or deliberately influ-
ences emotions’ (Picard, 1995).

                                Computer can express affect
                                     No          Yes
Computer can       No               (I)          (II)
perceive affect    Yes             (III)         (IV)

Table 2.1: Four categories of affective computing, focusing on expression and recog-
nition (Picard, 1995)

Picard (1995) distinguishes between four relevant cases of affective computing, which
are summarized in Table 2.1.
Most computers fall into category (I), having no affect perception or expression at all.
Out of the three affective categories, category (II) is probably the one with the most
advanced technology: computers that have voices with natural intonation or faces
with natural expressions fall into this category. Category (III) enables a computer to
perceive a person's affective state and to adjust its response to this information. The
last category provides truly 'personal' and 'user friendly' computing by maximizing
the emotional communication between humans and computers. It is important to
note that this does not mean that the computer would be driven by its emotions.
In this thesis we focus on emotion recognition but do not develop a system that is
able to express emotions. The research done in this thesis is therefore confined to
the third category.

2.2.2 Emotions
”Everyone knows what an emotion is, until asked to give a definition.”
(Russell and Fehr, 1984)

Although emotions are very central to human behavior and communication, there
is still no commonly accepted definition of the term emotion. In fact, there are
many different approaches to finding an appropriate definition, which are quite often
contradictory. Schmidt-Atzert (1981) names two reasons why it is so hard to find
an exact definition for emotion. First, the term has been applied to many different
phenomena in varying contexts which are only connected by the term 'emotion'.
Secondly, it is hard to distinguish emotional events from non-emotional events, as
a certain arousal might be considered either emotional or non-emotional.
In 1981, Kleinginna and Kleinginna analyzed and classified 92 definitions and nine
skeptical statements about emotions and came to the conclusion that there is little
consistency among definitions and that many definitions are too vague. They proposed
the following definition, trying to emphasize the manifold possible aspects of emotion:

”Emotion is a complex set of interactions among subjective and objective
factors, mediated by neural/hormonal systems, which can
a) give rise to affective experiences such as feelings of arousal, pleasure
/ displeasure;
b) generate cognitive processes such as emotionally relevant perceptual
effects, appraisals, labeling processes;
c) activate widespread physiological adjustments to the arousing con-
ditions; and
d) lead to behavior that is often, but not always, expressive, goal-
directed, and adaptive.” (Kleinginna and Kleinginna, 1981).

This quite generic definition tries to merge all significant aspects of emotion, al-
though these aspects are often contradictory. Therefore, this definition presents
a cross-section of all opinions rather than an actual working definition.
After a comparison and differentiation of the two terms emotion and mood, this
chapter outlines the most important concepts of emotion, starting with the com-
ponent model presented by Schmidt-Atzert (1981). Subsequently, two groups of
emotion definitions are compared: categorical models (also known as discrete mod-
els) of emotions and dimensional models. The first group defines emotions as a set of
discrete categories, while the latter treats emotions as continuous dimensions.

2.2.3 Emotions versus Moods


Although the terms mood and emotion are often used interchangeably, most scien-
tists agree that in fact moods and emotions are related but also distinct phenomena.
In a meta-analysis of 65 published articles about the distinction between mood and
emotion, Beedie et al. (2005) identified eight criteria to distinguish emotion from
mood, which are summarized in Table 2.2.

Criterion        Mood                               Emotion

Intentionality   Not about anything in particular   About something
Cause            Can occur without apparent cause   Caused by a specific event or object
Consequences     Largely influences cognitive       Largely prepares the organism for
                 processes                          activity ('fight or flight')
Control          Controllable                       Not controllable
Duration         Enduring                           Short
Function         Effect on cognition                Effect on action
Intensity        Less intense                       More intense
Physiology       No distinct physiological          Distinct physiological patterning
                 patterning

Table 2.2: Comparison between moods and emotions

This comparison indicates that although there are no quantitative boundaries be-
tween emotions and moods (e.g. there is no exact duration defined which discrim-
inates a mood from an emotion), the two affective phenomena can easily be dis-
tinguished. The rest of this thesis will deal solely with emotions.

2.2.4 Component Definition


According to the component definition of emotion, emotion is regarded as a complex
entity which consists of different components. The number and the names of the
components differ between scientists. According to Schmidt-Atzert (1981) and Izard
(1994), there are three essential components:

1. the subjectively felt emotion, which refers to states that are named as emotions
by the person himself or herself,

2. the emotional physiological reactions in the brain and the nervous system that
can be attributed to emotional stimuli, and

3. emotional behavior, which describes behavior that occurs - according to external ob-
servers - as a reaction to an emotional stimulus.

The subjectively felt emotion can be identified from self-descriptions, while emotional
physiological reactions can be measured by biophysiological signals such as elec-
troencephalography (EEG), electromyography (EMG) or skin conductance. More-
over, emotional behavior can be identified e.g. from gesture or face recognition.
The three components influence each other: e.g. laughter can not only be observed
with the EMG of the zygomatic major (physiological reaction) but also from the facial
expression (emotional behavior) and self-reports (subjectively felt emotion).

2.2.5 Concepts of Emotion


To this day, psychologists are still arguing about a commonly accepted definition of
emotion. One can distinguish between two groups of emotion concepts: categorical / dis-
crete models, which try to separate emotions along well-defined borders,
and dimensional models, which try to arrange emotions in a multidimensional space.
As it is possible to arrange discrete emotions in a dimensional space (Cowie et al.,
1999), most scientists agree that both models can be seen as complementary. How-
ever, in practical studies usually only one of the concepts is selected. The following
section will outline the most important principles of both approaches.

2.2.5.1 Categorical Models


One way to conceptualize emotions is to organize them as a set of diverse, discrete
categories.
According to Ekman (1992), there are nine characteristics that distinguish basic
emotions from each other and from other affective phenomena:

1. Distinctive universal signals

2. Presence in other primates

3. Distinctive physiology

4. Distinctive universals in antecedent events

5. Coherence among emotional response systems

6. Quick onset

7. Brief duration

8. Automatic appraisal

9. Unbidden occurrence

Despite this, there is still little agreement among scientists about how many, which,
and why emotions are basic. Different theorists consider different emotions to be
basic, but they all share the idea that there are emotions which are more basic than
other emotions. For example, Mowrer (1960) considers only pleasure and pain to be
basic, whereas Frijda (1986) identifies 18 basic emotions. A summary of proposals
by a representative set of emotion theorists who hold (or held) some sort of basic-
emotion position can be found in Ortony and Turner (1990).
Besides the basic emotion approach, there is the prototype approach, which likewise
assumes that there is a set of emotions which are
more basic than others. However, these basic emotions are seen as prototypes from
which other emotions can be derived. This leads to a treelike hierarchical structure
of emotions like the one shown in Appendix A. A detailed description of the
prototype approach can be found in Russell and Fehr (1984) and Shaver et al. (1987).
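
The treelike structure can be made concrete with a small data-structure sketch: the nested Python dictionary below encodes a heavily abridged, hypothetical prototype hierarchy. The particular category names are illustrative and do not reproduce the hierarchy shown in Appendix A.

# Illustrative, abridged prototype hierarchy: basic emotions act as prototypes
# from which more specific emotion names are derived (cf. Shaver et al., 1987).
EMOTION_HIERARCHY = {
    "joy": {"cheerfulness": ["amusement", "delight"],
            "contentment": ["satisfaction"]},
    "sadness": {"grief": ["sorrow"],
                "disappointment": ["dismay"]},
}

def leaves(tree):
    # Collect the most specific emotion names of the hierarchy.
    if isinstance(tree, list):
        return list(tree)
    return [name for subtree in tree.values() for name in leaves(subtree)]

print(leaves(EMOTION_HIERARCHY))
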
Although the categorical model of emotion is supported by many scientists, there are
some limitations that have to be considered. First, there is no exact information
about the number of basic emotions, and the notations of the model are often am-
biguous. Moreover, the denotations of the different emotions leave a large scope
for ambiguity. Finally, the tight categorization of emotions does not represent
reality well.

2.2.5.2 Dimensional Models


Spencer (1890) was one of the first to describe emotions as dimensions in the sphere
of consciousness. In 1896, Wundt extended this theory by naming the three bipolar
dimensions (1) pleasantness-unpleasantness, (2) relaxation-tension and (3) calm-
excitement. In 1952 this model was empirically validated by Osgood using fac-
tor analysis. Based on this research Bradley and Lang (1994) proposed a three-
dimensional model which includes arousal, valence and dominance:

• Valence addresses the quality of an emotion (ranging from unpleasant to pleas-


ant).

• Arousal refers to the quantitative activation level (ranging from calm to ex-
cited).

• Dominance relates to the degree of control a person feels they have over a situ-
ation (ranging from weak to strong).

Out of these, arousal and valence are the most frequently used dimensions. The deci-
sion for only two dimensions can be justified by the results of Russel and Mehrabian
(1977) and Russel (1979, 1980), who found that pleasure and arousal accounted for
the major proportion of variance in affect scales, while dominance accounted for only
a quite small amount. The two-dimensional circumplex model of affect as sug-
gested by Russel (1980) is shown in Figure 2.2. According to this model, emotions
are specified by their position in the two-dimensional space spanned by the two axes
valence (horizontal axis) and arousal (vertical axis).
Figure 2.2: Circumplex model of affect (Russel, 1980)

Compared to discrete emotion models, one major advantage of dimensional models
is that the emotional experience can be evaluated without the constraints imposed by
the boundaries of discrete emotions. Moreover, it is not necessary to name emotional
states, as they are already described by their position in the multidimensional space.
However, the absence of names implies that positions in the dimensional space are
less descriptive than those in discrete models.
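
To illustrate how the dimensional representation can be used computationally, the sketch below places a few emotion words at assumed coordinates in the valence-arousal plane and maps an arbitrary point to its nearest labeled emotion. The coordinate values are rough illustrative guesses and are not the empirical positions reported by Russel (1980).

import math

# Assumed (valence, arousal) coordinates on a -1..1 scale; illustrative only.
EMOTION_COORDS = {
    "excited":    ( 0.7,  0.7),
    "calm":       ( 0.6, -0.6),
    "bored":      (-0.6, -0.6),
    "distressed": (-0.7,  0.7),
    "neutral":    ( 0.0,  0.0),
}

def nearest_emotion(valence, arousal):
    # A point in the two-dimensional space is described by its valence
    # (horizontal axis) and arousal (vertical axis); the nearest labeled
    # emotion gives a coarse categorical reading of that position.
    return min(EMOTION_COORDS,
               key=lambda name: math.dist((valence, arousal), EMOTION_COORDS[name]))

print(nearest_emotion(0.5, 0.4))  # -> 'excited' with the coordinates assumed above
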

2.3 Psychophysiology of Emotions


In this section, the psychophysiology of emotions is explained, starting with a brief
description of the human nervous system and its role in emotions. Next, the
impact of emotions on physiological signals is illustrated. Finally, different methods
for emotion elicitation are presented.

2.3.1 The Human Nervous System


The entire human nervous system is involved in processing emotional information. The
human nervous system consists of two parts: the central nervous system (CNS) and
the peripheral nervous system (PNS).
The CNS receives sensory information from the environment and passes it on to
the limbic system and the cortex. The limbic system (Figure 2.3), which includes the
hypothalamus, the hippocampus and the amygdala, is the origin of emotion and mo-
tivation and also responsible for emotion expression. (Silbernagl and Despopoulos,
2001)
The PNS is divided into two major parts: the sensory-somatic and the autonomic
nervous system. The somatic nervous system controls skeletal muscles and external
sensory organs (e.g. the skin). It is said to be voluntary because the responses
can be controlled consciously.1 The autonomic nervous system is mostly involun-
1
The only exceptions are reflex reactions of skeletal muscles, which are involuntary reactions to
external stimuli.

Figure 2.3: The major components of the limbic system. (Carlson, 2007)

tary and transmits impulses from the CNS to the peripheral organs. It includes the
sympathetic and the parasympathetic nervous systems which control e.g. the heart
rate, dilations and constrictions of blood vessels, the pupils or the air flow in the
lungs. The sympathetic nervous system responds to stress and danger, whereas the
parasympathetic nervous system is concerned with conservation and restoration of
energy. Figure 2.4 gives an overview of the functions of the parasympathetic and
sympathetic nervous systems. The figure clarifies that both systems work antagonis-
tically, depending on the activating stimulus.

There are two opposite systems in the brain which can be seen as the structural
foundation of emotion. The aversive system is responsible for defensive / protective
reactions such as fear or escape. In contrast, the appetitive system responds to
positive stimuli (i.e. rewards) with preservative behavior (e.g. ingestion, copulation
and the nurture of progeny). In the dimensional model illustrated in section 2.2.5.2
valence determines whether the appetitive (positive) or the aversive (negative) sys-
tem is activated. In contrast to this, arousal determines the intensity with which
the system is activated. (Downey et al., 2004; Lang, 1995)

For the analysis of emotions it is important to know whether the physiological re-
actions are different for each emotion. There are two different theories relating
emotional expression and experience. According to the James-Lange theory (James,
1950), the experience of emotion is a response to physiological changes in our body;
in other words, we do not cry because we are sad, but we are sad because
we are crying. As a consequence, every emotion is an interpretation of the preceding
arousal. This would require distinct physiological signals for every emotion. In con-
trast, the Cannon-Bard theory (Cannon, 1927) proposes that emotional experience
can occur independently of emotional expression. In addition, Cannon and Bard
stated, that the same physiological changes can accompany different emotions.

Figure 2.4: Parasympathetic and sympathetic nervous system (Carlson, 2007)

2.3.2 How do Emotions Show?


Levenson (1999) states that ”evolution has provided us with a small set of emotions
that activate hard-wired packages of response tendencies”. The aim of these ele-
ments is to provide a set of instructions for certain prototypical situations that are
important for our well-being and survival. This also includes automatic physiolog-
ical reactions of the body. The elements of the response package are illustrated in
Table 2.3. As the number of prototype situations is limited only to those situations
that are essential for survival, human emotions in general still remain unpredictable.

To illustrate the function of the elements of response packages, Levenson (1999)
gives an example of the response tendencies of sadness. For instance, perception
gets a lowered threshold for perceiving further losses, and the attentional field becomes
narrower. Changes in the gross motor system show in a slumped posture, low muscle
tonus, and a downturned gaze. The facial expression becomes sad, with eyebrows raised
in the middle and downturned lip corners, accompanied by a softer and lower voice tone
and a slower speech rate. Finally, the heart rate increases as a response to sadness.
In a study, Levenson et al. (1990) investigated whether there are different
patterns of autonomic nervous system activity for different emotions.

Element of Response Package    Functions

Perceptual / attentional       Adjustments of perceptual thresholds and breadth of
systems                        attentional field to maximize attention to the challenging
                               event and minimize attention to distracting, irrelevant
                               events
Gross motor behavior           Postural adjustments, changes in muscle tonus
                               appropriate for ensuring purposeful behavior
Purposeful behavior            Fixed-action patterns, alteration of behavior
                               hierarchies that aid in coping with the challenging
                               event
Expressive behavior            Facial displays, alterations in voice tone, utterances
                               that serve to signal intended action and to
                               communicate to conspecifics
Gating of higher               Limiting novelty of response, accessing associated
mental processes               memories to maximize the probability of accessing
                               successful, time-tested responses to the challenging event
Physiological support          Autonomic, central, endocrine and other physiological
                               adjustments optimal for supporting the organism's
                               response to the challenging event

Table 2.3: Elements of the response package and their functions (Levenson, 1999)

There were reliable differences between the autonomic reactions to different emotions. Anger and fear produced
a larger increase of heart rate than happiness. Moreover, anger produced a larger
finger temperature increase than happiness whereas fear produced a temperature
decrease. While anger, fear, and sadness produced heart rate increases, disgust led
to a heart rate decrease. Finally, sadness could be distinguished from anger, fear,
or disgust as it produced a larger increase of skin conductance. Levenson (1992)
confirmed these results in a study which showed that heart rate is higher when ex-
periencing anger, fear, or sadness than for disgust. Moreover, fear led to a lower
diastolic blood pressure, cooler surface temperatures, greater vasoconstriction and
less blood flow in the periphery than anger.

Levenson et al. (1990) explain the differences among autonomic reactions by taking
into account the origin of emotions. For instance, an increase of heart rate can
prepare the body for a fight in case of anger or for a flight in case of fear. In
addition, the decrease of finger temperature in case of fear can be explained by the
fact that the blood flow is diverted away from the periphery and redirected towards
the large skeletal muscles. Similarly, the increase of finger temperature in case of
anger can be seen as a result of the blood flow to the hand muscles to support
grasping weapons.

2.3.3 Elicitation of Emotions

When talking about elicitation of emotions one has to differentiate between three
different kinds of elicited emotion which are listed below.

Spontaneous emotion: this is the most authentic kind of emotion and therefore
seems to be most appropriate for emotion research. Spontaneous emotion
requires that a person does not know or does not care that data is recorded.

Acted emotion: in contrast to spontaneous emotion, the subject knows about the
recordings and tries to put himself or herself into the required emotion. The order
and duration of the emotions are specified in advance.

Controlled elicited emotion: this can be seen as a combination of the first two
kinds of emotion. The aim is to elicit emotions that are as spontaneous as possible
under controlled conditions. The subject is put in a situation that makes it easy to
experience a certain emotion.

Although spontaneous emotion might seem the most natural and therefore most
suitable kind of emotion, one has to take into account a major drawback:
it is nearly impossible to apply the sensors which are necessary to record
the physiological signals without disturbing the subject. Besides that, there is no
possibility to control which emotion occurs at what time. This makes it very hard
to label the data and to get a balanced number of samples for each emotion. Acted
emotions are often used for emotion detection from speech or faces. However, as not
all of the physiological reactions can be influenced voluntarily, acted emotion might
be inappropriate for emotion recognition from physiological signals. Moreover, it is
difficult to behave naturally under recording conditions. This is why controlled elicited
emotion is often used for experimental studies, although there is no guarantee that
the desired emotion is really elicited.

There are various ways in which emotions can be elicited for experimental studies.
Gerrards-Hesse et al. (1994) distinguish between five categories of emotion induction
methods2, which reflect the resemblance of their underlying functional principles.
These methods can also be combined to increase effectiveness:

1. Free mental generation of emotional states

2. Guided mental generation of emotional states

3. Presentation of emotion-inducing material

4. Presentation of need-related emotional situations

5. Generation of emotionally relevant physiological states

The methods are explained in the following.


2
In their study, Gerrards-Hesse et al. use the term mood induction procedure. This can be due
to the fact that in German, mood and emotion are often used interchangeably. As the effects of the
induction procedures are of limited duration and have a certain cause, which is - according to the
definition in section 2.2.3 - an indicator for an emotion, the mood induction procedures can also
be used for emotion induction and are therefore in the following referred to as emotion induction
procedures.

Free Mental Generation of Emotional States


The main characteristic of this group of methods is that stimuli are activated men-
tally by the subjects themselves and not presented by the experimenter. There are
two different methods that belong to this group:

• Hypnosis: subjects are instructed to remember and imagine emotional situa-


tions while they are hypnotized.
• Imagination: it is assumed that it is possible to induce emotions by imag-
ining emotion-related events. Therefore subjects are asked to imagine / re-
experience an emotional situation.
Guided Mental Generation of Emotional States
If guided mental generation of emotional states is used, subjects are presented with
emotion-inducing material and additional instructions to get into a certain emo-
tional state. Methods belonging to this category are:

• Velten: self-referent statement technique (developed by Velten (1968)). A


number of statements describing positive or negative self-evaluations or so-
matic states are used for auto-suggestion of the subject.
• Film / story: a film / story is presented to the subjects with a short introduc-
tion to the situation.
• Music: subjects listen to music charged with emotion after they are told to
get in the emotional state expressed by the music.
Presentation of Emotion-Inducing Material
In contrast to the methods presented above, emotional stimuli are presented with-
out explicit instructions to get into the emotional state. The following methods can
be used:

• Film / story: a film or story is presented to the subjects without any additional
information.
• Music: a piece of music is presented without information about its emotional
character.
• Gift: subjects are offered an unexpected gift. This is based on the assumption
that an unexpected gift usually leads to elation.
Presentation of Need-Related Emotional Situations
Procedures using presentation of need-related emotional situations provoke emotions
by satisfaction or frustration of needs. Methods belonging to this group are:

• Success / failure: subjects are given (false-)positive or (false-)negative feed-


back about their performance in a test, e.g. about cognitive abilities, percep-
tual motor skills, intelligence, or analytic abilities.
• Social interaction: subjects are exposed to certain social interactions which
are arranged by the experimenter. Usually they interact with a person who is
trained to act in a certain emotional way.

Generation of Emotionally Relevant Physiological States


Generating emotionally relevant physiological states means systematically varying
physiological states. This can be achieved with the following methods:

• Drugs: drugs like epinephrine or a placebo introduced to subjects as a mood-


inducing drug are used to induce physiological arousal.

• Facial expression: subjects are instructed to contract or relax different muscles


to produce certain facial expressions.

In a meta-analysis, Westermann et al. (1996) found that out of these methods, the
Velten technique is the most widely used for inducing positive or negative emotions,
followed by the film / story method without instructions and induction by imagination.
They also found that film / story with or without instructions is the most effective
method to induce positive and negative emotional states.

2.4 Registering Emotions from Physiological Signals
To develop an affective human-computer interaction, the computer has to be able
to get information about the affective state of a user. There are two different ways
to make the computer aware of human emotions. The first is to ask the user to
explicitly report his or her current state to the computer (e.g. over the keyboard).
Secondly, the computer can be given the ability to recognize the affective state itself
by interpreting emotion-related changes in human behavior or in the human's
physiological signals. There are different kinds of physiological signals that can be
used for emotion recognition.
The following section provides an overview of physiological signals that are com-
monly used to enable the computer to recognize human emotions. Next, we give an
overview of examples where combined physiological measures have already been used
for emotion recognition. Unless otherwise noted, the descriptions of the physiological
signals are taken from Schandry (1989).

2.4.1 Electrical Brain Activity


To monitor electrical brain activity, electroencephalography is used. The electroen-
cephalogram (EEG) provides information about the electrical activity of the human
brain which results from post synaptic potentials of the cortical neurons. Compared
to other methods for monitoring brain activity such as functional Magnetic Reso-
nance Imaging (fMRI) or Positron Emission Tomography (PET), EEG has a high
temporal resolution. However, as the origin of the signal cannot be localized pre-
cisely, the spatial resolution of the EEG is worse than that of the other
methods.
There are two different types of EEG signals:

1. The spontaneous EEG measures the continuous activity which has amplitudes
between 1 and 200 µV. According to international conventions the frequencies
that can be observed are subdivided into different frequency bands: δ (0.5 - 4
Hz), θ (5 - 7 Hz), α (8 - 13 Hz), β (14 - 30 Hz) and γ (≥ 30 Hz).

2. Evoked potentials occur as a response while stimuli (e.g. a visual flash of light
or an auditory click) are presented. The duration of this event is usually less
than 500 ms.

To measure EEG, electrodes are placed on the surface of the skull. Each electrode
measures the electric potential of the surrounding head area. The signal can be
conducted unipolar or bipolar. When using the unipolar method a so-called active
electrode is placed on the area of interest and a reference electrode is placed on a
relatively inactive area like the earlobe. In contrast, using the bipolar recording
technique requires two active electrodes placed over the cortical areas of interest.
Either the difference or the sum of the electric potentials between the two electrodes
is recorded. One major advantage of bipolar recordings is that no inactive reference
area has to be selected (this can be quite problematic as there is no site which is
completely inactive). However, the main disadvantage of bipolar recordings is that
a combined activity of two locations is produced, which makes them unsuitable for
clinical and research purposes where data of symmetrically placed electrodes or the
activity of specific brain areas has to be compared (Andreassi, 2000).

2.4.1.1 Electrode Placement


Although there are no restrictions on where on the skull electrodes may be placed,
in most cases the international 10-20 system (Jasper, 1958) is used. According to
this system, electrode positions are described depending on geometric proportions of
the skull, which is divided into sections of 10 and 20 percent. Important anatomical
reference points are the nasion (onset of the nose on the skull) and the inion (a
projection of bone at the back of the head found over the occipital area). The exact
electrode positions are shown in Figure 2.5.

Figure 2.5: Electrode positions in the 10-20-system

The name of the electrode positions refers to the region of the cortex (Figure 2.6)
above which the electrode is located. The letter F refers to the frontal lobe, T to
the temporal lobe, P to the parietal lobe and O to the occipital lobe. Finally, C
refers to central positions located over the region around the central sulcus,
between the frontal and parietal lobes.

Figure 2.6: Lobes of the cortex (Andreassi, 2000)

2.4.1.2 Electrophysical Origin of EEG Signals


Information in the brain is passed on by neurons (Figure 2.7). The dendrites collect
information from other neurons, which is processed in the soma (cell body). The
information is passed on by the axon to the terminal buttons as an action potential.


From the terminal buttons the action potential is transmitted across the synapse (a
junction between the terminal button of an axon and the membrane of the dendrites
of another neuron) to another neuron.

Figure 2.7: Main components of a single neuron (Carlson, 2007)

Neurons can be excited by different stimuli, such as natural stimuli from receptor
organs or electrical stimuli. A neuron will not fire until a minimum level of stim-
ulation is reached. If a stimulus is strong enough to trigger the firing of a neuron,
the electrical impulse will continue across the entire axon and potentially via the
synapse to the dendrites of another neuron. As neurons do not touch, transmission
across the synapse is done by neurotransmitters which can cause excitatory and in-
hibitory activity at the synapse. An excitatory post synaptic potential (EPSP) leads
to a depolarisation of the membrane of the next neuron, while an inhibitory post synaptic
potential (IPSP) causes a hyperpolarisation. Figure 2.8 illustrates excitatory and inhibitory
activity at the synapse. The action potential that is conducted along the axon to
an excitatory synapse produces an EPSP which can cause another action potential
in the post synaptic neuron (A). In contrast, the IPSP suppresses the generation of
an action potential.
The signals measured by EEG are summed inhibitory and excitatory post synaptic
potentials which occur in the pyramidal cells of the cerebral cortex. An important
characteristic of pyramidal cells is their very long dendrites, which point towards the outer
parts of the cortex. As the resulting voltage of an action potential of a single neuron
is too small to be visible on the surface of the skull, a synchronous activation of a few
thousand neurons pointing into the same direction is necessary, such that potential
differences can be registered.

2.4.1.3 Artifacts

As the amplitude of the EEG signal is quite low, the signal can easily be influenced
by artifacts - especially because of the high amplification, even small electrical inter-
ferences can have an impact on the signal. Trimmel (1990) distinguishes between technical
artifacts and artifacts of biological origin.
Technical artifacts can result from relative electrode movement on the skin which
can for instance be caused by movements of the face. Moreover, movements of
the electrode cables can cause capacity changes. Another source for artifacts is
the noise of the amplifier, as well as interfering fields from the power supply voltage.
Finally, electrostatic voltages can affect the EEG signal. Most of these artifacts can
be limited by careful preparation, such as insulation of the cables.

Figure 2.8: Excitatory and inhibitory activity at the synapse. (Andreassi, 2000)

Biological artifacts result from physiological activity of the body. For example eye
blinks produce high amplitudes which occur especially at the frontal electrodes.
Eye movements can also cause artifacts due to the electrical polarity of the eye.
Moreover, muscle potentials can influence the signal; however, they can easily be identified
by their higher amplitude and frequency. As the tip of the tongue has a
negative voltage, tongue movements can also influence the EEG signal. This influence can
be increased by some dental fillings.

2.4.1.4 Emotions and EEG

Davidson (1992) states that the left and right hemispheres of the brain are specialized for
different classes of emotions. While the left anterior hemisphere is specialized for approach
(Davidson uses the term approach as an antonym of withdrawal), the right anterior
hemisphere is responsible for withdrawal. He explains the
specialization of the left anterior hemisphere with the findings of Luria (1973), who
described the left frontal region as an important center for intention, self-regulation
and planning. Moreover, damage to the left frontal region has been shown to cause ap-
athetic behavior in combination with a loss of interest and pleasure in objects and
people, which can be seen as a deficit in approach. Evidence for the specialization
of the right anterior region can be seen in findings that indicate a high activation
of right frontal and anterior temporal regions during arousal of withdrawal-related
emotional states (e.g. fear and disgust).

This theory supports the findings of Davidson et al. (1990), who detected less alpha
power (as high activation of a brain region suppresses alpha activity in that region, alpha
power is often used as an indicator for the activation of a certain part of the brain)
in right frontal regions for disgust than for happiness, while happiness caused
less alpha power in the left frontal region than disgust.
Additionally, in an EEG study about brain asymmetries during reward and punish-
ment (which can be seen as a positive and a negative emotional stimulus), Sobotka
et al. (1997) found that punishment was associated with less alpha power in right
mid and lateral frontal regions of the brain (electrodes F4 and F8) whereas reward
trials were associated with less alpha power in the left mid and lateral frontal regions
(electrodes F3 and F7).
In an experiment where three emotions (fear, happiness and sadness) were induced
with visual and auditory stimuli, Baumgartner et al. (2006) showed that alpha power
over the left hemisphere increases in happy conditions compared to negative emo-
tional conditions.
There are also several neuroimaging studies about brain activity during emotions.
As neuroimaging methods are not the subject of this thesis, we refer to a meta-analysis
conducted by Murphy et al. (2003) for more information about this topic.
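The asymmetry findings cited above are typically quantified via alpha-band power. Purely as an illustration (this computation is not part of the thesis pipeline, and all names are hypothetical), frontal alpha power and a simple asymmetry index could be estimated in Python as follows:

import numpy as np
from scipy.signal import welch

def alpha_power(x, fs=300.0, lo=8.0, hi=13.0):
    # band power in the alpha range for one channel (Welch periodogram)
    f, pxx = welch(x, fs=fs, nperseg=int(2 * fs))
    band = (f >= lo) & (f <= hi)
    return pxx[band].sum() * (f[1] - f[0])

def frontal_asymmetry(x_f3, x_f4, fs=300.0):
    # ln(right) - ln(left); higher alpha power indicates lower activation
    return np.log(alpha_power(x_f4, fs)) - np.log(alpha_power(x_f3, fs))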

2.4.2 Muscle Activity


When a muscle is contracted, electrical potentials are generated by the muscle fibres.
These electrical potentials can be measured by electromyography (EMG). Especially
facial electromyography can provide a sensitive and objective measure for emotion
studies as some facial muscles cannot be activated voluntarily (e.g. Ekman (1993);
Ekman et al. (2003)). The electrode positions which are often used for measuring
facial EMG are shown in Figure 2.9.

Figure 2.9: EMG electrode placements for surface differential recording over major
facial mimetic muscles. (Andreassi, 2000)

Schwartz et al. (1976) found that, when subjects self-induced different emotions, positive
thoughts came with a higher activity of the zygomaticus major, whereas unpleasant
thoughts led to an increase of the activity of the corrugator supercilii. Cacioppo et al. (1986)
corroborated this study when they found that activity over the corrugator supercilii and
orbicularis oculi regions varied in conjunction with the valence and intensity of pictures: if sub-
jects liked a scene, zygomaticus major and orbicularis oculi activity was higher
and corrugator supercilii activity was lower compared to neutral or unpleasant stimuli.
More examples of research on emotion-specific facial EMG activity can be found in
Tassinary and Cacioppo (1992).
As the amplitude of facial EMG is quite weak, the signal can easily be affected by
external interference. Additionally, movement artifacts can influence the signal. For
example the movement of a muscle can deform the skin under the electrodes which
would lead to a change of the skin-electrode impedance. It might also happen that
electrodes change their position because of muscle contractions. It is also possible
that crosstalk from adjacent muscles is recorded.

2.4.3 Skin Conductance


Skin conductance measures the electrical conductivity of the skin, which is influenced
by the activity of the perspiratory glands: the more perspiratory glands are active,
the higher the skin conductance. Skin conductance is usually measured at the
hands or feet. As most emotions cause an increased activation of the human
nervous system, skin conductance can be seen as a good indicator for the existence
of an emotion although there is usually a latency of one to two seconds between
stimulus and response. For example, Winton et al. (1984) showed that the skin conduc-
tance response changes linearly with emotional arousal. This view was supported
by Ekman et al. (1983), who verified that fear and disgust lead to larger changes of
skin conductance than happiness. Beyond that, Ax (1953) used skin conductance to
differentiate between anger and fear.
There are some things that have to be kept in mind when analyzing skin conductance.
If the body temperature increases, skin conductance increases as well. Therefore,
the temperature should be kept stable during the whole time signals are recorded.
Moreover, respiration has an influence on the signal which can cause artifacts while
a person is speaking. Skin irritation can also become a problem if the electrodes are
attached to the skin with adhesive tape.

2.4.4 Skin Temperature


Skin temperature is the temperature that can be measured on the surface of the
body. To measure skin temperature, a thermistor (a sensor whose resistance varies
with its temperature) is usually placed at the fingertip or the palm.
McFarland (1985) states that arousing, negative emotions lead to a decrease of skin
temperature, whereas calm, positive emotions lead to an increase of skin tempera-
ture. In contrast, Ekman et al. (1983) found that finger temperature increased more
in anger than in happiness. In addition, a distinction between the negative emotions
anger, fear and sadness was possible as sadness and fear caused lower changes in
skin temperature than anger.
When skin temperature is measured, one has to take care that no heat accumulation
takes place under the sensor. Moreover, the temperature of the environment
should be constant, as it influences the skin temperature.

2.4.5 Heart Rate


Heart rate is recorded using electrocardiography. The recorded signal reflects the
summed action potentials of the heart muscle.
Electrocardiographic measures can be influenced by changes of skin potential which
causes slow oscillations in the signal. In addition, muscle potential artifacts
can corrupt the signal if the electrodes are placed on muscles. Another problem is
that heart rate also increases as a consequence of physical activity.
Instead of measuring heart rate directly, it can also be derived from blood volume.
Blood volume refers to the amount of blood that is present in the peripheral blood ves-
sels in a certain region at a given time. It is usually measured by photoplethysmog-
raphy which is an optical technique that measures the amount of infrared light that
is reflected by the skin. Compared to electrocardiographic electrodes, photoplethys-
mographic sensors are easier to apply and to wear and therefore more comfortable
for a person.
When blood volume is measured, it is important to have a constant temperature as
the temperature of the body influences the dilatation of blood vessels and therefore
also the blood volume. Moreover, the relative position to the heart should be kept
constant because hydrostatic pressure also has an influence on the blood volume.
Anttonen and Surakka (2005) found that heart rate decelerated in response to emo-
tional stimulation, especially in response to negative stimuli compared to responses
to positive and neutral stimuli. Vrana et al. (1986) also reported a higher heart
rate acceleration during fear imagery than during neutral imagery and the silent repetition of
fearful or neutral sentences, respectively. Moreover, Leng et al. (2007) verified that
amusement produces a larger average and standard deviation of the heart
rate than fear.

2.4.6 Respiration Rate


The respiration rate refers to the number of times a person inhales and exhales per
time unit. Picard and Healey (1997) assume that it can also be useful to recog-
nize a subject’s affective state. As heart rate and respiration are highly correlated,
recordings of the respiration rate can also be used to analyze which heart rate changes
are caused by changes in respiration.
One main problem when measuring respiration is that humans are able to control
their respiration which might not be desired for the experiment.

2.4.7 Combined Measures for Affect Recognition


To improve the recognition rates of affective systems, different physiological measures
can be combined.
In a user-independent study using skin temperature, electrodermal activity (skin
conductance response) and heart rate as input signals, Kim et al. (2004) achieved
a recognition rate of 78.43 percent for three emotion categories (sadness, anger and
stress) and a recognition rate of 61.76 percent for four categories (sadness, anger,
stress, surprise). For emotion elicitation, a multimodal approach which included
visual, illumination, and auditory stimuli was used. The classification was done
using a support vector machine.
Picard et al. (2001) conducted a study where electromyography, blood volume pulse,
skin conductance, and respiration information were collected from one single
person over a period of some weeks. For eight emotion categories (no emotion, anger,
hate, grief, platonic love, romantic love, joy and reverence) a recognition rate of 81
percent was achieved.
A combination of electromyography, electrodermal activity, skin temperature, blood
volume pulse, electrocardiogram, and respiration was used in a study conducted
by Haag et al. (2004) with a single person on different days. In the experiment
emotional states of high and low arousal and high and low valence were elicited
with a blockwise presentation of emotional pictures from the international affective
picture system (IAPS). To classify the emotions, a neural network was used. For
arousal a classification rate of 96.58 percent could be achieved whereas high and low
valence could be distinguished with a correctness of 89.93 percent.
A study with multiple subjects using electroencephalography, heart rate, and pulse
was conducted by Takahashi and Tsukaguchi (2003), who tried to elicit positive
and negative emotions by acoustic stimuli (music). For data analysis, a multi-layer
neural network was compared with a support vector machine. With the neural
network classifier a recognition rate of 62.3 percent was achieved whereas the SVM
classifier recognized 59.7 percent. Takahashi (2004) extended these results with
a study where five emotions (joy, anger, sadness, fear, and relax) were induced
by audio-video contents. For classification of the signals (electroencephalography,
skin conductance, heart rate) a support vector machine was used and achieved a
recognition rate of 41.7 percent (66.7 percent on three emotions).
3. Data Collection

Before being able to conduct the experiments described in chapter 5, data had to be col-
lected. After an overview of the preceding considerations in section 3.1, this chapter
describes the design of the experimental procedure (3.2) followed by ethical con-
siderations (3.3) and a critical reconsideration of the experimental setup in section
3.4.

3.1 Preceding Considerations


The following section gives an overview of the considerations made prior
to the experiment. This includes the selection of the emotions under investigation
as well as of the emotional model. Furthermore, an appropriate
method for emotion elicitation had to be selected.

3.1.1 Selection of Emotions


For the present study, the dimensional model was selected (see 2.2.5.2). Although
there is still no consensus about how emotions can be defined, the dimensional
model is well-established among psychologists. It also offers an adequate resolution
to differentiate between emotions and can easily be transferred to digital systems. Finally,
the dimensional model has already been used for similar experiments.
In this experiment the three states pleasant, neutral and unpleasant were selected.
The difference between these states can easily be seen on the valence-axis of the
dimensional model.

3.1.2 Emotion Induction


For the reasons introduced in section 2.3.3 we decided to use controlled elicited
emotions in the current study. After the decision for this kind of emotion, one im-
portant challenge was the choice of an appropriate emotion induction procedure.
As illustrated in section 2.3.3, there are many different methods to induce emotion
like films, music, pictures or imagination. For this study it was important to have
a reliable and valid emotion induction method. Moreover, the method had to be
replicable as there would be more than one repetition of the experiment. The last
important point was the comparability to other studies. For this reason the Inter-
national Affective Picture System (IAPS) was chosen for emotion induction. With
a database of nearly 1000 photographs Lang et al. (2005) provide standardized and
well researched material for visual emotion stimulation. For each picture means and
standard deviations on a 9-point scale are provided for the three dimensions valence,
arousal and dominance. Figure 3.1 shows the pictures of the IAPS in a 2-dimensional
space based on mean valence and arousal ratings. A low value on the valence axis
indicates an unpleasant picture, a high value a pleasant picture.

Figure 3.1: Pictures of the IAPS in a 2-dimensional space based on mean pleasure
and arousal ratings

The method has been proven to be a valid and reliable instrument for investigations of
emotion (Hamm and Vaitl, 1993). Besides that, experimental settings can easily be
repeated and compared to other studies where the IAPS has been used. Moreover,
the subject does not move while watching the pictures; thus, there are no or only
few body movements which could influence the signal.

3.2 Experimental Design


This section describes the details of the experimental design used for the current
study. While section 3.2.1 describes the hardware we utilized, section 3.2.2 gives
an overview about the software for recording and evaluating the data. After a
description about the electrode placement we used for our experiment (3.2.3), we
give an overview of the subjects who took part in this study (3.2.4). Next, we
depict stimulus material for emotion induction (3.2.5) and the presentation of this
material (3.2.6). Finally, section 3.2.7 describes the experimental procedure.

3.2.1 Hardware Design


To amplify and digitize the signals, the VarioPort™ amplifier (Becker, 2003) and
recorder were used. The technical specifications of the amplifier are summarized
in Table 3.1. To avoid aliasing, a sampling rate of 300 Hz was used for all the
recordings. The amplifier and the recorder were connected to the computer through
an optical waveguide interface in order to minimize interference. The interface was
connected to one of the computer's USB ports with an RS-232 to USB adapter. All
recordings were done with an IBM T60 laptop with a Core 2 Duo T7200 2 GHz
processor and 1 GByte RAM.

Amplification factor   2775
Input range            ±450 µV
A/D conversion         12 bit (4096 steps)
Resolution             0.22 µV/bit
Frequency range        0.9 – 60 Hz
Input channels         16

Table 3.1: Technical specifications of the Varioport™ EEG amplifier

3.2.2 Software Design


This section gives an overview about the software we used in our experiments.

3.2.2.1 Recording Software


All recordings were done with the UKA EEG/EMG Studio Visual v1.0 which is a
modified version of the UKA EEG/EMG Studio (Mayer, 2005). Whereas the previous
versions of UKA EEG/EMG Studio presented a list of words to a subject,
UKA EEG/EMG Studio Visual was modified such that pictures could be
presented in full-screen mode. Moreover, the presentation of the pictures was
automated, such that for every session the presentation only had to be started once.
Figure 3.2 illustrates the screen as it is visible to the supervisor and the subject.


Figure 3.2: Left side shows screen as it is visible to the supervisor, right side shows
the subject’s screen where pictures for emotion elicitation are presented

3.2.2.2 Recognition Software


For data processing two different software setups were used. In the first setup
MATLAB® was used for processing of the data. For classification, libSVM (Chang
and Lin, 2001) - a software package for support vector classification (see 4.1.2) - was
integrated into MATLAB.
The second setup was used to compare the first setup with sequence modelling
based on hidden Markov models (HMMs) as it is often used for speech recogni-
tion. For this purpose the Janus Recognition Toolkit (JRTk), which is developed
at the University of Karlsruhe (TH), Germany and the Carnegie Mellon University,
Pittsburgh, USA, was used. The system is based on a modification of a Tcl based
script collection, which had already been used for unspoken speech recognition from
electroencephalographic data by Wester (2006); Wand (2007); Porbadnigk (2008).

3.2.3 Electrode Placement


Most of the EEG recordings were done with a standard EEG cap (Electro-Cap
International, Inc) which consists of 19 Ag/AgCl electrodes that are placed according
to the international 10-20 system (see 2.4.1.1 for further information). Moreover,
a ground electrode is placed at the forehead. As the amplifier was restricted to 16
channels, only Fp1, Fp2, F3, F4, F7, F8, Fz, C3, C4, Cz, P3, Pz, T3, T4, T5,
T6 were selected for recording. Because visual processing is not that important for
emotion recognition, O1 and O2, covering the visual (occipital) cortical regions, and P4 were
left out. The EEG was recorded unipolarly against the ear lobes (A1, A2).
Recordings of six subjects were done with a headband, which has been developed
by Honal (2005). The headband consists of four Ag/AgCl electrodes attached at the
positions Fp1, Fp2, F7 and F8. As ground electrode a disposable electrode at the
back of the neck was used. The reference electrode was placed at the left mastoid.

3.2.4 Subjects
Most of the 23 subjects were students of the University of Karlsruhe (TH) who
participated voluntarily in this study. 20 of them were male, three were female; the
average age was 26 years (standard deviation: 2.07). All subjects had perfect or near
perfect vision. 19 of the participants were right handed, four were left handed. All
of them stated to feel healthy on the day of the experiment and had not used any
medication that could affect the EEG signals. The subjects were organized in two
groups. In one group, which consisted of 17 subjects, data from 16 electrodes
was recorded with a standard EEG cap; the recordings of the other six subjects
were done with the EEG headband. Due to technical errors, one of the headband
subjects had to be excluded from the evaluation. A detailed view of statistical data
of the subjects can be found in Table 3.2.

3.2.5 Stimulus Material for Emotion Induction


To induce emotions in the subjects, 90 pictures of the International Affective Picture
System (IAPS) were selected. The IAPS pictures were used to induce the three
emotional states pleasant, neutral and unpleasant. For each emotional state, 30
pictures were selected.
ID age sex handedness setting


1 24 M R EEG-cap (16 electrodes)
2 28 M L EEG-cap (16 electrodes)
3 27 M R EEG-cap (16 electrodes)
4 24 F L EEG-cap (16 electrodes)
5 30 M R EEG-cap (16 electrodes)
6 24 M L EEG-cap (16 electrodes)
7 27 M R EEG-cap (16 electrodes)
8 24 M R EEG-cap (16 electrodes)
9 26 M R EEG-cap (16 electrodes)
10 25 M R EEG-cap (16 electrodes)
11 27 F R EEG-cap (16 electrodes)
12 30 F R EEG-cap (16 electrodes)
13 29 M R EEG-cap (16 electrodes)
14 27 M R EEG-cap (16 electrodes)
15 24 M L EEG-cap (16 electrodes)
16 25 M R EEG-cap (16 electrodes)
17 23 M R EEG-cap (16 electrodes)
41 27 M R headband (4 electrodes)
42 27 M R headband (4 electrodes)
43 25 M R headband (4 electrodes)
44 23 M R headband (4 electrodes)
45 27 M R headband (4 electrodes)
46 25 M R headband (4 electrodes)

Table 3.2: Overview of the subjects

As some pictures have different effects on male and female people (e.g. opposite sex erotica), two different picture sets were chosen for male and
female subjects. A list of the pictures that were selected can be found in appendix
B. As illustrated in section 3.1.2, pictures are organized on a 9-point scale where
high values indicate pleasant and low values indicate unpleasant pictures. To induce
positive emotions only pictures with a valence higher than 7.5 were selected from
categories like food, family or opposite sex erotica. The neutral pictures which
include neutral faces and objects range from 4 to 6. Unpleasant pictures have a
valence lower than 2.6 and consist of pictures like mutilation, contamination, human
or animal threat. Mean values and standard deviation for valence and arousal are
shown in Table 3.3. Only pictures with the same aspect ratio were selected. All
pictures were converted to a size of 1200 x 900 pixels.

3.2.6 Picture Presentation


The selected pictures were presented on a 19” TFT widescreen monitor (Belinea 1945
S1W), which was placed about 100 cm away from the subject. The pictures were
presented in two blocks of 45 pictures (15 pictures for each emotion) in a randomized
order. Before a picture was presented to the subject, a black fixation cross was
shown for two seconds to help the subject to focus on the middle of the screen.
After that, the picture was shown for eight seconds. During this time subjects were
told to avoid eye blinks and movements to reduce artifacts.

Emotion      Valence mean (SD), men   Valence mean (SD), women   Arousal mean (SD), men   Arousal mean (SD), women
Pleasant     7.78 (1.43)              8.17 (1.31)                5.88 (2.17)              5.04 (2.58)
Neutral      4.91 (1.23)              4.98 (1.23)                2.63 (1.89)              2.80 (1.93)
Unpleasant   2.15 (1.46)              1.52 (1.03)                5.93 (2.30)              6.72 (2.16)

Table 3.3: Characteristics of IAPS pictures used for emotion induction

Only the data recorded during these eight seconds was used to analyze the emotional state. Before the next
presentation cycle started, a gray bar was shown for 15 seconds to allow the subjects
to normalize their emotion. The whole process of picture presentation is shown in
Figure 3.3. Between the two blocks there was an intermission of five minutes. Every
block took about 20 minutes.
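As a quick consistency check (this arithmetic is not spelled out in the text), the block duration follows from the 45 presentation cycles per block:

45 \cdot (2\,\mathrm{s} + 8\,\mathrm{s} + 15\,\mathrm{s}) = 45 \cdot 25\,\mathrm{s} = 1125\,\mathrm{s} \approx 19\,\mathrm{min},

which is consistent with the reported duration of about 20 minutes.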

Figure 3.3: Process of picture presentation (2 sec fixation cross, 8 sec picture, 15 sec gray bar)

3.2.7 Experimental Procedure


All experiments took place in the afternoon, all in the same room in the computer
science building of the University of Karlsruhe (TH). Before the experiment started,
subjects were asked for personal data like age, vision, handedness and state of health.
Afterwards, the electrodes were attached to the subject according to the description
in 3.2.3. Meanwhile, the subjects were asked to read the experimental instructions
(see Appendix C), which were handed out as a paper copy, so that all
subjects received the same instructions. Finally, the subjects had to sign a consent form
stating that the recorded data could be used for research purposes.
After everything was prepared and the subject had no further questions, a
small practice set consisting of five pictures was shown to the participants
to help them get familiar with the experimental procedure. Another intention
was to point out that some of the pictures might be disgusting or frightening. After
that, remaining questions were answered and subjects were reminded to avoid eye and
body movements during picture presentation. They were also told that they could
stop the experiment whenever they wanted to.
During the main experiment the supervisor was in another room to avoid distrac-
tion of the subjects. Subjects were told to notify the supervisor if there were any
problems. When the experiment was finished, electrodes were detached from the
subject and they were given a pile of hard copies with all the pictures they had seen
before, with the instruction to sort them into the three categories pleasant, neutral
and unpleasant.
After the rating procedure, the experiment was over. The whole experiment includ-
ing electrode placement and picture rating took about 90 minutes per subject.

3.3 Ethical Considerations


When developing a system for emotion recognition, very personal and sensitive data
is collected from the subjects. Therefore, the collected data has to be handled very
carefully in order not to violate the subjects' privacy. Reynolds and Picard (2004)
name three important questions that have an impact on the experiment design:

1. which emotions are analyzed,

2. who has access to the recognition results, and

3. how the information about the recognized emotions is used.

In this study, the three states pleasant, neutral and unpleasant were investigated.
As it is questionable whether it is ethical to induce negative emotions, participants
were informed in the experimental instructions that the presented material also includes
unpleasant pictures. Moreover, unpleasant pictures were included in the practice picture set.
In a study about ethical contracts, Reynolds and Picard (2004) showed that people
who had signed a contract about the use of their physiological data felt that their privacy
was respected more than people without an ethical contract. All subjects were informed
in the consent form, that their data was anonymized and used for research purposes
only.
There are many ways to misuse emotional data, such as lie detection or systematic
emotion manipulation. Nevertheless, the only purpose of this research is to develop
a system that is able to detect a person's emotions in order to improve interaction
with computers and to support users in their tasks, not to invade a person's
privacy against his or her will.

3.4 Problems
There were several problems regarding the emotion induction procedure that can be
ascribed to the International Affective Picture System.
According to Lang et al. (1997) all pictures in the IAPS were selected ”that are easy
to resolve, have clear figure - ground relationships, and communicate affective quality
relatively quickly”. Nevertheless, the point of action was not always in the center of
the picture. Therefore, for some pictures it was hard to grasp the content at first
glance. One subject even stated that he found eight seconds too short to perceive an
emotion.
One also has to pay attention to the fact that the same picture may have a different
influence on different people. For example a picture of a boy playing chess will have
a positive influence on a person who likes playing chess, whereas a person who is not
interested in chess might perceive this picture as neutral or even negative. Beyond
that, the more often we are in a similar situation, the less intense our reaction to a
stimulus is. As the unpleasant pictures included several pictures of mutilation, one
could argue that the first time a person sees one of these pictures he or she reacts
more intensely than later on when seeing a similar picture.
For the reasons listed above, all subjects were asked to rate the pictures subsequent
to the experiment.

3.5 Summary of Collected Data


All in all, data of 23 subjects was recorded. Out of these recordings, 17 were
conducted with the EEG cap and 6 with the headband. Due to technical errors,
the data of two subjects (3 and 41) had to be discarded. As the recording procedure for
subject 4 differed from that of the other subjects (45 pictures were presented blockwise, 45
pictures randomized), this data was also excluded from the analysis.
For each subject, a total amount of 720 seconds (that is, 90 pictures) of emotional
data was recorded, which is divided in equal shares among the three states pleasant, neutral
and unpleasant (240 seconds each). Summing up, 10800 seconds of data were in-
cluded in the analysis of the subjects with the EEG cap, whereas 3600 seconds of data were
used for the analysis of the headband subjects.
4. Methods

As mentioned in section 3.2.2, two different systems are used for the recognition
task, which will be described in the following. In section 4.1, we give an overview
of the processing steps performed in the system based on support vector machines.
Subsequently, we describe a system based on hidden Markov models (4.2).

4.1 Classification with Support Vector Machines


In the following section, we describe a system for emotion recognition based on
support vector machines (SVMs). Section 4.1.1 first describes the steps of the pre-
processing. Next, training and classification of the system are illustrated (4.1.2).

4.1.1 Data Preprocessing


This section gives an overview about the preprocessing steps performed on the
data, which include feature extraction (4.1.1.1), the organization of feature vectors
(4.1.1.2), and dimensionality reduction (4.1.1.3).

4.1.1.1 Feature Extraction


For training and classification, feature vectors have to be extracted from the raw
EEG signal. Feature vectors are extracted separately for each electrode by com-
puting a short time Fourier transform (STFT) for every picture. A Hann window
is applied to each segment with a window shift of half of the window length. The
short time Fourier transform for a signal x(n) (where n denotes the time and m the
window shift) with a frequency ω is defined as follows:

STFT{x[n]} = X(\omega, m) = \sum_{n=-\infty}^{\infty} x[n] \cdot w[n - m] \cdot e^{-i \omega n}   (4.1)

where w[n] is the window function which determines the window length. The result
of the STFT is complex valued, containing information about amplitude and phase
shift. As we are only interested in amplitude values, the phase is eliminated by
taking the absolute value of the STFT result.

After computing the STFT, a bandpass filter is applied to the transformed signal to
eliminate all frequencies that are not of particular interest. The number of frequency
components after applying a bandpass filter with an upper frequency of f_u and a
lower frequency of f_l to a window of size n_{ft} can be computed as follows:

\frac{f_u - f_l}{f_s / 2} \cdot \left( \frac{n_{ft}}{2} + 1 \right)   (4.2)

where fs is the sampling frequency.
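To make the feature extraction concrete, the following sketch (in Python with NumPy/SciPy; the original processing was done in MATLAB, so all function and variable names here are purely illustrative) computes the magnitude STFT with a Hann window and keeps only the frequency bins inside the bandpass range:

import numpy as np
from scipy.signal import get_window

def stft_band_features(x, fs=300.0, win_sec=2.0, f_lo=5.0, f_hi=45.0):
    # x: raw EEG of one electrode for one picture (e.g. 8 s at 300 Hz)
    n_win = int(win_sec * fs)                  # window length in samples, e.g. 600
    shift = n_win // 2                         # window shift of half the window length
    window = get_window("hann", n_win)
    freqs = np.fft.rfftfreq(n_win, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)   # bandpass: keep bins of interest only
    frames = []
    for start in range(0, len(x) - n_win + 1, shift):
        seg = x[start:start + n_win] * window
        frames.append(np.abs(np.fft.rfft(seg)))  # phase is discarded (Eq. 4.1)
    return np.array(frames)[:, band]           # shape: (windows, band components)

For a 2-second window at 300 Hz and a 5 - 45 Hz band this yields roughly 80 frequency components per electrode, in line with the worked example in section 4.1.1.2 (the exact count depends on whether the band edges are included).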


To minimize artifacts which result from fluctuations over time, we perform an aver-
aging over k previous feature vectors.
Next, we normalize the feature vectors. This allows a better comparability of dif-
ferent feature vectors as after the normalization all feature values are located in the
same interval and have similar ranges and offsets. Two different normalization tech-
niques are investigated. In earlier work on EEG signals (Honal, 2005), these methods
had already proven to have a positive influence on performance of the recognition
rate.

GlobalNorm: For each feature vector, the mean \bar{x}_i and standard deviation \sigma_i are cal-
culated. With these values, mean subtraction and variance normalization are
performed. The normalized feature value x_i^{GN} for a given feature value x_i
can be computed according to the following equation:

x_i^{GN} = \frac{x_i - \bar{x}_i}{\sigma_i}   (4.3)
After this normalization all feature vectors have an average value of zero and
a standard deviation of one which makes it easier to compare feature vectors.

RelPower: The value of each frequency component x_i is divided by the sum of
all frequency components for a given feature vector x with dimensionality
N. Thus, the normalized value x_i^{RP} for the i-th frequency component x_i can be
computed by

x_i^{RP} = \frac{x_i}{\sum_{j=1}^{N} x_j}   (4.4)

The advantage of this method is that the relations of frequency components
within a feature vector are preserved, though a drawback can be seen in the
fact that the relation of (total) power between adjacent feature vectors gets
lost. Both normalization modes are sketched after this list.
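The following NumPy sketch (illustrative only, not the original MATLAB code) applies both normalization modes to a matrix of feature vectors of shape (number of vectors, number of components):

import numpy as np

def global_norm(features):
    # mean subtraction and variance normalization per feature vector (Eq. 4.3)
    mean = features.mean(axis=1, keepdims=True)
    std = features.std(axis=1, keepdims=True)
    return (features - mean) / np.maximum(std, 1e-12)

def rel_power(features):
    # divide each component by the total power of its feature vector (Eq. 4.4)
    return features / features.sum(axis=1, keepdims=True)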

4.1.1.2 Obtaining Feature Vectors

For further processing of the data, feature vectors from the different electrode chan-
nels are concatenated to form one large feature vector for each time segment. The
dimensionality of the resulting feature vectors can be computed by multiplying the
number of electrodes with the result obtained from equation 4.2. For instance, if we
use a window size of two seconds (i.e. 600 frames) and a bandpass filter from 5 to 45
Hz, equation 4.2 yields about 80 components per electrode, so we obtain feature vectors with a length of 16 · 80 = 1280.

4.1.1.3 Dimensionality Reduction

As exemplified in section 4.1.1.2, the dimensionality of the feature vectors after


the STFT is very high compared to the number of feature vectors. Therefore, it
is reasonable to perform a feature reduction before training and classification is
done. Two different methods for dimensionality reduction are investigated. The
first method is to summarize frequency bands into bins by simply averaging over
adjacent frequency bands. The other method which is subject to investigation is a
feature reduction based on the correlation coefficient between each feature and the
independent variable. A linear discriminant analysis (see 4.2.1) - which is often used
for feature reduction of classification problems - is not applicable here as it might
become ill-conditioned due to sparse scatter matrices.

A very simple method for dimensionality reduction is an averaging over adjacent


frequency bands. Two different averaging alternatives are tested:

• The average is calculated for a fixed number of adjacent frequency bands, as sketched below.

• Frequency bands are put into physiologically relevant groups (e.g. the α-, β-, and
γ-bands or sub-bands of these bands) and an average is calculated for each of
these groups.
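A minimal sketch of the first, fixed-size alternative (NumPy, with hypothetical names; leftover bins that do not fill a complete group are discarded here, which is one possible convention):

import numpy as np

def bin_average(features, bin_size):
    # features: (n_vectors, n_freq) matrix for one electrode
    n_vec, n_freq = features.shape
    n_bins = n_freq // bin_size
    trimmed = features[:, :n_bins * bin_size]
    # average each group of bin_size adjacent frequency components
    return trimmed.reshape(n_vec, n_bins, bin_size).mean(axis=2)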

As a second approach, we apply a correlation-based feature reduction method. This


method selects those features, which correlate best with the reference variable (i.e.
the emotional state). In a study of Honal (2005) this method proved to provide good
results for feature reduction in a task demand experiment. The correlation-based
method also respects the continuous nature of the reference variable on the valence
scale. For feature reduction, features are ranked according to the absolute value of
their corresponding correlation coefficient. For a feature xi the correlation coefficient
rxi ,y is computed by
r_{x_i,y} = \frac{\sum_{j=1}^{R} (x_i^{(j)} - \bar{x}_i)(y^{(j)} - \bar{y})}{\sqrt{\sum_{j=1}^{R} (x_i^{(j)} - \bar{x}_i)^2 \cdot \sum_{j=1}^{R} (y^{(j)} - \bar{y})^2}}   (4.5)

where x_i^{(j)} is the i-th feature of the j-th feature vector and \bar{x}_i the mean of the i-th feature
value over all feature vectors. y^{(j)} is the reference value belonging to the j-th feature
vector and \bar{y} the mean over all reference values. R denotes the total number of feature
vectors. When the dimensionality is reduced to k, we keep only the k features with
the highest ranking.
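A minimal sketch of this ranking, assuming a numeric reference variable y (e.g. -1 / 0 / +1 for unpleasant / neutral / pleasant; the exact coding is an assumption for illustration):

import numpy as np

def select_by_correlation(X, y, k):
    # X: (n_samples, n_features), y: reference values; returns indices of the k
    # features with the largest absolute correlation coefficient (Eq. 4.5)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = num / np.maximum(den, 1e-12)   # guard against constant features
    return np.argsort(-np.abs(r))[:k]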

4.1.2 Training and Classification


Training and classification are done by a support vector machine (SVM) with 6-
fold cross-validation. The basic principle of a support vector machine is that given
an n-dimensional space with data from different classes a separating hyperplane
is constructed in such a way that the margin between the data of the different classes is
maximized.

For a given training set of instance-label pairs (xi , yi ), i = 1, ..., l with xi ∈ Rn and
y ∈ {1, −1}l the following optimization problem has to be solved:

\min_{w,b,\xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i
subject to \; y_i (w^T \phi(x_i) + b) \geq 1 - \xi_i ,
\xi_i \geq 0   (4.6)
The slack variable ξi measures the degree of misclassification of xi , C > 0 specifies
the penalty parameter of the error term. w is the normal vector of the separating
hyperplane. The function φ maps the training vectors xi to a higher dimensional
space. In this higher dimensional space the SVM finds a linear separating hyperplane
with the maximal margin.
There are different kernel functions K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j) which are used for
support vector classification. In this study, two different kernels are investigated:

• linear: K(x_i, x_j) = x_i^T x_j.

• radial basis function (RBF): K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2), \gamma > 0 (\gamma is
the kernel parameter which determines the RBF width).

For the linear kernel, feature space and input space are exactly the same. In con-
trast to the linear kernel, the RBF kernel nonlinearly maps samples into a higher
dimensional space. This means that it can handle situations with a nonlinear rela-
tion between class labels and attributes. The differences between the linear and the
RBF-kernel are illustrated in Figure 4.1.

Figure 4.1: Comparison between classification with linear and RBF-kernel. The left
figure shows the original classification problem, while the figure in the middle shows
classification results achieved with a linear kernel. Classification in the right figure
was done with an RBF kernel (parameter C was set to 1000 for both kernels).

Originally, SVMs were designed to solve binary classification problems. For multi-
class classification LIBSVM uses the one-against-one approach, in which k ·(k −1)/2
classifiers are constructed. Each classifier trains data from two different classes ac-
cording to the classification problem depicted in equation 4.6.
A more detailed introduction to classification with SVMs can be found in Burges
(1998).
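The thesis uses libSVM from MATLAB. Purely as an illustration, the same training and classification scheme can be sketched with scikit-learn (whose SVC classifier is also built on libSVM); data shapes and values below are placeholders, except for C = 1 and γ = 1/k, which match the baseline configuration later described in section 5.2.1:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# X: normalized, dimensionality-reduced feature vectors; y: emotion labels
# (0 = pleasant, 1 = neutral, 2 = unpleasant); random data as a placeholder
rng = np.random.default_rng(0)
X = rng.standard_normal((90, 32))
y = rng.integers(0, 3, size=90)

clf = SVC(kernel="rbf", C=1.0, gamma="auto")   # gamma="auto" corresponds to 1/k
scores = cross_val_score(clf, X, y, cv=6)      # 6-fold cross-validation
print("mean accuracy:", scores.mean())

SVC handles the multi-class case internally with the same one-against-one scheme described above.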

4.2 Sequential Recognition System


In this section we describe the sequential recognition system based on hidden Markov
models (HMMs). Section 4.2.1 first describes the preprocessing steps that are per-
formed. Next, section 4.2.2 gives an overview about training and classification.

4.2.1 Data Preprocessing


For preprocessing of the EEG data first a mean subtraction is performed for each
picture and each electrode. Afterwards, a bandpass filter is applied to the data.
Next, a Dual-Tree Complex Wavelet Transform (DTCWT) is performed as this
method had been proved to perform best on word recognition tasks from EEG signals
(Wand, 2007). Therefore, we assume that it will also perform good on emotion
recognition tasks. The DTCWT - which is an extension of the discrete wavelet
transform - calculates the complex transform of a signal using two separate DWT
decompositions (see Selesnick et al. (2005) for more details). We use a decomposition
level of four for all experiments. After the DTCWT, feature vectors are obtained by
concatenating features from all electrode channels. When all 16 electrodes are used,
the dimensionality of each feature vector is 128 (2 · 4 · 16).
Finally, we reduce the dimensionality of the feature vectors. As the dimensional-
ity of the feature vectors is much smaller than in the recognition system described
in section 4.1, linear discriminant analysis (LDA) - which is a powerful tool for di-
mensionality reduction in classification problems - is applicable here. The basic
idea of the LDA is that the dimensionality of a feature vector is reduced while dis-
criminative information is preserved as well as possible, i.e. that data of different
classes remains separable. Suppose there are C classes which are described by their
mean vectors \mu_i (i = 1, 2, \ldots, C). Let N_i be the number of samples within class
i and N = \sum_{i=1}^{C} N_i the total number of samples. Based on this information, the
between-class scatter matrix SB describing the separation of the class means, and
the within-class scatter matrix SW which measures the spread of the clusters are
computed by

S_B = \sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^T

S_W = \sum_{i=1}^{C} \sum_{j=1}^{N_i} (x_{j,i} - \mu_i)(x_{j,i} - \mu_i)^T   (4.7)

where \mu = \frac{1}{C} \sum_{i=1}^{C} \mu_i

and x_{j,i} is the j-th sample belonging to class i (j = 1, 2, \ldots, N_i).


A transformation matrix T which tries to preserve the differences between the classes
is computed by maximizing the scatter between classes and minimizing the scatter
within classes:

T_{LDA} = \arg\max_{T} \frac{T^T S_B T}{T^T S_W T}   (4.8)

The columns of the transformation matrix are the eigenvectors of S_W^{-1} S_B which are
solutions of the general eigenvalue problem:

S_W^{-1} S_B \, T = \lambda T   (4.9)
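The scatter matrices and the transformation matrix of equations 4.7 to 4.9 translate directly into code. The following NumPy/SciPy sketch (illustrative, with hypothetical names) uses the unweighted mean of the class means, as in equation 4.7:

import numpy as np
from scipy.linalg import eig

def lda_transform(X, y, n_dims):
    # X: (n_samples, n_features), y: class labels; returns (n_features, n_dims)
    classes = np.unique(y)
    d = X.shape[1]
    means = {c: X[y == c].mean(axis=0) for c in classes}
    mu = np.mean(list(means.values()), axis=0)   # unweighted mean of class means
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for c in classes:
        diff = (means[c] - mu)[:, None]
        S_B += diff @ diff.T                     # between-class scatter (Eq. 4.7)
        Xc = X[y == c] - means[c]
        S_W += Xc.T @ Xc                         # within-class scatter (Eq. 4.7)
    # columns of T are eigenvectors of S_W^{-1} S_B (Eq. 4.9)
    vals, vecs = eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(-vals.real)
    return vecs[:, order[:n_dims]].real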

4.2.2 Training and Classification


Training and classification are performed with a left-to-right hidden Markov model
(HMM) using a leave-one-out setting, i.e. in each round one sample of each emotion
category was left out of the training procedure and used for testing.
HMMs can be used to characterize the observed data samples of a discrete-time
series. An HMM, compactly written as λ = (A, B, π), is characterized by the following elements:

• a set of states S = {s1 , s2 , . . . , sn } where n is the number of states,

• the initial probability distribution π where π(si ) = P (q1 = si ) is the probability


of si to be the first state of a sequence,

• a matrix of state transition probabilities A = aij with aij = P (qt+1 = sj |qt =


si ) representing the probability of state sj following state si ,

• a set of emission probability densities B = {b1 , b2 , . . . , bn } where bi (x) =


P (ot = x|qt = si ) specifies the probability to observe a feature vector x in
state si , and

• a set of symbols V which can be emitted where v is the number of distinct


symbols.

The emission probabilities are modeled by Gaussian Mixture Models (GMMs). For
more information about HMMs please refer to Rabiner (1989).
During the training, the individual states are associated with the input feature
vectors. By four iterations of the Expectation Maximization algorithm - which is
an iterative optimization method to estimate unknown (or hidden) parameters from
given measurement data (see Dellaert (2002) for a more detailed description) - the
HMM parameters are updated such that the likelihood of the new HMM model λ′ is
larger for the training data than the likelihood of the old λ, i.e. P(x|λ′) > P(x|λ). To
classify a given sample x, a Viterbi path - which finds the most likely sequence of
hidden states within the HMM - is computed for each trained HMM corresponding to
one of the three emotional states, i.e. we calculate P (x|λi ), i ∈ {pleasant, neutral,
unpleasant}. The λi whose likelihood yields the highest result is reported as the
classification result.
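The thesis itself uses the Janus Recognition Toolkit. Purely as an illustrative stand-in (not the actual implementation), the scheme of one model per emotion and a maximum-likelihood decision can be sketched with the Python package hmmlearn; note that this sketch uses single Gaussian emissions and hmmlearn's forward-algorithm scoring instead of GMM emissions and an explicit Viterbi path:

import numpy as np
from hmmlearn import hmm

def train_models(sequences_by_emotion, n_states=3):
    # sequences_by_emotion: {"pleasant": [seq, ...], ...}, each seq of shape (T, dim)
    models = {}
    for emotion, seqs in sequences_by_emotion.items():
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=4)
        models[emotion] = m.fit(X, lengths)
    return models

def classify(models, sequence):
    # report the emotion whose model yields the highest log-likelihood
    return max(models, key=lambda e: models[e].score(sequence))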
5. Experiments and Results

This chapter presents the analysis of the emotional data which was collected ac-
cording to the description in chapter 3. First, a short comparison of subjective user
ratings with the ratings according to the international affective picture system is
conducted (5.1). Afterwards, we describe an emotion recognition system based on
support vector machines and the optimization of this system (5.2). Next, we com-
pare the results from section 5.2 to those of a sequence modeling system based on
hidden Markov models (5.3). Beyond that, in section 5.4 we investigate the influence
on accuracy if only a subset of electrodes is used. Section 5.5 finally investigates
the temporal structure of the signal. Throughout all experiments, optimization of
the preprocessing is performed subject-independently, whereas the optimization of training
and classification is done separately for each subject.
Data and tables corresponding to the figures in this chapter can be found in Ap-
pendix D.

5.1 IAPS Ratings versus Subject Ratings


Subsequent to the experiment, subjects were asked to rate the pictures presented
during the recording procedure. As one can see in Table 5.1, subjects tend to rate
most pictures as neutral (37.15 percent) whereas only 29.95 percent were rated as
pleasant and 32.90 percent as unpleasant. Please note that the table contains rating
data of all 23 subjects. The high number of neutral ratings can be explained by the
effect of central tendency, which states that people tend to avoid extreme ratings.
The most suitable method to avoid this effect would have been to use an even-
numbered scale which is not applicable in our situation. A more detailed analysis
of these ratings can be found in Schaaff (2008).
As subject ratings differ from IAPS ratings, two different data corpora are used for
further analysis:

IR corpus: Using the IR corpus, original ratings of the international affective pic-
ture system are used to analyze the data. This has the distinct advantage
that all classes include the same number of samples.

PS NS US Σ
PI 574 109 7 690
NI 43 614 33 690
UI 3 46 641 690
Σ 620 769 681 2070
[%] (29.95) (37.15) (32.90)

Table 5.1: Cumulated subject ratings of IAPS pictures. Columns show subject
ratings (S), rows contain ratings according to the IAPS (I).

SR corpus: Within the SR corpus, data is labeled according to the subjective


ratings of each subject. The major advantage is that these labels better reflect
the emotional state of a subject than the IAPS ratings. Nevertheless, classes
may include different numbers of samples.

5.2 SVM-based Emotion Recognition


In this section we describe the recognition results based on the system described in
section 4.1. Starting from a baseline configuration (5.2.1), a parameter optimization
is performed. Optimization of the data preprocessing (5.2.3) is done with the same
parameters for all subjects to achieve good generalization properties. By contrast,
optimization of training and classification (5.2.4) is done separately for every subject.
The results of the SVM-based recognizer are discussed in section 5.2.5.

5.2.1 Description of the Baseline System


In the following, we try to optimize the performance of our recognition system. For
this purpose, the baseline configuration of the system was set to the following values:

• Window size for STFT: 8 seconds

• No averaging over adjacent features

• Frequency band: 5 - 45 Hz

• Normalization with method GlobalNorm

• No feature reduction

• SVM with RBF kernel is used with the default values for C and γ (C = 1,
γ = 1/k, k := number of attributes in the input data)

• All 16 electrodes are used.

The mean recognition rate over all 15 subjects for the above setting is 36.60 percent
with a standard deviation of 6.54 percent.

      P     N     U     Σ
P   158   150   142    450
N   159   158   133    450
U   146   130   174    450
Σ   463   438   449   1350
[%] (34.30) (32.44) (33.26)

Table 5.2: Confusion matrix of IR corpus (absolute values)

      P     N     U     Σ
P    44   307    33    384
N    38   464    18    520
U    40   356    50    446
Σ   122  1127   101   1350
[%] (9.04) (83.48) (7.48)

Table 5.3: Confusion matrix of SR corpus (absolute values)

5.2.2 Data Corpus Selection

For a comparison of SR and IR corpus, confusion matrices were computed for both
corpora and summed up over all subjects. The results are displayed in Table 5.2
and 5.3.

Although using the SR corpus produces a much better recognition rate (41.33 per-
cent) compared to the IR corpus (36.60 percent), the classification is biased towards
the class neutral in the way that more than half of the pictures are assigned to this
class. This can be ascribed to the fact that the data is unbalanced such that more
samples (i.e. more training data) are available for neutral pictures than for pleasant
or unpleasant pictures.

To allow a better comparability of the results of both corpora, we normalize the


confusion matrix of SR corpus. For this purpose we divide each row by the total
number of samples per class according to the subjects' ratings (i.e. 384 for pleasant,
520 for neutral, and 446 for unpleasant pictures) and multiply by the total number
of samples per class according to the IAPS ratings (which is always 450). Table
5.4 shows the relative values of the normalized confusion matrix for the SR corpus.
Nonetheless, the recognition results for the different classes remain quite unbalanced.

P [%] N [%] U [%] Σ [%]


P [%] 3.82 26.65 2.86 33.33
N [%] 2.44 29.74 1.15 33.33
U [%] 2.99 26.61 3.74 33.33
Σ [%] 9.24 83.00 7.76 100.00

Table 5.4: Normalized confusion matrix for SR corpus (relative values)

Additionally, we obtain a decreased accuracy of 37.30 percent which is quite close


to the one of the IR corpus (36.60 percent).

As the classification results of the SR corpus are apparently biased towards the class
neutral, the IR corpus is used for further analysis.
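The row normalization described above can be reproduced directly from Table 5.3. The small NumPy sketch below (illustrative only) rescales every row to 450 samples and prints the relative values, which match Table 5.4:

import numpy as np

# confusion matrix of the SR corpus (absolute values from Table 5.3)
conf_sr = np.array([[44, 307, 33],
                    [38, 464, 18],
                    [40, 356, 50]], dtype=float)

row_totals = conf_sr.sum(axis=1, keepdims=True)        # 384, 520, 446
normalized = conf_sr / row_totals * 450                # rescale each row to 450 samples
relative = normalized / normalized.sum() * 100
print(np.round(relative, 2))                           # rows of Table 5.4
print("accuracy:", round(np.trace(relative), 2), "%")  # 37.30 percent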

5.2.3 Optimization of Data Preprocessing


5.2.3.1 Window Size and Average over Feature Vectors

When the STFT is performed, pictures are split into overlapping windows. To
determine the influence of the window size on the recognition rate, different window
sizes are applied to the raw data. A window shift of half of the window size is used.
Given the presentation time for each picture of t_{pic}, the number of feature vectors n_f
per picture after the STFT with a window size of t_{win} seconds can be computed by

n_f = 2 \cdot \frac{t_{pic}}{t_{win}} - 1   (5.1)
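For the presentation time of eight seconds per picture and, for example, a window size of two seconds, equation 5.1 gives

n_f = 2 \cdot \frac{8\,\mathrm{s}}{2\,\mathrm{s}} - 1 = 7

overlapping windows, and thus feature vectors, per picture.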

As the window size has a direct influence on the number of feature vectors and
therefore on the number of feature vectors that can be used for averaging, n_avg, both
were optimized in the same step. The results are shown in Figure 5.1.

Figure 5.1: Mean recognition rates subject to window size and average size

The computation of the ratio on the x-axis is done according to equation 5.2 which
returns values in the range of [0,1].

Ratio = \frac{1}{2} \cdot \frac{n_{avg}}{t_{pic} / t_{win}}   (5.2)

Figure 5.1 shows that the combination of a window size of four seconds and an averaging over two adjacent feature vectors performs slightly better than the other combinations. In most cases the recognition rate deteriorates if no averaging is performed (i.e. averaging over one feature vector) or if averaging is performed over all feature vectors. According to these results, a window size of four seconds combined with an averaging over two adjacent feature vectors is used for further optimization.
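A minimal sketch of this preprocessing step is given below. It assumes the raw data of one picture is available as a channels x samples array and uses SciPy's STFT; the function and variable names, as well as the sliding interpretation of the averaging step, are illustrative and not the actual implementation used in this thesis.

    import numpy as np
    from scipy.signal import stft

    def stft_features(eeg, fs=300.0, t_win=4.0):
        """Split one trial (channels x samples) into overlapping windows with a
        shift of half the window size; return one spectral power feature vector
        per window position."""
        nper = int(t_win * fs)
        # boundary=None / padded=False keep only complete windows
        _, _, spec = stft(eeg, fs=fs, nperseg=nper, noverlap=nper // 2,
                          boundary=None, padded=False)
        power = np.abs(spec) ** 2                    # channels x freq bins x windows
        n_windows = power.shape[-1]
        return power.reshape(-1, n_windows).T        # windows x (channels * freq bins)

    def average_adjacent(features, n_avg=2):
        """Sliding mean over n_avg adjacent feature vectors (one possible
        reading of the averaging step)."""
        if n_avg <= 1:
            return features
        return np.stack([features[i:i + n_avg].mean(axis=0)
                         for i in range(len(features) - n_avg + 1)])

For an eight-second picture, a four-second window and a two-second shift yield three windows, in agreement with equation 5.1.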

5.2.3.2 Filter Properties


The recorded EEG data consists of different frequency bands (see 2.4.1) which can contain more or less information. Therefore, different bandpass filters were applied to the data to investigate which combination of frequency bands contains the most information. Figure 5.2 shows the mean recognition rates depending on the frequency band. Bars are ordered according to the recognition rate: the frequency band with the highest accuracy is on the left, the one with the lowest accuracy on the right.
As the frequency bands have different widths, we obtain a different number of frequency components (i.e. feature vectors of different lengths) after performing the STFT. For this reason, for all settings we reduce the dimensionality after the STFT to 32 dimensions with the correlation-based approach described in section 4.1.1.3. This allows a better comparison of settings with different frequency band widths.

Figure 5.2: Mean recognition rate subject to frequency band (whiskers indicate
standard deviation)

A bandpass filter of 8 - 45 Hz - which covers the α-, β-, and γ-band - performs slightly better than the other filters and is therefore selected for further optimization steps. Surprisingly, using no filter at all (which includes frequencies from 0 to 150 Hz due to the sampling frequency of 300 Hz) provides quite reasonable results. This is likely due to the method we chose for feature reduction, as the no-filter condition is a superset of all other conditions.
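As an illustration, a bandpass of this kind could be applied as sketched below. The SciPy-based design (Butterworth filter, zero-phase filtering with filtfilt, fourth order) is an assumption, since the thesis does not specify the filter design.

    import numpy as np
    from scipy.signal import butter, filtfilt

    def bandpass(eeg, fs=300.0, low=8.0, high=45.0, order=4):
        """Apply a zero-phase Butterworth bandpass to each EEG channel.
        eeg: array of shape (channels, samples)."""
        nyq = 0.5 * fs
        b, a = butter(order, [low / nyq, high / nyq], btype='band')
        return filtfilt(b, a, eeg, axis=-1)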

5.2.3.3 Normalization
In the next optimization step the influence of the normalization method is investigated. Two different normalization methods (see 4.1.1.1) are used for data preprocessing and compared with the recognition accuracy obtained when no normalization is performed. The results are depicted in Table 5.5.

Normalization Mode     Accuracy [%]    Standard deviation [%]
GlobalNorm             38.85           5.33
RelPower               39.19           5.71
no Normalization       33.33           0.00

Table 5.5: Influence of normalization mode on recognition accuracy

As one can see, normalization mode RelPower performs better than GlobalNorm. This can be explained by the fact that within a feature vector, the relations between frequency bands are preserved if we use RelPower. Table 5.5 also shows that normalization is an essential step of data preprocessing: when no normalization is performed, the recognition rate is 33.33 percent (i.e. chance level) with a standard deviation of zero. The reason for this result is that the recognition process does not work at all, as all data is assigned to the same class.
According to the results explained above, normalization mode RelPower is used for further processing.
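The two normalization modes could look roughly as follows. Since their exact definitions are given in section 4.1.1.1, the formulas here are plausible sketches (RelPower as the relative spectral power within a feature vector, GlobalNorm as a zero-mean/unit-variance scaling per feature dimension), not the definitive implementation.

    import numpy as np

    def rel_power(features):
        """Divide each spectral feature vector by its total power, so that the
        relations between the frequency components are preserved (assumed
        reading of RelPower)."""
        totals = features.sum(axis=1, keepdims=True)
        return features / np.maximum(totals, 1e-12)

    def global_norm(features):
        """Scale every feature dimension to zero mean and unit variance over
        all feature vectors of a recording (assumed reading of GlobalNorm)."""
        mean = features.mean(axis=0, keepdims=True)
        std = features.std(axis=0, keepdims=True)
        return (features - mean) / np.maximum(std, 1e-12)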

5.2.3.4 Dimensionality Reduction

Figure 5.3: Mean recognition rate subject to average size (whiskers indicate standard deviation)

First, we study the influence of averaging over a fixed number of adjacent frequency bands. The averaging is performed separately for each electrode. The reduced number of features is obtained by dividing the original number of features by the size of the average (reduction factor). Figure 5.3 shows that this kind of feature reduction actually deteriorates the results. The only exception is averaging over four adjacent frequency bands, which causes a small increase from 39.19 percent to 39.41 percent. The decrease of the recognition rate is most likely due to the loss of information caused by the averaging. The only advantage of this method is a decrease in computation time due to the reduced dimensionality.
The second averaging method does - in contrast to the first method - respect physiologically meaningful groups of frequency bands. Table 5.6 shows a comparison of the recognition rates when (a) no averaging (i.e. no feature reduction) is performed, (b) an average is calculated for the α-, β-, γ-, and θ-band, and (c) the relative portion of each frequency band is calculated and used for the recognition task.

Condition Accuracy [%] Standard Deviation [%]


(a) 39.19 5.71
(b) 36.41 5.23
(c) 35.56 5.56

Table 5.6: Mean recognition rate with different averaging conditions

Methods (b) and (c) reduce the number of features per electrode to three, which corresponds to a reduction factor of 50. Again, the reduction of features decreases the accuracy. However, if feature reduction is performed by simply averaging over adjacent frequency bands, accuracy is higher (36.41 percent) than if we use the relative portions of each frequency band (35.56 percent).
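Conditions (b) and (c) could be sketched as follows for a single electrode; the band limits correspond to the bands listed in Table D.2, and the mapping of the two conditions to a plain mean and a relative portion is an assumption for illustration.

    import numpy as np

    # Frequency band boundaries in Hz (α, β, γ as in Table D.2)
    BANDS = {'alpha': (8, 13), 'beta': (14, 30), 'gamma': (30, 45)}

    def band_features(power, freqs, relative=False):
        """Average the spectral power of one electrode into EEG bands.
        power: (n_bins,) spectrum, freqs: bin centres in Hz.
        relative=False corresponds to condition (b), relative=True to (c)."""
        means = np.array([power[(freqs >= lo) & (freqs <= hi)].mean()
                          for lo, hi in BANDS.values()])
        if relative:
            return means / means.sum()
        return means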

Figure 5.4: Mean recognition rate subject to dimensionality (whiskers indicate standard deviation)

Finally, we investigate a method for feature reduction which tries to find the features that discriminate best by performing a correlation analysis. The feature reduction is calculated for the whole feature vector, i.e. over all electrodes. To find out which number of dimensions performs best, we compare the accuracy for reductions to 2^0, 2^1, ..., 2^10 dimensions. Figure 5.4 shows the mean recognition rate subject to the number of features that is kept after the correlation analysis.
It is obvious that the correlation-based method outperforms the other methods for dimensionality reduction. The shape of the curve shows that it is useful to keep only those features that contain much discriminative information. Due to the reduced dimensionality of the feature vectors, a better training of the classification system is possible, which explains why the recognition rate increases when the dimensionality decreases. The curve also illustrates that when too many features are discarded, too little information is left for classification. The best recognition rate (52.74 percent) is achieved with a reduction to 128 features, which is therefore selected for further processing.
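A plausible sketch of such a correlation-based selection is shown below; it ranks features by the absolute Pearson correlation with a numerically encoded class label and keeps the top k. The exact criterion used in this thesis is the one described in section 4.1.1.3 and may differ from this simplification.

    import numpy as np

    def select_by_correlation(features, labels, k=128):
        """Keep the k feature dimensions whose absolute correlation with the
        class label is highest. features: (n_samples, n_dims),
        labels: (n_samples,) with the classes encoded as numbers."""
        x = features - features.mean(axis=0)
        y = labels - labels.mean()
        denom = np.linalg.norm(x, axis=0) * np.linalg.norm(y)
        corr = np.abs(x.T @ y) / np.maximum(denom, 1e-12)
        keep = np.argsort(corr)[::-1][:k]
        return np.sort(keep)    # indices to apply to training and test data alike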

5.2.4 Optimization of Training and Classification


After the preprocessing parameters are optimized over all subjects, we start with a subject-dependent optimization of the SVM parameters. According to the optimization performed in section 5.2.3, the following configuration of preprocessing parameters is selected for the subsequent experiments:

• Window size: 4 seconds

• Window shift: 2 seconds

• Averaging over two adjacent features

• Frequency band: 8 - 45 Hz

• Normalization with method RelPower

• Features are reduced to 128 with a correlation-based approach.

For optimization, the results of an SVM with a linear kernel are compared to the results of an SVM with an RBF kernel. As the choice of the kernel parameters (i.e. the penalty parameter C and the kernel parameter γ for the RBF kernel, and C for the linear kernel) has an important influence on the accuracy of the SVM, a parameter optimization is performed separately for both kernels using grid search (a minimal sketch of such a search is given after the list below). Following the suggestions by Hsu et al. (2003), the parameters C and γ are varied as follows:

• C = 2^-5, 2^-3, ..., 2^25

• γ = 2^-15, 2^-13, ..., 2^5
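Such a grid search could be carried out with scikit-learn as sketched below; the cross-validation scheme and the scikit-learn API are assumptions for illustration (the actual experiments use LIBSVM, cf. Chang and Lin, 2001), and the parameter grids mirror the lists above.

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    def grid_search_svm(features, labels, kernel='rbf'):
        """Exhaustive search over C (and gamma for the RBF kernel), selecting
        the combination with the best cross-validated accuracy."""
        if kernel == 'rbf':
            grid = {'C': 2.0 ** np.arange(-5, 26, 2),
                    'gamma': 2.0 ** np.arange(-15, 6, 2)}
        else:
            grid = {'C': 2.0 ** np.arange(-5, 26, 2)}
        search = GridSearchCV(SVC(kernel=kernel), grid, cv=5, scoring='accuracy')
        search.fit(features, labels)
        return search.best_params_, search.best_score_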

Table 5.7 shows the optimal values of C and γ for each subject. For the linear kernel we achieve a mean recognition rate over all subjects of 60.81 percent with a standard deviation of 6.13 percent. By optimizing C and γ for the RBF kernel we achieve a mean recognition rate of 62.07 percent with a standard deviation of 5.88 percent.

RBF kernel linear kernel


Subject ID log2 (C) log2 (γ) Accuracy [%] log2 (C) Accuracy [%]
1 21 -11 62.78 9 60.56
2 19 -9 73.33 9 72.78
5 21 -13 60.00 9 58.89
6 19 -9 61.11 11 60.00
7 25 -13 62.78 11 61.11
8 3 5 67.78 11 64.44
9 23 -9 61.11 13 60.00
10 7 3 60.56 11 60.56
11 19 -9 54.44 11 53.89
12 9 5 61.67 15 57.78
13 3 5 65.00 9 65.00
14 11 1 55.56 9 53.33
15 5 1 68.33 7 68.33
16 19 -9 50.00 7 48.89
17 5 3 66.67 9 66.67
Average 62.07 60.81

Table 5.7: Optimal values of C and γ for RBF and linear kernel

As the recognition rates with the two different kernels show, the RBF kernel seems to be better suited for our recognition task. However, the high accuracy of the linear kernel shows that it is also possible to separate the data linearly.
An inspection of the optimal values of C and γ for the RBF kernel shows that these values are highly correlated (correlation coefficient r = −0.9601), i.e. when C increases, the value of γ decreases. A high value of the penalty parameter C in combination with a low γ, which determines the RBF width, can be seen as an indicator that the model is overtrained and therefore does not generalize enough. Thus, for later studies it should be considered to limit the values of C and γ to avoid overtraining, even though this means a decrease in recognition rate.

5.2.5 Analysis of Results


Due to the different optimization steps, the recognition rate could be improved considerably. Optimization of the preprocessing - which was performed subject-independently - increased the recognition rate from 36.60 percent to 52.74 percent. Subject-dependent optimization of the training and classification parameters allowed a further improvement to 62.07 percent. Table 5.8 shows which parameter variation caused which change in the results. Parameters are ordered according to their influence on the recognition rate.
The largest improvement can be observed when feature reduction is performed, which in our study is done by a correlation-based approach. This can be explained by the dimensionality of the feature vectors before feature reduction (2368), which is quite high compared to the small number of samples (60 per class at a window size of four seconds and an averaging over two adjacent feature vectors).

parameter              baseline value    optimized value        absolute          relative
                                                                improvement [%]   improvement [%]
feature reduction      none              correlation-based,     18.55             34.58
                                         128 features
SVM parameters         C: 1, γ: 1        user dependent          9.33             17.69
averaging              1                 2                       1.75              4.71
window size            8 sec             4 sec                   0.54              1.48
normalization method   GlobalNorm        RelPower                0.34              0.88
SVM kernel             RBF               RBF                     0.00              0.00
bandpass filter        5 - 45 Hz         8 - 45 Hz              -0.05 (1)         -1.02 (1)

(1) For optimization we used only a reduced number of features. Using all features causes a small decrease of the recognition rate, which can be explained by the different number of features of both frequency bands.

Table 5.8: Summary of consecutive improvements due to parameter optimization

5.3 Comparison to Sequence Modeling with Hidden Markov Models

Until now, our recognition system was based on support vector machines, which - in contrast to HMMs - do not model sequential patterns. Therefore, in this section we make a comparison with an HMM-based recognizer which analyzes the emotional data as a temporal sequence of several states. A short description of this system can be found in section 4.2.
In sections 5.3.1 to 5.3.3 we describe the baseline system and its optimization. Afterwards, the recognition results of this system are discussed (5.3.4).

5.3.1 Description of the Baseline System


In the following sections we try to improve the recognition rate of the baseline system by varying parameters that might have an influence on it. The baseline system has the following configuration:

• No filter is applied to the data


• Number of dimensions after LDA is 35
• A 5-state left-to-right HMM with one Gaussian mixture model is used for the classification task

The mean recognition rate for all subjects in the baseline setting is 33.85 percent
(standard deviation: 4.24 percent).

5.3.2 Optimization of Data Preprocessing


During the optimization of preprocessing of sequential data, we investigate which
bandpass filter is most appropriate for our signals (5.3.2.1). Moreover, we analyze
which dimensionality after feature reduction provides the best results (5.3.2.2).

5.3.2.1 Bandpass Filter Properties

Figure 5.5: Mean recognition rates for different frequency bands (whiskers indicate
standard deviation)

To improve the recognition rate, different bandpass filters were applied to the raw
EEG signal. The results are displayed in Figure 5.5.
The figure shows that applying a bandpass filter to the data can help to improve the recognition rate. A bandpass filter of 5 - 45 Hz - which includes the α-, β-, γ-, and θ-band - performs best on the given data. Therefore, this filter is used for further optimization.

5.3.2.2 Dimensionality Reduction


As the dimensionality of the EEG data is very high, we investigate whether the recognition rate improves if we use only the most important dimensions. For this reason the number of dimensions after the dimensionality reduction was varied. The best results could be obtained with a dimensionality of 35 (see Figure 5.6). This can be explained by the fact that we have only a limited amount of training data, which is too small for training with a high dimensionality (e.g. 128 dimensions), so that the HMM is undertrained.

5.3.3 Optimization of Training and Classification


As introduced in section 4.2.2, an HMM classifier is used to test whether sequence modeling is an appropriate method for emotion recognition. Previous studies with a similar setup (Wester, 2006; Porbadnigk, 2008) have shown that variation of the HMM topology can have an influence on recognition accuracy. Thus, the number of HMM states and the number of Gaussian mixture models per state were varied. The optimization was done separately for each subject.

Figure 5.6: Recognition rate depending on number of dimensions after LDA

Figure 5.7: Recognition rate subject to number of HMM states (whiskers indicate standard deviation). Bars show the relative portion of subjects whose recognition rate was best at this number of HMM states.

First, the number of HMM states was varied; values between one and 20 were investigated. Figure 5.7 shows the influence of the number of HMM states on the mean recognition accuracy.

The unsteady shape of the curve can be explained by the fact that it is computed as a mean over the recognition rates of 15 subjects. Although the same preprocessing steps were performed for all subjects, there can still be large differences in the data, e.g. caused by a different physical and mental state on the day of the experiment. Therefore, bars were added to Figure 5.7 indicating the number of subjects who achieved their maximum recognition rate at a certain number of HMM states. As this number is correlated with the peaks in the curve, it can be seen as an explanation for the shape of the curve.

After identifying the optimal number of HMM states for each subject, the number of GMMs which yields the best recognition results is investigated. For this reason, we varied the number of GMMs over the values 2^0, 2^1, 2^2, 2^3, 2^4, 2^5, and 2^6. The best combinations of the number of HMM states (n_HMM) and the number of GMMs (n_GMM) are shown in Table 5.9.

Subject ID   opt. n_HMM   opt. n_GMM   Accuracy [%]


1 5 1 55.56
2 20 16 44.44
5 1 1 44.44
6 8 1 41.11
7 5 1 51.11
8 16 1 52.22
9 16 1 43.33
10 16 1 47.78
11 8 2 45.56
12 2 8 37.78
13 8 2 47.78
14 5 1 43.33
15 5 32 42.22
16 2 32 51.11
17 12 1 44.44

Table 5.9: Best combination of number of HMM-states and number of GMMs for
each subject

The table shows that for most subjects the recognition rate is best with one GMM, and in most cases the optimal number of HMM states is quite low. This can be explained by the small number of samples per emotion and subject that is available for training and classification.
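For reference, a per-class left-to-right GMM-HMM classifier of the kind described above could be sketched with the hmmlearn library as follows. The library choice, the initialization, and the training schedule are assumptions for illustration; the actual system of this thesis is described in section 4.2.

    import numpy as np
    from hmmlearn.hmm import GMMHMM

    def make_left_to_right(n_states, n_mix):
        """Left-to-right GMM-HMM; the start/transition structure is fixed by
        hand, the emission parameters are learned from the data."""
        model = GMMHMM(n_components=n_states, n_mix=n_mix,
                       covariance_type='diag', n_iter=20, init_params='mcw')
        model.startprob_ = np.zeros(n_states)
        model.startprob_[0] = 1.0
        trans = np.zeros((n_states, n_states))
        for i in range(n_states):
            trans[i, i] = 0.5
            trans[i, min(i + 1, n_states - 1)] += 0.5
        model.transmat_ = trans
        return model

    def train_and_classify(train_seqs, train_labels, test_seq, n_states=5, n_mix=1):
        """Train one HMM per emotion class and label the test sequence with
        the class whose model assigns the highest log likelihood."""
        models = {}
        for label in set(train_labels):
            seqs = [s for s, l in zip(train_seqs, train_labels) if l == label]
            X = np.vstack(seqs)
            lengths = [len(s) for s in seqs]
            models[label] = make_left_to_right(n_states, n_mix).fit(X, lengths)
        return max(models, key=lambda lab: models[lab].score(test_seq))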

5.3.4 Analysis of Results


After the optimization steps are performed, a recognition rate of at least 37.78 percent is achieved for every subject. The mean recognition rate over all subjects is 46.15 percent with a standard deviation of 4.75 percent, which is clearly above chance level (33.33 percent). The results indicate that it is possible to use a sequential recognition system for emotion recognition. Nevertheless, the SVM-based recognizer performs much better on the recognition task. A direct comparison of both systems is difficult, however, as the data processing differs considerably between them.

5.4 Electrode Positioning


Although a standardized electrode cap like the one used in our experiments is common in medical research, it is quite impractical to use such a cap in everyday life for improving human-computer interaction. Therefore, we investigate to what extent a subset of electrodes is sufficient for the recognition task. For this analysis, we use the optimized parameters obtained in section 5.2.3. Only the number of features after feature reduction is adjusted, as the number of features correlates with the number of electrodes. The SVM parameters are optimized in the same way as in section 5.2.4.
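Restricting the analysis to an electrode subset simply means slicing the corresponding channels out of the recorded data before feature extraction; a minimal sketch is given below. The channel list and its order are illustrative, only the frontal and midline subsets are taken from the text.

    import numpy as np

    # Illustrative channel order of a 16-electrode cap recording
    CHANNELS = ['fp1', 'fp2', 'f3', 'f4', 'f7', 'f8', 'fz', 'c3', 'c4',
                'cz', 'p3', 'p4', 'pz', 'o1', 'o2', 't3']

    FRONTAL = ['fp1', 'fp2', 'f7', 'f8']
    MIDLINE = ['fz', 'cz', 'pz']

    def select_channels(eeg, subset):
        """eeg: (channels, samples) array ordered like CHANNELS; returns only
        the rows belonging to the requested electrode subset."""
        idx = [CHANNELS.index(name) for name in subset]
        return eeg[idx, :]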

5.4.1 Using Frontal Electrodes Only


Following Honal (2005), we start with an electrode set consisting of the electrodes fp1, fp2, f7, and f8. The major advantages of this selection are that

• all electrodes can easily be integrated into a headband, which is easy to attach and comfortable to wear, and

• as all electrode positions are on the forehead, the electrode gel does not get in contact with the hair.

These factors are important for usage in everyday life.


For this analysis we use the data of the 15 electrode-cap subjects and, in addition, the data of five subjects whose data had been recorded with the headband developed by Honal (2005).
Feature reduction shows that the best performance (46.28 percent) is achieved when the features are reduced to 128 by the correlation-based approach (see Figure 5.8). Compared to the setting where all 16 electrodes are evaluated, the improvement due to the dimensionality reduction is rather small. This can be explained by the smaller number of features if only a subset of electrodes is used.
Following these results, we perform the optimization of the SVM parameters with 128 dimensions.
Table 5.10 shows that also when only four electrodes are used, the RBF kernel performs better on average (53.28 percent) than the linear kernel (51.61 percent). In some cases the linear kernel achieves the same recognition rate as the RBF kernel, but there is no case in which it outperforms the RBF kernel. As in the setting where all electrodes are used, the optimal parameters differ between subjects.

Figure 5.8: Recognition rate depending on number of dimensions after feature reduction using the frontal electrodes (whiskers indicate standard deviation)

RBF kernel linear kernel


Subject ID   log2(C)   log2(γ)   Accuracy [%]   log2(C)   Accuracy [%]
1              7          3        56.67          11        55.56
2             25        -11        55.00          -5        51.67
5              1          5        54.44           7        53.89
6             25        -11        48.33          11        45.00
7              7          5        56.67          11        56.11
8             17         -9        61.67           9        61.11
9              5          5        53.89           7        52.22
10            21        -11        52.78           9        50.00
11             9          3        47.22           7        46.11
12            19        -11        49.44           7        44.44
13            21         -7        57.22           9        56.67
14            21        -13        51.11           9        46.11
15             3          5        52.78           9        52.78
16            -5          3        42.22          -5        42.22
17             3          5        57.78           9        55.00
42            23        -13        46.11           9        45.56
43            -5         -9        51.67          -5        51.11
44            19        -13        61.67           9        60.00
45            21        -11        55.00          11        54.44
46            19         -9        53.89           9        52.22
Average                            53.28                    51.61

Table 5.10: Optimal values for C and γ for frontal electrodes for RBF and lin-
ear kernel. Recordings for subjects with subject IDs 42 - 46 were done with the
headband.

5.4.2 Using Midline Electrodes Only


Next, we analyze the recognition rates when only the midline electrodes (i.e. fz, cz, pz) are used. These electrodes were selected as they cover the frontal, central, and parietal regions and therefore most of the brain.

Figure 5.9: Recognition rate depending on number of dimensions after feature reduction using only midline electrodes (whiskers indicate standard deviation)

Figure 5.9 shows that again a dimensionality of 128 performs best after feature reduction. Therefore we select this dimensionality for the SVM optimization. Surprisingly, the curve dips when the dimensionality is reduced to 16; a possible explanation is the comparatively high standard deviation.
Similar to the previous analyses, we compare the recognition rates of the linear and the RBF kernel. Table 5.11 shows that for the midline electrodes as well, the RBF kernel performs better (51.74 percent) than the linear kernel (50.44 percent). Although we use only three electrodes instead of four, the recognition rates for the midline electrodes are quite similar to those of the frontal electrodes.

5.4.3 Comparison of Different Electrode Positions


The recognition rates that can be achieved with only a subset of electrodes are quite reasonable. To allow a comparison between the different sets of electrodes, we use the same system configuration for all electrode sets. Following the results of section 5.2, we use a window size of four seconds, an averaging over two adjacent feature vectors, a bandpass of 8 - 45 Hz, and RelPower as normalization method. The dimensionality of the feature vectors is reduced to 128. For training and classification an SVM with an RBF kernel is used with the default values of C and γ (C = 1, γ = 1/k). Moreover, for the evaluation of the frontal electrodes we use only those 15 subjects whose data was recorded with the electrode cap, in order to have an equal number of subjects for each electrode set.

RBF kernel linear kernel


Subject ID log2 (C) log2 (γ) Accuracy [%] log2 (C) Accuracy [%]
1 17 -5 53.89 13 52.78
2 23 -9 56.67 15 55.56
5 23 -13 52.22 9 50.00
6 11 1 52.78 13 52.22
7 21 -13 63.89 9 62.78
8 5 5 51.11 11 47.78
9 9 3 47.22 13 47.22
10 21 -13 53.33 5 50.56
11 19 -13 45.00 7 44.44
12 21 -13 47.22 9 46.11
13 -5 5 46.67 -5 46.67
14 25 -11 51.67 15 48.33
15 9 -1 54.44 9 53.89
16 7 5 50.00 15 48.89
17 19 -11 50.00 11 49.44
Average 51.74 50.44

Table 5.11: Optimal values for C and γ for midline electrodes for RBF and linear
kernel

Electrode set Accuracy [%] Standard deviation [%]


all electrodes (16) 52.74 6.73
frontal electrodes (4) 45.19 4.66
midline electrodes (3) 45.11 5.07

Table 5.12: Summary of recognition rates subject to electrode selection



Table 5.12 summarizes the recognition rates for all electrode settings.
As one can see, the best recognition rate is achieved when all recorded electrodes are used. The recognition rate for the frontal electrode subset differs only slightly from that of the midline electrode subset. The lower standard deviation for the frontal and midline subsets can be seen as an indicator that the signals at these positions behave more similarly across users than the signals of all electrodes.

5.5 Analysis of Temporal Progression


As described in section 3.2.6, pictures are presented for eight seconds to elicit emotions. To find out at what time emotions can be discriminated best, we divide each picture presentation into eight segments of one second length. For every time segment, training and classification are performed as described in section 4.1, i.e. to compute the recognition rate for the i-th second we use segment i for training and classification (a minimal sketch of this per-segment evaluation follows the configuration list below). The system configuration is as follows:

• Window size: 1 second (→ one feature vector per picture, no averaging over
adjacent feature vectors possible)

• Frequency band: 8 - 45 Hz

• Normalization with method GlobalNorm

• Correlation-based feature reduction to 32 features (as the window size is 1/4 of the one used for optimization, the number of frequency components per electrode is four times smaller)

• SVM with RBF kernel is used with default parameters (C = 1, γ = 1/k)
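A per-segment evaluation of this kind could be sketched as follows. The feature extraction here (magnitude spectrum of each one-second segment) is a simplified stand-in for the preprocessing listed above, and the scikit-learn calls are illustrative rather than the actual implementation.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def per_second_accuracy(trials, labels, fs=300):
        """trials: (n_trials, channels, samples) raw EEG of the 8-second
        pictures, labels: (n_trials,). Returns one cross-validated accuracy
        per one-second segment."""
        n_trials, _, n_samples = trials.shape
        seg_len = int(fs)                               # one-second segments
        accuracies = []
        for sec in range(n_samples // seg_len):
            seg = trials[:, :, sec * seg_len:(sec + 1) * seg_len]
            feats = np.abs(np.fft.rfft(seg, axis=-1)).reshape(n_trials, -1)
            clf = SVC(kernel='rbf', C=1.0, gamma='auto')  # gamma='auto' equals 1/k
            accuracies.append(cross_val_score(clf, feats, labels, cv=5).mean())
        return accuracies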

Figure 5.10 shows the accuracy for the time segments.


Surprisingly, the accuracy during the first second of picture presentation is comparatively low, whereas the highest accuracy is reached during the fifth second. A possible explanation is that it takes some time to understand the content of a picture; the eyes are not necessarily focused on the relevant region of a picture from the beginning. After a picture has been processed, the intensity decreases again.
For further investigations one could try to select only pictures whose focus of attention is at the same place, e.g. in the middle of the picture.

Figure 5.10: Recognition rate for each time segment (whiskers indicate standard
deviation)
6. Conclusion and Future Work

6.1 Conclusion
In this thesis we developed an emotion recognition system based on EEG signals. For this purpose, a data corpus first had to be built. This was done by eliciting emotions with pictures from the IAPS. For each subject, 30 samples of each of the three emotional states pleasant, neutral, and unpleasant were recorded.

For emotion recognition two different recognition systems were compared. With the SVM-based system a mean recognition rate of 62.07 percent could be achieved, which is significantly better than random guessing. Moreover, a sequence-modeling system based on HMMs, as commonly used for speech recognition, was investigated. With this system we attained a mean recognition rate of 46.15 percent. Although it is not easy to compare both systems, as the processing steps are quite different, it is obvious that the SVM-based system performs much better than the HMM-based system. This can be seen as an indicator that the nature of emotions is better modeled with a non-sequential system.

Besides that, we analyzed the performance of two different electrode subsets. Although it turned out that the recognition rate is best when all recorded electrodes are included in the analysis, the accuracy achieved with both subsets was still significantly above chance level.

Finally, a time analysis showed that accuracy is best for time segments in the middle of the presentation time, whereas it decreases for segments at the beginning and at the end of the picture presentation.

6.2 Future Work


There are still many challenges in emotion recognition. Some ideas on how the results of this thesis could be extended are listed below.

Extending Data Corpus

One major principle from speech recognition is also applicable to emotion recognition: "There is no better data than more data". Therefore, the data corpus should be extended. This would lead to more reliable results, especially for parameter optimization. Moreover, an increased number of feature vectors can help to reduce the curse of dimensionality, which can cause undertraining.

Combination with Other Biosignals

In section 2.4.7 we presented studies using combined biosignals for emotion recognition. In future work, other biosignals could be investigated and combined with the EEG signals for emotion recognition.

Multimodal Scenario

Besides the possibility of combining EEG signals with other biosignals, it is also possible to use biosignals in a multimodal scenario, i.e. a combination of biosensors with cameras and microphones. In such a scenario the disadvantages of both methods could be mitigated. For instance, if a person moves out of range of a camera, biosignals could still be used for emotion recognition as the sensors are attached directly to the body.

Subject Independent Recognition

In this study, we only investigated subject-dependent emotion recognition. For further studies it would be interesting to build a system that is able to perform subject-independent emotion recognition. For this purpose, emotion-specific patterns have to be found that are similar across subjects.

Robustness

The EEG signals used for this study were all recorded in a laboratory setting under very stable conditions. As such stable conditions are unlikely in everyday life, a more robust system is needed. There are two ways to make the system more robust. First, the electrodes could be improved so that they are less susceptible to external electrical influences, for instance by using electrodes that are better shielded against such influences. Moreover, the amplifier could be integrated directly into the electrodes, which would avoid artifacts arising in the wires between electrodes and amplifier. Second, better methods for artifact removal could be implemented in the preprocessing in order to eliminate artifacts resulting e.g. from muscle activity.

Online Recognition

The system in this study was built for offline recognition only. For future research
an online recognition system would be helpful.

Spontaneous Emotion
As explained in section 2.3.3 there are three different kinds of emotions that can be analyzed: spontaneous emotion, controlled elicited emotion, and acted emotion. Certainly, spontaneous emotion is the most natural kind and should therefore be considered in further studies. The major problem with spontaneous emotion is that it is hard to elicit, especially in a laboratory environment. Moreover, labeling the emotional data is quite complicated.

Improvement of Usability
In the future, emotion recognition could be used for various applications in everyday life. Therefore, the disturbance for the user wearing the EEG recording devices should be as small as possible. The headband used in our study is a first step in this direction. Making the sensors smaller and wireless could also help to improve usability. One major challenge lies in the fact that the fewer sensors we use, the more comfortable the system is for the user, but also the more difficult it becomes to extract the desired information from the signals.

Categorical Emotions
For our investigations, we took the dimensional emotion model as a basis. Although this model has many advantages compared to the categorical model, one major disadvantage is that some emotions of quite different character are located very close to each other. For instance, anger and frustration lie very close on the dimensional arousal-valence scale. However, it is not certain to what extent the EEG signals differ for emotions that have a different character but are very close in the dimensional model. Therefore, differences in EEG signals for categorical emotions should be investigated.
A. Prototype Model

Although Russell and Fehr (1984) introduced the prototype model of emotion, they did not provide a description of the tree-like structure proposed in this approach. For this reason, Shaver et al. (1987) conducted a study to find an appropriate description of the hierarchical structure of emotions. In order to organize emotions according to this structure, 100 participants were asked to sort cards with 135 emotion names into categories representing which emotions are similar to each other and which are different. Figure A.1 shows the results of a hierarchical cluster analysis of these data.
Node A was labeled joy, B was labeled cheerfulness, and C and D were both labeled sadness.

Figure A.1: Results of a hierarchical cluster analysis of 135 emotion names (A = joy, B = cheerfulness, C and D = Sadness). The scale at
the left indicates the cluster strength, asterisks indicate empirically selected subcluster names.
B. IAPS-picturesets

Men Women
Description Slide No. Valence Arousal Valence Arousal
Mean (SD) Mean (SD) Mean (SD) Mean (SD)
Seal 1440 7.96 (1.59) 4.76 (2.25) 8.43 (1.44) 4.47 (2.82)
Family 2340 7.65 (1.36) 5.35 (2.03) 8.34 (1.10) 4.53 (2.29)
Mountains 5700 7.70 (1.36) 5.94 (2.28) 7.54 (1.56) 5.44 (2.38)
Brownie 7200 7.50 (1.78) 4.90 (2.67) 7.77 (1.71) 4.85 (2.55)
Sailing 8080 7.73 (1.25) 7.12 (1.95) 7.73 (1.43) 6.25 (2.34)
PolarBears 1441 7.71 (1.17) 3.84 (2.10) 8.14 (1.33) 4.00 (2.55)
Skier 8190 8.13 (1.29) 6.41 (2.60) 8.08 (1.48) 6.16 (2.57)
Kitten 1460 7.80 (1.47) 4.20 (2.69) 8.58 (0.76) 4.42 (2.60)
Rafting 8370 7.67 (1.19) 6.46 (2.22) 7.86 (1.37) 6.98 (2.25)
Puppies 1710 8.02 (1.21) 5.53 (2.07) 8.59 (0.99) 5.31 (2.54)
Bunnies 1750 7.89 (1.26) 4.21 (2.22) 8.59 (0.75) 4.02 (2.40)
Tubing 8420 7.61 (1.61) 5.71 (2.42) 7.90 (1.50) 5.41 (2.34)
Rollercoaster 8499 7.51 (1.47) 6.69 (1.71) 7.70 (1.36) 5.56 (2.61)
Porpoise 1920 7.83 (1.29) 4.21 (2.49) 7.94 (1.61) 4.31 (2.57)
Beach 5833 8.15 (1.19) 6.37 (2.37) 8.27 (0.99) 5.14 (2.79)
Money 8501 8.14 (1.24) 6.86 (2.00) 7.67 (1.97) 6.02 (2.50)
Fireworks 5910 7.41 (1.20) 5.37 (2.32) 8.16 (1.15) 5.80 (2.75)
Baby 2150 7.46 (1.60) 4.66 (2.37) 8.31 (1.49) 5.29 (2.83)
Baby 2070 7.69 (1.59) 4.02 (2.30) 8.50 (1.28) 4.84 (2.97)
Baby 2040 7.63 (2.01) 4.33 (2.19) 8.74 (0.64) 4.97 (2.85)

Table B.1: Pleasant pictures for all subjects



Men Women
Description Slide No. Valence Arousal Valence Arousal
Mean (SD) Mean (SD) Mean (SD) Mean (SD)
Couple 2530 7.25 (1.84) 4.23 (2.03) 8.25 (1.10) 3.80 (2.17)
Couple 2550 7.37 (1.20) 4.15 (2.03) 8.14 (1.53) 5.16 (2.67)
Sunset 5830 7.37 (1.80) 4.98 (2.40) 8.54 (0.82) 4.88 (2.86)
IceCream 7330 7.29 (2.21) 4.54 (2.55) 7.96 (1.49) 5.54 (2.53)
Baby 2660 7.28 (1.59) 4.09 (2.20) 8.18 (1.24) 4.76 (2.56)
Seagulls 5831 7.07 (1.10) 3.93 (2.28) 8.05 (1.00) 4.79 (2.59)
Father 2160 6.87 (1.87) 5.31 (2.10) 8.16 (1.28) 5.03 (2.25)
Father 2165 6.74 (1.39) 3.89 (2.24) 8.29 (1.17) 5.05 (2.67)
Father 2057 7.16 (1.31) 4.32 (1.98) 8.39 (0.94) 4.73 (2.75)
Family 2360 6.98 (1.76) 3.65 (2.02) 8.20 (1.59) 3.67 (2.52)

Table B.2: Pleasant pictures for female subjects

Men Women
Description Slide No. Valence Arousal Valence Arousal
Mean (SD) Mean (SD) Mean (SD) Mean (SD)
EroticFemale 4002 7.69 (1.48) 7.15 (1.81) 4.14 (1.82) 3.72 (2.30)
AttractiveFem 4150 7.80 (1.36) 6.41 (2.18) 5.36 (1.44) 3.44 (1.98)
EroticFemale 4210 8.25 (1.30) 7.80 (1.90) 3.13 (1.66) 4.31 (2.47)
EroticFemale 4220 7.81 (1.74) 6.64 (1.90) 5.61 (1.31) 4.00 (1.95)
EroticFemale 4225 7.57 (1.43) 6.94 (1.83) 5.15 (1.37) 4.40 (2.16)
EroticFemale 4250 8.39 (0.93) 7.02 (2.02) 5.18 (1.55) 3.31 (2.07)
EroticFemale 4311 7.56 (1.38) 7.35 (1.81) 5.89 (1.68) 6.08 (2.32)
EroticCouple 4659 7.70 (1.64) 7.43 (1.80) 6.15 (2.01) 6.47 (2.18)
EroticCouple 4660 7.63 (1.30) 6.92 (1.74) 7.22 (1.40) 6.31 (1.95)
EroticCouple 4680 7.73 (1.61) 5.94 (2.30) 6.91 (1.92) 6.07 (2.26)

Table B.3: Pleasant pictures for male subjects



Men Women
Description Slide No. Valence Arousal Valence Arousal
Mean (SD) Mean (SD) Mean (SD) Mean (SD)
Man 2190 4.73 (1.25) 2.27 (1.72) 4.90 (1.31) 2.50 (1.86)
Secretary 2383 4.62 (1.24) 3.49 (1.90) 4.79 (1.44) 3.36 (1.79)
Chess 2840 4.92 (1.79) 2.31 (1.88) 4.90 (1.23) 2.55 (1.76)
Mushroom 5500 5.49 (1.67) 2.82 (2.58) 5.34 (1.49) 3.18 (2.25)
RollingPin 7000 4.93 (0.35) 2.73 (1.86) 5.06 (1.10) 2.15 (1.70)
Plate 7233 5.01 (1.21) 2.51 (1.74) 5.15 (1.66) 2.96 (2.05)
Building 7491 4.87 (0.94) 2.60 (1.95) 4.79 (1.09) 2.24 (1.87)
Rain 9210 4.41 (1.85) 2.89 (2.05) 4.64 (1.82) 3.26 (2.20)
Farmer 2191 5.49 (1.49) 3.63 (2.10) 5.14 (1.71) 3.60 (2.17)
Tourist 2850 4.69 (1.40) 2.58 (1.79) 5.69 (1.22) 3.38 (2.01)
Mushroom 5530 5.33 (1.64) 2.87 (2.47) 5.44 (1.57) 2.87 (2.12)
Spoon 7004 4.89 (0.60) 2.09 (1.75) 5.14 (0.59) 1.94 (1.60)
NeutFace 2210 4.41 (1.33) 2.72 (1.92) 4.60 (0.98) 3.44 (1.74)
Factoryworker 2393 4.82 (1.08) 2.90 (1.80) 4.92 (1.05) 2.95 (1.95)
Mug 7009 4.96 (1.05) 2.69 (1.95) 4.89 (0.96) 3.26 (1.96)
Basket 7010 4.95 (1.43) 1.55 (1.36) 4.92 (0.48) 1.97 (1.58)
Fan 7020 5.02 (1.22) 2.15 (1.71) 4.94 (0.88) 2.19 (1.72)
Shipyard 7036 5.08 (1.02) 3.47 (2.09) 4.71 (1.10) 3.18 (1.98)
DustPan 7040 4.72 (1.19) 2.46 (1.86) 4.66 (1.00) 2.90 (1.99)
Baskets 7041 4.96 (1.14) 2.68 (1.76) 5.02 (1.11) 2.53 (1.79)
HairDryer 7050 4.81 (0.71) 2.59 (1.79) 5.04 (0.87) 2.90 (1.82)
Fork 7080 5.43 (1.26) 1.98 (1.63) 5.10 (0.88) 2.67 (1.99)
Book 7090 4.95 (1.54) 2.30 (1.90) 5.44 (1.35) 2.92 (2.15)
Umbrella 7150 4.76 (0.73) 2.66 (1.68) 4.69 (1.19) 2.56 (1.83)
Fabric 7160 4.98 (0.97) 3.06 (2.08) 5.05 (1.19) 3.08 (2.09)
Pole 7161 4.99 (0.86) 2.79 (1.81) 4.97 (1.16) 3.15 (2.14)
Lamp 7175 4.78 (1.18) 1.55 (0.96) 4.95 (0.80) 1.87 (1.48)
IroningBoard 7234 4.36 (1.41) 2.83 (1.79) 4.12 (1.73) 3.05 (1.99)
Building 7500 5.44 (1.36) 3.46 (2.23) 5.23 (1.50) 3.08 (2.15)
Tissue 7950 4.62 (1.26) 2.30 (1.89) 5.17 (1.12) 2.27 (1.77)

Table B.4: Neutral pictures



Men Women
Description Slide No. Valence Arousal Valence Arousal
Mean (SD) Mean (SD) Mean (SD) Mean (SD)
SadChildren 2703 2.33 (1.53) 5.73 (1.99) 1.59 (0.87) 5.81 (2.47)
SadChild 2800 2.31 (1.36) 4.94 (1.97) 1.41 (0.79) 5.87 (2.13)
Mutilation 3000 1.69 (1.47) 6.74 (2.37) 1.17 (0.54) 7.63 (2.11)
Mutilation 3010 2.19 (1.42) 7.12 (1.75) 1.29 (0.82) 7.44 (2.21)
Mutilation 3060 1.94 (1.39) 6.89 (2.08) 1.66 (1.71) 7.34 (2.10)
Mutilation 3064 1.78 (1.26) 5.44 (2.70) 1.15 (0.44) 7.30 (2.22)
Mutilation 3068 2.47 (1.92) 6.44 (2.46) 1.18 (0.70) 7.09 (2.49)
Mutilation 3069 2.10 (1.66) 6.70 (2.60) 1.32 (1.01) 7.33 (2.20)
Mutilation 3071 2.06 (1.59) 6.61 (2.13) 1.69 (1.14) 7.10 (1.95)
Mutilation 3080 1.63 (1.11) 6.84 (2.06) 1.33 (0.75) 7.61 (1.81)
BurnVictim 3100 1.88 (1.14) 5.88 (2.34) 1.35 (0.96) 7.02 (2.02)
BurnVictim 3110 2.10 (1.56) 6.43 (2.26) 1.47 (0.89) 6.98 (2.04)
DeadBody 3120 1.80 (1.32) 6.20 (2.55) 1.33 (0.74) 7.49 (1.96)
Mutilation 3130 1.90 (1.57) 6.56 (2.11) 1.26 (0.68) 7.39 (1.97)
BatteredFem 3180 2.27 (1.33) 5.17 (2.05) 1.67 (0.90) 6.19 (2.24)
Mutilation 3225 2.06 (1.24) 5.39 (2.41) 1.66 (1.20) 6.32 (2.43)
DyingMan 3230 2.44 (1.50) 5.00 (2.35) 1.67 (0.99) 5.75 (2.04)
Tumor 3261 1.98 (1.19) 5.51 (2.70) 1.70 (1.43) 5.92 (2.60)
Attack 3530 2.10 (1.53) 6.85 (2.13) 1.51 (1.00) 6.80 (2.07)
Soldier 6212 2.59 (1.47) 5.47 (2.44) 1.81 (1.41) 6.53 (2.35)
Attack 6313 2.43 (1.42) 6.54 (2.11) 1.61 (1.22) 7.27 (2.29)
Attack 6540 2.53 (1.84) 6.51 (2.27) 1.86 (1.14) 7.14 (1.98)
Attack 6560 2.57 (1.49) 6.17 (2.28) 1.78 (1.23) 6.86 (2.52)
StarvingChild 9040 1.88 (1.17) 5.10 (2.11) 1.50 (0.97) 6.44 (2.00)
Cow 9140 2.56 (1.42) 4.90 (2.29) 1.88 (1.26) 5.79 (2.04)
Cemetery 9220 2.27 (1.61) 3.83 (2.33) 1.86 (1.46) 4.16 (1.84)
Assault 9254 2.28 (1.51) 5.57 (2.45) 1.88 (1.22) 6.33 (2.26)
Soldier 9410 1.96 (1.56) 6.38 (2.26) 1.20 (0.58) 7.54 (1.78)
DeadMan 9433 2.39 (1.38) 5.00 (2.65) 1.35 (0.71) 6.71 (2.27)
Dog 9570 1.90 (1.40) 5.84 (2.41) 1.47 (1.00) 6.45 (2.19)

Table B.5: Unpleasant pictures


C. Experimental Instructions

As described in section 3.2.7, the experimental instructions were handed out to the subjects as a paper copy to ensure that all subjects had the same information about the experiment. The instructions, which were given to the subjects in German, are shown below in an English translation.

Dear participant!

This experiment investigates how pictures with different contents are perceived by you and what effects this perception has. Since any movement during the recording distorts the data, you should try to move as little as possible during the experiment. Besides body movements, it is also important to avoid eye movements.

Before the experiment starts, there will be a practice run which serves to familiarize you with the procedure of the following part. Afterwards there will be two presentation blocks, each lasting about 20 minutes.

During the experiment various pictures will be presented, some of which may be unpleasant or ugly. Whenever you see a picture on the screen, data is being recorded. Please avoid moving your eyes or blinking during this phase. Before each picture a cross appears in the middle of the screen. Please direct your gaze at the cross and try to avoid blinking and eye movements. Immediately afterwards a picture is presented. Please let this picture as a whole affect you as intensely as possible and try to put yourself into what is depicted. After the picture has disappeared, a grey bar appears on the screen for about 20 seconds. You can use this time for blinking. Then another cross appears, followed by the next picture, the next bar, and so on.

If you have any questions about the experiment, please contact the experimenter.

Thank you very much for your help with this experiment!


D. Data

In this part of the appendix we give a detailed report of the recognition rates from
the experiments in section 5. Best recognition rates are marked bold.

D.1 Data for section 5.2


Avg Size   Ratio   Window Size [sec]   Mean Recognition Rate [%]   Standard Deviation [%]
1 0.50 8.0 36.30 6.54
1 0.25 4.0 37.14 5.14
2 0.50 4.0 38.89 5.55
3 0.75 4.0 37.33 6.10
1 0.13 2.0 35.64 3.60
2 0.25 2.0 36.40 4.28
3 0.38 2.0 36.77 4.90
4 0.50 2.0 37.11 5.80
5 0.63 2.0 37.70 6.49
6 0.75 2.0 37.41 6.79
7 0.88 2.0 35.04 5.92
1 0.06 1.0 34.67 3.00
2 0.13 1.0 35.61 3.14
3 0.19 1.0 35.97 3.34
4 0.25 1.0 36.44 3.40
5 0.31 1.0 37.27 3.51
6 0.38 1.0 37.30 3.87
7 0.44 1.0 37.50 4.28
8 0.50 1.0 37.30 4.15
9 0.56 1.0 37.21 4.09
10 0.63 1.0 37.44 3.90
11 0.69 1.0 37.76 4.77
12 0.75 1.0 37.81 5.15
13 0.81 1.0 37.56 4.99

14 0.88 1.0 36.78 4.23
15 0.94 1.0 35.19 4.61
1 0.03 0.5 33.95 2.58
2 0.06 0.5 34.24 2.62
3 0.09 0.5 34.26 2.94
4 0.13 0.5 34.42 3.10
5 0.16 0.5 34.31 3.35
6 0.19 0.5 34.39 3.82
7 0.22 0.5 34.37 4.10
8 0.25 0.5 34.37 4.15
9 0.28 0.5 34.73 4.49
10 0.31 0.5 34.54 4.77
11 0.34 0.5 34.56 4.57
12 0.38 0.5 34.66 4.82
13 0.41 0.5 34.31 4.95
14 0.44 0.5 34.60 4.78
15 0.47 0.5 34.76 4.96
16 0.50 0.5 34.90 4.88
17 0.53 0.5 35.17 4.99
18 0.56 0.5 34.84 5.30
19 0.59 0.5 34.48 5.53
20 0.63 0.5 34.32 5.21
21 0.66 0.5 34.48 5.15
22 0.69 0.5 34.45 5.13
23 0.72 0.5 34.32 4.76
24 0.75 0.5 34.24 5.02
25 0.78 0.5 34.49 4.74
26 0.81 0.5 34.74 4.73
27 0.84 0.5 34.50 5.82
28 0.88 0.5 34.67 6.42
29 0.91 0.5 35.04 6.87
30 0.94 0.5 34.78 6.03
31 0.97 0.5 35.48 6.58
Table D.1: Mean recognition rates for variation of win-
dow size and averaging over adjacent feature vectors

Frequency Band [Hz]   EEG Band   Mean Recognition Rate [%]   Standard Deviation [%]
5-7 θ 38.89 5.93
5 - 13 α, θ 46.52 6.22
5 - 30 α, β, θ 55.33 5.09
5 - 45 α, β, γ, θ 57.70 7.05
8 - 13 α 45.37 4.62
8 - 30 α, β 55.00 4.40
8 - 45 α, β, γ 58.19 6.39
14 - 30 β 52.81 6.01
14 - 45 β, γ 55.11 4.62
30 - 45 γ 51.89 7.94
0 - 150 all 57.67 6.20

Table D.2: Mean recognition rates for variation of frequency band

Average Size   Mean Recognition Rate [%]   Standard Deviation [%]
1 39.19 5.71
2 38.81 5.07
4 39.41 5.33
8 38.56 4.58
16 37.96 5.86
32 37.15 3.89
64 36.44 3.76
128 36.52 4.32

Table D.3: Mean recognition rates for variation of average size



Number of dimensions   Mean Recognition Rate [%]   Standard Deviation [%]
1 43.30 2.98
2 43.07 3.92
4 46.67 3.81
8 47.48 4.63
16 49.59 5.26
32 50.15 4.64
64 51.07 5.81
128 52.74 6.73
256 52.44 5.37
512 49.96 5.83
1024 46.41 5.81
2368 39.19 5.71

Table D.4: Mean recognition rate subject to number of dimensions after correlation-
based feature reduction

D.2 Data for section 5.3


Frequency Band [Hz]   EEG Band   Mean Recognition Rate [%]   Standard Deviation [%]
5 - 13 α, θ 36.00 4.24
5 - 30 α, β, θ 37.11 5.33
5 - 45 α, β, γ, θ 39.93 6.89
8 - 13 α 37.11 4.20
8 - 30 α, β 37.93 4.77
8 - 45 α, β, γ 36.67 6.65
no filter 33.85 4.24

Table D.5: Mean recognition rates for variation of frequency band

Number of dimensions   Mean Recognition Rate [%]   Standard Deviation [%]
1 32.67 2.81
2 32.74 3.38
4 34.52 5.47
8 36.44 5.56
16 39.78 5.76
32 39.26 7.47
35 39.93 6.89
64 37.78 4.95
128 36.59 6.01

Table D.6: Mean recognition rate subject to number of dimensions after LDA

Number of HMM states   Mean Recognition Rate [%]   Standard Deviation [%]
1 35.19 8.90
2 38.81 5.52
3 37.26 5.57
4 37.48 6.31
5 39.93 6.89
6 37.33 6.70
7 36.81 5.86
8 38.15 7.27
9 37.04 5.80
10 36.67 6.40
11 37.26 4.62
12 36.59 5.08
13 36.07 4.91
14 37.85 5.25
15 37.41 4.80
16 38.30 6.31
17 37.70 5.37
18 34.22 6.58
19 37.11 5.66
20 36.96 5.43

Table D.7: Mean recognition rate subject to number of HMM-states

Number of HMM states   Number of subjects: relative [%]   absolute
1 0.07 1
2 0.13 2
5 0.27 4
8 0.20 3
12 0.07 1
16 0.20 3
20 0.07 1

Table D.8: Optimal number of HMM states



D.3 Data for section 5.4


Number of dimensions   Mean Recognition Rate [%]   Standard Deviation [%]
1 41.14 3.75
2 42.22 3.36
4 44.14 3.74
8 44.72 3.70
16 45.69 3.97
32 45.61 5.58
64 46.00 5.63
128 46.28 5.30
256 43.89 5.37
512 37.92 4.24
592 36.42 4.57

Table D.9: Mean recognition rate depending on number of dimensions after feature
reduction for frontal electrodes

Number of dimensions   Mean Recognition Rate [%]   Standard Deviation [%]
1 41.96 3.46
2 42.22 3.85
4 43.67 3.93
8 43.44 4.19
16 42.93 6.00
32 44.70 4.56
64 44.93 4.44
128 45.11 5.07
256 40.93 4.33
444 36.74 5.80

Table D.10: Mean recognition rate depending on number of dimensions after feature
reduction for midline electrodes

D.4 Data for section 5.5


Second   Mean Recognition Rate [%]   Standard Deviation [%]
1 47.85 4.41
2 48.89 6.36
3 47.48 3.79
4 47.56 5.48
5 52.22 3.28
6 48.67 5.28
7 48.74 3.94
8 49.26 5.21

Table D.11: Mean recognition rate for each time segment


Bibliography

Andreassi, J. (2000). Psychophysiology: Human Behavior & Physiological Response,


(fourth edition ed.). Lawrence Erlbaum Associates.

Anttonen, J. & Surakka, V. (2005). Emotions and Heart Rate while Sitting on a
Chair. In: CHI ’05: Proceedings of the SIGCHI conference on Human factors in
computing systems, pages 491–499. ACM.

Ax, A. F. (1953). The Physiological Differentiation between Fear and Anger in


Humans. Psychosomatic Medicine, 15:433–442.

Axelrod, L. (2004). The Affective Connection: How and When Users Communicate
Emotion. In: CHI Extended Abstracts, pages 1033–1034.

Badzinski, D. M. (1991). Children's Cognitive Representations of Discourse: Effects of Vocal Cues on Text Comprehension. Communication Research, 18:715–736.

Baumgartner, T., Esslen, M., & Jancke, L. (2006). From Emotion Perception to
Emotion Experience: Emotions Evoked by Pictures and Classical Music. Inter-
national Journal of Psychophysiology, 60:34 –43.
Becker, K. (2003). VarioPort™. http://www.becker-meditec.com/.

Beedie, C., Terry, P., & Lane, A. (2005). Distinctions between Emotion and Mood.
Cognition and Emotion, 19:847–878.

Bradley, M. M. & Lang, P. J. (1994). Measuring Emotion: The Self-Assessment


Manikin and the Semantic Differential. Journal of Behavior Therapy and Exper-
imental Psychiatry, 25:49–59.

Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recogni-


tion. Data Mining and Knowledge Discovery, 2:121 – 167.

Cacioppo, J. T., Losch, M. L., Tassinary, L. G., & Petty, R. E. (1986). The Role of Affect in Consumer Behavior: Emerging Theories and Applications, chapter Properties of affect and affect-laden information processing as viewed through the facial response system, pages 87–118. Lexington, MA: D. C. Heath.

Cannon, W. B. (1927). The James-Lange theory of emotion: A critical examination


and an alternative theory. American Journal of Psychology, 39:10–124.

Carlson, N. R. (2007). Physiology of Behavior. Allyn & Bacon.



Chang, C.-C. & Lin, C.-J. (2001). LIBSVM: a Library for Support Vector Machines.
Software available at http://www.csie.ntu.edu.tw/˜cjlin/libsvm.

Cowie, R., Douglas-Cowie, E., Apolloni, B., Taylor, J., Romano, A., & Fellenz, W.
(1999). What a Neural Net needs to know about Emotion Words. In: CSCC’99
Proceedings, pages 5311–5316.

Davidson, R. J. (1992). Anterior Cerebral Asymmetry and the Nature of Emotion.


Brain and Cognition, 20(1):125 – 151.

Davidson, R. J., Ekman, P., Sarona, C. D., Senulis, J. A., & Friesen, W. V. (1990).
Approach / Withdrawal and Cerebral Asymmetry: Emotional Expression and
Brain Physiology. Journal of Personality and Social Psychology, 58(2):330 – 341.

Dellaert, F. (2002). The Expectation Maximization Algorithm. Technical report,


College of Computing, Georgia Institute of Technology.

Downey, G., Mougios, V., Ayduk, O., London, B. E., & Shoda, Y. (2004). Rejection
Sensitivity and the Defensive Motivational System: Insights From the Startle
Response to Rejection Cues. Psychological Science, 15(10):668–673.

Dryer, D. & Horowitz, L. (1997). When Do Opposites Attract? Interpersonal Com-


plementary versus Similarity. Journal of Personality and Social Psychology, 72:592
– 603.

Ekman, P. (1992). An Argument for Basic Emotions. Cognition and Emotion,


6:169–200.

Ekman, P. (1993). Facial Expression and Emotion. American Psychologist, 48:384–


392.

Ekman, P., Campos, J., Davidson, R. J., & De Waal, F. (2003). Emotions Inside Out: 130 Years After Darwin's the Expression of the Emotions in Man and Animals, chapter Darwin, Deception, and Facial Expression, pages 205–221. New York Academy of Sciences.

Ekman, P., Levenson, R., & Friesen, W. (1983). Autonomic Nervous System Activity
Distinguishes among Emotions. Science, 221:1208 – 1210.

Frijda, N. (1986). The Emotions. New York: Cambridge University Press.

Gerrards-Hesse, A., Spies, K., & Hesse, F. W. (1994). Experimental Inductions of


Emotional Sstates and their Effectiveness: A Review. British Journal of Psychol-
ogy, 85(1):55–78.

Haag, A., Goronzy, S., Schaich, P., & Williams, J. (2004). Emotion Recognition
Using Bio-Sensors: First Steps Towards an Automatic System. Lecture Notes in
Computer Science, 3068:33–48.

Hamm, A. O. & Vaitl, D. (1993). Emotionsinduktion durch visuelle Reize: Va-


lidierung einer Stimulationsmethode auf drei Reaktionsebenen. Psychologische
Rundschau, 44:143–161.

Healey, J., Picard, R., & Dabek, F. (1998). A New Affect-Perceiving Interface and
Its Application to Personalized Music Selection. Proceedings of the 1998 Workshop
on Perceptual User Interfaces.
Honal, M. (2005). Determining User State and Mental Task Demand from Elec-
troencephalographic Data, Diplomarbeit, Universität Karlsruhe (TH), Karlsruhe,
Germany.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A practical guide to support vector
classification. Technical report, Department of Computer Science. Last updated:
May 21, 2008.
Izard, C. E. (1994). Die Emotionen des Menschen, (3. aufl. ed.). Beltz, Psychologie-
Verl.-Union.
James, W. (1950). The Principles of Psychology, Vol. 1, (reprint edition ed.). Dover
Publications.
Jasper, H. H. (1958). The Ten-Twenty Electrode System of the International Fed-
eration in Electroencephalography and Clinical Neurophysiology. EEG Journal,
10:371–375.
Kim, K. H., Bang, S. W., & Kim, S. R. (2004). Emotion Recognition System using
Short-term Monitoring of Physiological Signals. Medical and Biological Engineer-
ing and Computing, 42:419–427.
Klein, J., Moon, Y., & Picard, R. W. (2002). This Computer Responds to User Frus-
tration - Theory, Design, Results and Implications. Interacting with Computers,
14:119–140.
Kleinginna, Paul R., J. & Kleinginna, A. M. (1981). A Categorized List of Emo-
tion Definitions, with Suggestions for a Consensual Definition. Motivation and
Emotion, 5(4):345–379.
Lang, P., Bradley, M., & Cuthbert, B. (1997). International Affective Picture System
(IAPS): Technical Manual and Affective Ratings.
Lang, P., Bradley, M., & Cuthbert, B. (2005). International Affective Picture System
(IAPS): Technical Manual and Affective Ratings. Technical report, Gainesville,
Fl: NIMH Center for the Study of Emotion and Attention (CSEA), University of
Florida.
Lang, P. J. (1995). The Emotion Probe: Studies of Motivation and Attention.
American Psychologist, 50:372–385.
Leng, H., Lin, Y., & Zanzi, L. A. (2007). An Experimental Study on Physiological
Parameters Toward Driver Emotion Recognition. Lecture Notes in Computer
Science, 4566:237–246.
Levenson, R. W. (1992). Autonomic Nervous System Differences among Emotions.
Psychological Science, 3(1):23–27.
Levenson, R. W. (1999). The Intrapersonal Functions of Emotion. Cognition and
Emotion, 13(5):481–504.

Levenson, R. W., Ekman, P., & Friesen, W. V. (1990). Voluntary Facial Action Gen-
erates Emotion-specific Autonomic Nervous System Activity. Psychophysiology,
27(4):363–384.
Luria, A. R. (1973). The Working Brain. New York: Basic Books.
Mayer, C. (2005). UKA EMG/EEG Studio v2.0.
McFarland, R. A. (1985). Relationship of Skin Temperature Changes to the Emo-
tions Accompanying Music. Applied Psychophysiology and Biofeedback, 10:255–
267.
Mowrer, O. (1960). Learning Theory and Behavior. New York: Wiley.
Murphy, F., Nimmo-Smith, I., & Lawrence, A. (2003). Functional Neuroanatomy of
Emotions: A Meta-Analysis. Cognitive, Affective, and Behavioral Neuroscience,
3(3):207 – 233.
Nass, C., Fogg, B. J., & Moon, Y. (1996). Can Computers be Teammates? Inter-
national Journal of Human-Computer Studies, 49(6):669–678.
Nass, C. & Moon, Y. (2000). Machines and Mindlessness: Social Responses to
Computers. Journal of Social Issues, 56(1):81 – 103.
Nass, C., Steuer, J., & Tauber, E. R. (1994). Computers are Social Actors. In:
CHI ’94: Proceedings of the SIGCHI conference on Human factors in computing
systems, pages 72–78. ACM.
Ortony, A. & Turner, T. J. (1990). What’s Basic about Basic Emotions? Psycho-
logical Review, 97(3):315–331.
Osgood, C. E. (1952). The Nature and Measurement of Meaning. Psychological
Bulletin, 49:197–237.
Picard, R. (1995). Affective Computing. Technical Report 321, MIT Media Labo-
ratory, Perceptual Computing Section.
Picard, R. W. (1997). Affective Computing. The MIT Press.
Picard, R. W. & Healey, J. (1997). Affective Wearables. In: ISWC, pages 90–97.
Picard, R. W., Vyzas, E., & Healey, J. (2001). Toward Machine Emotional Intelli-
gence: Analysis of Affective Physiological State. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 23:1175 – 1191.
Porbadnigk, A. K. (2008). EEG-based Speech Recognition: Impact of Experimental
Design on Performance, Studienarbeit, Universität Karlsruhe (TH), Karlsruhe,
Germany.
Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applica-
tions in Speech Recognition. Proceedings of the IEEE, 77(2):257–286.
Reeves, B. & Nass, C. (1995). The Media Equation: How People Treat Computers,
Televisions, and New Media as Real People and Places. Cambridge University
Press.

Reynolds, C. & Picard, R. (2004). Affective Sensors, Privacy, and Ethical Contracts.
In: CHI ’04: CHI ’04 extended abstracts on Human factors in computing systems,
pages 1103–1106. ACM.

Russel, J. A. (1979). Affective Space is Bipolar. Journal of Personality and Social


Psychology, 37:345–356.

Russel, J. A. (1980). A Circumplex Model of Affect. Journal of Personality and


Social Psychology, 39:1161–1178.

Russel, J. A. & Mehrabian, A. (1977). Evidence for a Three-Factor Theory of


Emotions. Journal of Research in Personality, 11:273–294.

Russell, J. & Fehr, B. (1984). Concept of Emotion Viewed from a Prototype Per-
spective. Journal of Experimental Psychology: General, 113:464–486.

Schaaff, K. (2008). Challenges on Emotion Induction with the International Affective


Picture System, Studienarbeit, Universität Karlsruhe (TH), Karlsruhe, Germany.

Schandry, R. (1989). Lehrbuch der Psychophysiologie, (2., überarb. u. erw. aufl. ed.).
Psychologie-Verl.-Union.

Schmidt-Atzert, L. (1981). Emotionspsychologie. Kohlhammer.

Schwartz, G., Fair, P., P. Salt, M. M., & Klerman, G. (1976). Facial Muscle Pat-
terning to Affective Imagery in Depressed and Nondepressed Subjects. Science,
192:489–491.

Selesnick, I. W., Baraniuk, R. G., & Kingsbury, N. C. (2005). The dual-tree complex
wavelet transform. Signal Processing Magazine, IEEE, 22(6):123–151.

Shaver, P., Schwartz, J., Kirson, D., & O’Connor, C. (1987). Emotion Knowledge:
Further Exploration of a Prototype Approach. Journal of Personality and Social
Psychology, 52:1061–1086.

Silbernagl, S. & Despopoulos, A. (2001). Taschenatlas der Physiologie, (5., komplett


überarb. u. neu gestaltete aufl. ed.). Thieme.

Sobotka, S. S., Davidson, R. J., & Senulis, J. A. (1997). Anterior Brain Electrical
Asymmetries in Response to Reward and Punishment. Electroencephalography
and Clinical Neurophysiology, 83(4):236 – 247.

Spencer, H. (1890). The Principles of Psychology, volume Vol. 1. New York: Ap-
pleton.

Stickel, C., Fink, J., & Holzinger, A. (2007). Enhancing Universal Access - EEG
Based Learnability Assessment. Lecture Notes in Computer Science, 4556:813–
822.

Takahashi, K. (2004). Remarks on SVM-Based Emotion Recognition from Multi-


Modal Bio-Potential Signals. Robot and Human Interactive Communication, 2004.
ROMAN 2004. 13th IEEE International Workshop on, pages 95–100.

Takahashi, K. & Tsukaguchi, A. (2003). Remarks on Emotion Recognition from


Multi-Modal Bio-Potential Signals. Systems, Man and Cybernetics, 2003. IEEE
International Conference on, 2:1654–1659.

Tassinary, L. G. & Cacioppo, J. T. (1992). Unobservable Facial Actions and Emo-


tion. Psychological Science, 3(1):28– 33.

Trimmel, M. (1990). Angewandte und experimentelle Neuropsychophysiologie.


Springer.

Velten, E. (1968). A laboratory Task for Induction of Mood States. Behaviour


Research and Therapy, 6:473–482.

Vrana, S. R., Cuthbert, B. N., & Lang, P. J. (1986). Fear Imagery and Text Pro-
cessing. Psychophysiology, 23:247 – 253.

Wand, M. (2007). Wavelet-based Preprocessing of Electroencephalographic and


Electromyographic Signals for Speech Recognition, Studienarbeit, Universität
Karlsruhe (TH), Karlsruhe, Germany.

Wester, M. (2006). Unspoken Speech - Speech Recognition Based On Electroen-


cephalography, Diplomarbeit, Universität Karlsruhe (TH), Karlsruhe, Germany.

Westermann, R., Spies, K., Stahl, G., & Hesse, F. W. (1996). Relative Effectiveness
and Validity of Mood Induction Procedures: a Metaanalysis. European Journal
of Social Psychology, 26:557–580.

Winton, W. M., Putnam, L. E., & Krauss, R. M. (1984). Facial and Autonomic Manifestations of the Dimensional Structure of Emotion. Journal of Experimental Social Psychology, 20:195–216.

Wundt, W. M. (1896). Grundriss der Psychologie. Wilhelm Engelmann.

Yerkes, R. & Dodson, J. (1908). The Relation of Strength of Stimulus to the Rapidity
of Habit Formation. Journal of Comparative Neurology and Psychology, 18:459–
482.
