by
cand. inform. Kristina Schaaff
Matriculation number: 1323079
Advisors:
Prof. Dr. Tanja Schultz
Dipl.-Math. Michael Wand
Acknowledgments
First of all, I would like to thank my advisor, Prof. Dr. Tanja Schultz for giving
me the opportunity to work on this interesting field of research and for her constant
guidance and all the valuable discussions about my work. I also want to express my
gratitude to Michael Wand and Matthias Honal, who repeatedly provided helpful
advice about the technical details of this work.
Thanks a lot to all the people who volunteered for the experiments. Without the
data collected from these experiments, this work would not have been possible.
Special thanks to my family and my boyfriend for putting up with my frustration and cheering me up when I needed it.
Contents
1 Introduction
1.1 Goals of this Thesis
1.2 Structure of this Thesis
3 Data Collection
3.1 Preceding Considerations
3.1.1 Selection of Emotions
3.1.2 Emotion Induction
3.2 Experimental Design
3.2.1 Hardware Design
3.2.2 Software Design
3.2.2.1 Recording Software
3.2.2.2 Recognition Software
3.2.3 Electrode Placement
3.2.4 Subjects
3.2.5 Stimulus Material for Emotion Induction
3.2.6 Picture Presentation
3.2.7 Experimental Procedure
3.3 Ethical Considerations
3.4 Problems
3.5 Summary of Collected Data
4 Methods
4.1 Classification with Support Vector Machines
4.1.1 Data Preprocessing
4.1.1.1 Feature Extraction
4.1.1.2 Obtaining Feature Vectors
4.1.1.3 Dimensionality Reduction
4.1.2 Training and Classification
4.2 Sequential Recognition System
4.2.1 Data Preprocessing
4.2.2 Training and Classification
A Prototype Model
B IAPS Picture Sets
C Experimental Instructions
D Data
D.1 Data for section 5.2
D.2 Data for section 5.3
D.3 Data for section 5.4
D.4 Data for section 5.5
Bibliography
List of Figures
5.1 Mean recognition rates subject to window size and average size
5.2 Mean recognition rate subject to frequency band (whiskers indicate standard deviation)
5.3 Mean recognition rate subject to average size (whiskers indicate standard deviation)
5.4 Mean recognition rate subject to dimensionality (whiskers indicate standard deviation)
5.5 Mean recognition rates for different frequency bands (whiskers indicate standard deviation)
5.6 Recognition rate depending on number of dimensions after LDA
5.7 Recognition rate subject to number of HMM states (whiskers indicate standard deviation). Bars show the relative portion of subjects whose recognition rate was best at this number of HMM states.
5.8 Recognition rate depending on number of dimensions after feature reduction using the frontal electrodes (whiskers indicate standard deviation)
5.9 Recognition rate depending on number of dimensions after feature reduction using only midline electrodes (whiskers indicate standard deviation)
5.10 Recognition rate for each time segment (whiskers indicate standard deviation)
List of Tables
3.1 Technical specifications of the Varioport™ EEG amplifier
3.2 Overview of the subjects
3.3 Characteristics of IAPS pictures used for emotion induction
D.1 Mean recognition rates for variation of window size and averaging over adjacent feature vectors
D.2 Mean recognition rates for variation of frequency band
D.3 Mean recognition rates for variation of average size
D.4 Mean recognition rate subject to number of dimensions after correlation-based feature reduction
D.5 Mean recognition rates for variation of frequency band
D.6 Mean recognition rate subject to number of dimensions after LDA
D.7 Mean recognition rate subject to number of HMM states
D.8 Optimal number of HMM states
D.9 Mean recognition rate depending on number of dimensions after feature reduction for frontal electrodes
D.10 Mean recognition rate depending on number of dimensions after feature reduction for midline electrodes
D.11 Mean recognition rate for each time segment
1. Introduction
"Look Dave, I can see you're really upset about this. I honestly think you ought to sit down calmly, take a stress pill, and think things over."
When Stanley Kubrick made his film '2001: A Space Odyssey' in 1968, he tried to draw a picture of the year 2001 that was as close to reality as possible. In the film, one of the protagonists is the on-board computer HAL. Among many other abilities, this computer is able to recognize emotions.
Contrary to Kubrick's expectations, research on computers that are able to recognize emotions is still in its early stages. One reason for the limited activity in emotion research lies in the difficulty of defining and measuring emotions. It was not until the last two decades that scientists began to realize the importance of emotion in human-computer interaction. Since then, research in affective computing - that is, computing which "relates to, arises from, or deliberately influences emotions" (Picard and Healey, 1997) - has become more and more important.
Many people are troubled by the purely logical and rational way in which computers react. As computers become ever more important in everyday life, it is necessary to improve human-computer interaction. One crucial step is to develop a more natural way of communication. To achieve this, computers have to learn to recognize and react to human emotions.
Emotions are expressed through posture and facial expressions as well as through internal processes such as heart rate, blood pressure, and brain activity. Moreover, emotions can be expressed in speech, e.g. by raising the voice. There are many ways to recognize emotions. Typical communication channels that indicate emotions are the voice and facial expressions. One problem with using face and speech recognition is that an emotion can only be detected while a person is speaking or looking in the direction of the camera. Moreover, speech recognition returns only a string of spoken words, which often says nothing about the emotional state of the speaker. In addition, facial expressions and speech can be intentionally influenced to 'fake' an emotion. For this reason it can be helpful to use physiological data such as brain waves, skin conductance, or heart rate. Using physiological signals for emotion recognition offers several advantages:

• As sensors are attached directly to the body, a person cannot move out of reach of a camera or microphone placed in the room.

• Biosignals are controlled by the central nervous system and therefore cannot be influenced intentionally.
2.1 Motivation
In the following section we demonstrate the importance of emotions for human-computer interaction. Section 2.1.1 first outlines the advantages of emotional human-computer interaction. Next, section 2.1.2 provides some examples of affective applications (i.e. applications that recognize and respond to a user's emotions).
and Horowitz (1997) showed that people interacting with partners that are similar
or complementary to themselves perform better when working in the same team.
Therefore, a computer that adapts to a user’s emotional needs should also help to
increase the user’s performance.
As an example, in a study on the effectiveness of an agent that responds to user frustration, Klein et al. (2002) found that computers are able to reduce strong negative emotions - even if they are the source of these emotions - by responding to the user's emotions in an understanding way. For this purpose, computers do not even need to be personified characters or use advanced interaction techniques such as speech. In addition, they found that applications that simply let people vent their anger do not help them recover from negative emotional states.
Axelrod (2004) showed that there are also significant differences in human behavior when people interact with an affective system compared to a normal computer: people tend to act more emotionally when they believe they are interacting with an affective system.
How these questions are answered in a specific case depends strongly on the application domain.
In the following, we illustrate various applications of affective computing, as suggested by Picard (1997).
For instance, frustration (and therefore excessive arousal) can be avoided by making a quiz easier when the computer notices that the user is overwhelmed. Similarly, if the computer notices a decrease in arousal below the optimum, the quiz can be made more challenging.
2.1.2.3 Gaming and Entertainment
Gaming and entertainment offer a wide range of applications for computers which
are able to deal with emotions.
Healey et al. (1998) proposed the concept of an affective DJ. To this end, they designed a wearable computer that gets to know a person's preferences by recognizing and responding to his or her emotional signals. This information is used to help the person select music according to his or her present mood.
In most computer games, success is based only on the actions that are taken, not on how they are performed. Physiological signals could be used to adapt the game to the current emotional state of the user. For instance, calm behavior might be rewarded by introducing a new companion, while a brave action performed under high arousal might earn some extra points.
2.2 Definitions
In the following section we give an introduction to the concept of affective computing and present different approaches to defining emotion.
Table 2.1: Four categories of affective computing, focusing on expression and recog-
nition (Picard, 1995)
Picard (1995) distinguishes between four relevant cases of affective computing, which are summarized in Table 2.1.
Most computers fall into category (I), having no affect perception or expression at all.
Out of the three affective categories, category (II) is probably the one with the most
advanced technology: computers that have voices with natural intonation or faces
with natural expressions fall into this category. Category (III) enables a computer to
perceive a person’s affective state and to adjust its response to this information. The
last category provides truly ’personal’ and ’user friendly’ computing by maximizing
the emotional communication between humans and computers. It is important to
note that this does not mean that the computer would be driven by its emotions.
In this thesis we focus on emotion recognition and do not develop a system that is able to express emotions. The research done in this thesis is therefore confined to the third category.
2.2.2 Emotions
"Everyone knows what an emotion is, until asked to give a definition."
(Russell and Fehr, 1984)
Although emotions are central to human behavior and communication, there is still no commonly accepted definition of the term emotion. In fact, there are many different approaches to finding an appropriate definition, and they are quite often contradictory. Schmidt-Atzert (1981) names two reasons why it is so hard to find an exact definition of emotion. First, the term has been applied to many different phenomena in varying contexts which are connected only by the word 'emotion'. Secondly, it is hard to distinguish emotional from non-emotional events, as a given arousal might be considered either emotional or non-emotional.
In 1981, Kleinginna and Kleinginna analyzed and classified 92 definitions and nine skeptical statements about emotion and came to the conclusion that there is little consistency among definitions and that many definitions are too vague. They proposed a definition that tries to emphasize the manifold possible aspects of emotion. This quite generic definition tries to merge all significant aspects of emotion, although these aspects are often contradictory. It is therefore more a cross-section of all opinions than a workable definition.
After a comparison and differentiation of the two terms emotion and mood, this
chapter outlines the most important concepts of emotion starting with the com-
ponent model presented by Schmidt-Atzert (1981). Subsequently, two groups of
emotion definitions are compared: categorical models (also known as discrete mod-
els) of emotions and dimensional models. The first group defines emotions as a set of
discrete categories while the latter tries to treat emotions as continuous dimensions.
This comparison indicates that although there are no quantitative boundaries between emotions and moods (e.g. there is no exact duration that discriminates a mood from an emotion), the two affective phenomena can easily be distinguished. The rest of this thesis deals solely with emotions.
1. the subjectively felt emotion, which refers to states that are named as emotions by the person himself or herself,
2. the emotional physiological reactions in the brain and the nervous system that can be attributed to emotional stimuli, and
3. the emotional behavior, i.e. the expressive component, such as facial expressions and gestures.

The subjectively felt emotion can be identified from self-descriptions, while emotional physiological reactions can be measured via biophysiological signals such as electroencephalography (EEG), electromyography (EMG), or skin conductance. Moreover, the emotional behavior can be identified e.g. from gesture or face recognition. The three components influence each other; e.g. laughter can be observed not only in the EMG of the zygomaticus major (physiological reaction), but also in the facial expression (emotional behavior) and in self-reports (subjectively felt emotion).
Ekman (1992), for example, lists nine characteristics which distinguish basic emotions:
1. Distinctive universal signals
2. Presence in other primates
3. Distinctive physiology
4. Distinctive universals in antecedent events
5. Coherence among emotional response systems
6. Quick onset
7. Brief duration
8. Automatic appraisal
9. Unbidden occurrence
Despite this, there is still little agreement among scientists about how many and which emotions are basic, and why. Different theorists consider different emotions to be basic, but they all share the idea that some emotions are more basic than others. For example, Mowrer (1960) considers only pleasure and pain to be basic, whereas Frijda (1986) identifies 18 basic emotions. A summary of the proposals of a representative set of emotion theorists who hold (or held) some sort of basic-emotion position can be found in Ortony and Turner (1990).
Besides the basic emotion approach there is the prototype approach, which likewise assumes that there is a set of emotions which are more basic than others. However, these basic emotions are seen as prototypes from which other emotions can be derived. This leads to a tree-like hierarchical structure of emotions like the one shown in Appendix A. A detailed description of the prototype approach can be found in Russell and Fehr (1984) and Shaver et al. (1987).
Although the categorical model of emotion is supported by many scientists, it has some limitations. First, there is no agreement on the number of basic emotions, and the notations of the model are often ambiguous. Moreover, the denotations of the different emotions leave large scope for ambiguity. Finally, the strict categorization of emotions does not reflect reality.
• Valence describes the quality of an emotion (ranging from unpleasant to pleasant).
• Arousal refers to the quantitative activation level (ranging from calm to excited).
• Dominance relates to the degree of control a person feels to have over a situation (ranging from weak to strong).
Of these, arousal and valence are the most frequently used dimensions. The decision for only two dimensions can be justified by the results of Russell and Mehrabian (1977) and Russell (1979, 1980), who found that pleasure and arousal accounted for the major proportion of variance in affect scales, while dominance accounted for only a small amount. The two-dimensional circumplex model of affect suggested by Russell (1980) is shown in Figure 2.2. According to this model, emotions are specified by their position in the two-dimensional space spanned by the two axes valence (horizontal axis) and arousal (vertical axis).
Compared to discrete emotion models, dimensional models have one major advantage: the emotional experience can be evaluated without the constraints of a fixed set of discrete categories.
Figure 2.3: The major components of the limbic system. (Carlson, 2007)
The autonomic nervous system works involuntarily and transmits impulses from the CNS to the peripheral organs. It includes the sympathetic and the parasympathetic nervous systems, which control e.g. the heart rate, the dilation and constriction of blood vessels, the pupils, and the air flow in the lungs. The sympathetic nervous system responds to stress and danger, whereas the parasympathetic nervous system is concerned with the conservation and restoration of energy. Figure 2.4 gives an overview of the functions of the parasympathetic and sympathetic nervous systems. The figure clarifies that the two systems work antagonistically, depending on the activating stimulus.
There are two opposite systems in the brain which can be seen as the structural
foundation of emotion. The aversive system is responsible for defensive / protective
reactions such as fear or escape. In contrast, the appetitive system responds to
positive stimuli (i.e. rewards) with preservative behavior (e.g. ingestion, copulation
and the nurture of progeny). In the dimensional model illustrated in section 2.2.5.2
valence determines whether the appetitive (positive) or the aversive (negative) sys-
tem is activated. In contrast to this, arousal determines the intensity with which
the system is activated. (Downey et al., 2004; Lang, 1995)
For the analysis of emotions, it is important to know whether the physiological reactions are different for each emotion. There are two theories relating emotional expression and experience. According to the James-Lange theory (James, 1950), the experience of emotion is a response to physiological changes in our body; in other words, we do not cry because we are sad, but we are sad because we are crying. As a consequence, every emotion is an interpretation of the preceding arousal. This would require distinct physiological signals for every emotion. In contrast, the Cannon-Bard theory (Cannon, 1927) proposes that emotional experience can occur independently of emotional expression. In addition, Cannon and Bard stated that the same physiological changes can accompany different emotions.
Table 2.3: Elements of the response package and their functions (Levenson, 1999)
In this study, reliable differences between the autonomic reactions to different emotions were found: anger and fear produced a larger increase in heart rate than happiness. Moreover, anger produced a larger increase in finger temperature than happiness, whereas fear produced a temperature decrease. While anger, fear, and sadness produced heart rate increases, disgust led to a heart rate decrease. Finally, sadness could be distinguished from anger, fear, and disgust, as it produced a larger increase in skin conductance. Levenson (1992) confirmed these results in a study which showed that heart rate is higher when experiencing anger, fear, or sadness than when experiencing disgust. Moreover, fear led to lower diastolic blood pressure, cooler surface temperatures, greater vasoconstriction, and less blood flow in the periphery than anger.
Levenson et al. (1990) explain the differences among autonomic reactions by taking into account the origin of emotions. For instance, an increase in heart rate can prepare the body for a fight in the case of anger or for flight in the case of fear. In addition, the decrease in finger temperature in the case of fear can be explained by the fact that blood flow is diverted away from the periphery and redirected towards the large skeletal muscles. Similarly, the increase in finger temperature in the case of anger can be seen as a result of blood flowing to the hand muscles to support grasping a weapon.
When talking about the elicitation of emotions, one has to differentiate between three kinds of elicited emotion, which are listed below.
Spontaneous emotion: this is the most authentic kind of emotion and therefore seems most appropriate for emotion research. Spontaneous emotion requires that a person does not know, or does not care, that data is being recorded.
Acted emotion: in contrast to spontaneous emotion, the subject knows about the recordings and tries to put himself or herself into the required emotion. The order and duration of the emotions are specified in advance.
Controlled elicited emotion: this can be seen as a combination of the first two kinds. The aim is to elicit emotion as spontaneously as possible under controlled conditions. The subject is put into a situation that makes it easy to experience a certain emotion.
Although spontaneous emotion might seem the most natural and therefore most suitable kind of emotion, it has a major drawback: it is nearly impossible to attach the sensors required for recording the physiological signals without disturbing the subject. Besides that, there is no way to control which emotion occurs at what time. This makes it very hard to label the data and to obtain a balanced number of samples for each emotion. Acted emotions are often used for emotion detection from speech or faces. However, as not all physiological reactions can be influenced voluntarily, acted emotion might be inappropriate for emotion recognition from physiological signals. Moreover, it is difficult to behave naturally under recording conditions. This is why controlled elicited emotion is often used for experimental studies, although there is no guarantee that the desired emotion is really elicited.
There are various ways in which emotions can be elicited for experimental studies. Gerrards-Hesse et al. (1994) distinguish between five categories of emotion induction methods, which reflect the resemblance of their underlying functional principles. These methods can also be combined to increase effectiveness:
• Film / story: a film or story is presented to the subjects without any additional
information.
• Music: a piece of music is presented without information about its emotional
character.
• Gift: subjects are offered an unexpected gift. This is based on the assumption
that an unexpected gift usually leads to elation.
Presentation of Need-Related Emotional Situations
Procedures using presentation of need-related emotional situations provoke emotions
by satisfaction or frustration of needs. Methods belonging to this group are:
In a meta-analysis, Westermann et al. (1996) found that among these methods the Velten technique is the most widely used for inducing positive or negative emotions, followed by the film/story method without instructions and induction by imagination. They also found that film/story with or without instructions is the most effective method for inducing positive and negative emotional states.
1. The spontaneous EEG measures the continuous activity, which has amplitudes between 1 and 200 µV. According to international conventions, the observable frequencies are subdivided into different frequency bands: δ (0.5 - 4 Hz), θ (5 - 7 Hz), α (8 - 13 Hz), β (14 - 30 Hz), and γ (≥ 30 Hz).
2. Evoked potentials occur as a response while stimuli (e.g. a visual flash of light or an auditory click) are presented. The duration of such an event is usually less than 500 ms.
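To make the band decomposition concrete, the following is a minimal sketch of band-limiting a raw EEG channel with scipy; the sampling rate is an illustrative assumption, and the γ band is capped at 45 Hz here in line with the bandpass filters used later in this thesis. This is an illustration, not the recording software used in this work.

```python
# A sketch of splitting one EEG channel into the frequency bands named
# above. The sampling rate is an assumed illustrative value, and the gamma
# band is capped at 45 Hz here (the text defines it as >= 30 Hz).
import numpy as np
from scipy.signal import butter, filtfilt

BANDS = {"delta": (0.5, 4), "theta": (5, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (30, 45)}

def band_power(signal, fs, lo, hi, order=4):
    """Bandpass-filter one channel and return its mean power in the band."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.mean(filtfilt(b, a, signal) ** 2)

fs = 300.0                           # assumed sampling rate in Hz
eeg = np.random.randn(8 * int(fs))   # stand-in for one 8-second recording
powers = {name: band_power(eeg, fs, lo, hi) for name, (lo, hi) in BANDS.items()}
```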
To measure the EEG, electrodes are placed on the surface of the scalp. Each electrode measures the electric potential of the surrounding head area. The signal can be recorded unipolarly or bipolarly. With the unipolar method, a so-called active electrode is placed on the area of interest and a reference electrode is placed on a relatively inactive area such as the earlobe. In contrast, the bipolar recording technique requires two active electrodes placed over the cortical areas of interest. Either the difference or the sum of the electric potentials between the two electrodes is recorded. One major advantage of bipolar recordings is that no inactive reference area has to be selected (this can be quite problematic, as there is no site that is completely inactive). However, the main disadvantage
The name of the electrode positions refers to the region of the cortex (Figure 2.6)
above which the electrode is located. The letter F refers to the frontal lobe, T to
the temporal lobe, P to the parietal lobe and O to the occipital lobe. Finally, C
refers to the central lobe (which is also known as insula) which is located within the
cerebral cortex, beneath the frontal, parietal and temporal lobe.
Neurons can be excited by different stimuli, such as natural stimuli from receptor organs or electrical stimuli. A neuron will not fire until a minimum level of stimulation is reached. If a stimulus is strong enough to trigger the firing of a neuron, the electrical impulse will travel along the entire axon and potentially via the synapse to the dendrites of another neuron. As neurons do not touch, transmission across the synapse is accomplished by neurotransmitters, which can cause excitatory and inhibitory activity at the synapse. An excitatory postsynaptic potential (EPSP) leads to a depolarization of the membrane of the next neuron; an inhibitory postsynaptic potential (IPSP) causes a hyperpolarization. Figure 2.8 illustrates excitatory and inhibitory activity at the synapse. The action potential that is conducted along the axon to an excitatory synapse produces an EPSP, which can cause another action potential in the postsynaptic neuron (A). In contrast, an IPSP suppresses the generation of an action potential.
The signals measured by the EEG are the summed inhibitory and excitatory postsynaptic potentials of the pyramidal cells of the cerebral cortex. An important characteristic of pyramidal cells is their very long dendrites, which point towards the outer layers of the cortex. As the voltage resulting from the action potential of a single neuron is too small to be visible on the surface of the scalp, the synchronous activation of a few thousand neurons oriented in the same direction is necessary for potential differences to be registered.
2.4.1.3 Artifacts
As the amplitude of the EEG signal is quite low, the signal can easily be influenced by artifacts; in particular, because of the high amplification, even small electrical interferences can have an impact on the signal. Trimmel (1990) distinguishes between technical artifacts and artifacts of biological origin.
Technical artifacts can result from relative electrode movement on the skin, which can for instance be caused by movements of the face. Moreover, movements of the electrode cables can cause capacity changes. Another source of artifacts is the noise of the amplifier, as well as interfering fields from the power supply voltage. Finally, electrostatic voltages can affect the EEG signal. Most of these artifacts can be limited by careful preparation, such as insulating the cables.

Figure 2.8: Excitatory and inhibitory activity at the synapse. (Andreassi, 2000)
Biological artifacts result from the physiological activity of the body. For example, eye blinks produce high amplitudes, which occur especially at the frontal electrodes. Eye movements can also cause artifacts due to the electrical polarity of the eye. Moreover, muscle potentials can influence the signal, though these can easily be identified by their higher amplitude and frequency. As the tip of the tongue carries a negative voltage, tongue movements can also influence the EEG signal. This influence can be increased by some dental fillings.
Davidson (1992) states that the left and right hemispheres of the brain are specialized for different classes of emotions. While the left anterior hemisphere is specialized for approach³, the right anterior hemisphere is responsible for withdrawal. He explains the specialization of the left anterior hemisphere with the findings of Luria (1973), who described the left frontal region as an important center for intention, self-regulation, and planning. Moreover, damage to the left frontal region has been shown to cause apathetic behavior combined with a loss of interest and pleasure in objects and people, which can be seen as a deficit in approach. Evidence for the specialization of the right anterior region can be seen in findings that indicate a high activation of right frontal and anterior temporal regions during the arousal of withdrawal-related emotional states (e.g. fear and disgust).

³ Davidson uses the term approach as an antonym of withdrawal.
This theory is supported by the findings of Davidson et al. (1990), who detected less alpha power⁴ in right frontal regions for disgust than for happiness, while happiness caused less alpha power in the left frontal region than disgust.

⁴ As high activation of a brain region suppresses alpha activity in this region, alpha power is often used as an indicator for the activation of a certain part of the brain.

Additionally, in an EEG study on brain asymmetries during reward and punishment (which can be seen as positive and negative emotional stimuli), Sobotka et al. (1997) found that punishment was associated with less alpha power in right mid and lateral frontal regions of the brain (electrodes F4 and F8), whereas reward trials were associated with less alpha power in the left mid and lateral frontal regions (electrodes F3 and F7).
In an experiment where three emotions (fear, happiness and sadness) were induced
with visual and auditory stimuli, Baumgartner et al. (2006) showed that alpha power
over the left hemisphere increases in happy conditions compared to negative emo-
tional conditions.
There are also several neuroimaging studies on brain activity during emotions. As neuroimaging methods are not the subject of this thesis, we refer to the meta-analysis conducted by Murphy et al. (2003) for more information on this topic.
Figure 2.9: EMG electrode placements for surface differential recording over major
facial mimetic muscles. (Andreassi, 2000)
Schwartz et al. (1976) found that positive thoughts are accompanied by higher activity of the zygomaticus major, whereas unpleasant thoughts lead to an increase in the activity of the corrugator supercilii when subjects self-induce different emotions. Cacioppo et al. (1986)
corroborated this study when they found that corrugator supercilii and orbicularis oculi regions varied in conjunction with the valence and intensity of pictures: if subjects liked a scene, corrugator supercilii and zygomaticus major activity was higher and orbicularis oculi activity was lower compared to neutral or unpleasant stimuli.
More examples of research on emotion-specific facial EMG activity can be found in Tassinary and Cacioppo (1992).
As the amplitude of the facial EMG is quite weak, the signal can easily be affected by external interference. Additionally, movement artifacts can influence the signal. For example, the movement of a muscle can deform the skin under the electrodes, which leads to a change in the skin-electrode impedance. Electrodes may also change their position because of muscle contractions. It is also possible that crosstalk from adjacent muscles is recorded.
visual, illumination, and auditory stimuli was used. The classification was done
using a support vector machine.
Picard et al. (2001) conducted a study in which electromyographic, blood volume pressure, skin conductance, and respiration data were collected from a single person over a period of several weeks. For eight emotion categories (no emotion, anger, hate, grief, platonic love, romantic love, joy, and reverence) a recognition rate of 81 percent was achieved.
A combination of electromyography, electrodermal activity, skin temperature, blood volume pulse, electrocardiogram, and respiration was used in a study conducted by Haag et al. (2004) with a single person on different days. In the experiment, emotional states of high and low arousal and high and low valence were elicited by blockwise presentation of emotional pictures from the International Affective Picture System (IAPS). To classify the emotions, a neural network was used. For arousal, a classification rate of 96.58 percent was achieved, whereas high and low valence could be distinguished with a correctness of 89.93 percent.
A study with multiple subjects using electroencephalography, heart rate, and pulse was conducted by Takahashi and Tsukaguchi (2003), who tried to elicit positive and negative emotions with acoustic stimuli (music). For data analysis, a multilayer neural network was compared with a support vector machine. With the neural network classifier a recognition rate of 62.3 percent was achieved, whereas the SVM classifier recognized 59.7 percent. Takahashi (2004) extended these results with a study in which five emotions (joy, anger, sadness, fear, and relaxation) were induced by audio-video content. For classification of the signals (electroencephalography, skin conductance, heart rate) a support vector machine was used, achieving a recognition rate of 41.7 percent (66.7 percent on three emotions).
3. Data Collection
Before the experiments described in chapter 5 could be conducted, data had to be collected. After an overview of preceding considerations in section 3.1, this chapter describes the design of the experimental procedure (3.2), followed by ethical considerations (3.3) and a critical reconsideration of the experimental setup in section 3.4.
replicable, as there would be more than one repetition of the experiment. The last important point was comparability with other studies. For these reasons, the International Affective Picture System (IAPS) was chosen for emotion induction. With a database of nearly 1000 photographs, Lang et al. (2005) provide standardized and well-researched material for visual emotion stimulation. For each picture, means and standard deviations on a 9-point scale are provided for the three dimensions valence, arousal, and dominance. Figure 3.1 shows the pictures of the IAPS in a two-dimensional space based on mean valence and arousal ratings. A low value on the valence axis indicates an unpleasant picture, a high value a pleasant one.

Figure 3.1: Pictures of the IAPS in a 2-dimensional space based on mean pleasure and arousal ratings

The method has been proven to be a valid and reliable instrument for investigations of emotion (Hamm and Vaitl, 1993). Besides that, experimental settings can easily be repeated and compared to other studies in which the IAPS has been used. Moreover, the subject does not move while watching the pictures; thus, there are no or only few body movements which could influence the signal.
Table 3.1: Technical specifications of the Varioport™ EEG amplifier
Figure 3.2: The left side shows the screen visible to the supervisor (EEG signals, control window, supervisor's window); the right side shows the subject's screen, where the pictures for emotion elicitation are presented
3.2.4 Subjects
Most of the 23 subjects were students at the University of Karlsruhe (TH) who participated voluntarily in this study. Twenty of them were male, three were female, with an average age of 26 (standard deviation: 2.07). All subjects had perfect or near-perfect vision. Nineteen of the participants were right-handed, four were left-handed. All of them stated that they felt healthy on the day of the experiment and had not taken any medication that could affect the EEG signals. The subjects were divided into two groups. In one group, which consisted of 17 subjects, data from 16 electrodes was recorded with a standard EEG cap; the recordings of the other six subjects were done with the EEG headband. Due to technical errors, one of the headband subjects had to be excluded from the evaluation. Detailed statistical data on the subjects can be found in Table 3.2.
people (e.g. opposite-sex erotica), two different picture sets were chosen for male and female subjects. A list of the selected pictures can be found in appendix B. As illustrated in section 3.1.2, pictures are rated on a 9-point scale where high values indicate pleasant and low values unpleasant pictures. To induce positive emotions, only pictures with a valence higher than 7.5 were selected, from categories like food, family, or opposite-sex erotica. The neutral pictures, which include neutral faces and objects, range from 4 to 6. Unpleasant pictures have a valence lower than 2.6 and show content such as mutilation, contamination, and human or animal threat. Mean values and standard deviations for valence and arousal are shown in Table 3.3. Only pictures with the same aspect ratio were selected. All pictures were converted to a size of 1200 × 900 pixels.
during these eight seconds was used to analyze the emotional state. Before the next
presentation cycle started, a gray bar was shown for 15 seconds to allow the subjects
to normalize their emotion. The whole process of picture presentation is shown in
Figure 3.3. Between the two blocks there was an intermission of five minutes. Every
block took about 20 minutes.
subject, and they were given a pile of hard copies of all the pictures they had seen before, with the instruction to sort them into the three categories pleasant, neutral, and unpleasant.
After the rating procedure, the experiment was over. The whole experiment includ-
ing electrode placement and picture rating took about 90 minutes per subject.
In this study, the three states pleasant, neutral, and unpleasant were investigated. As it is questionable whether it is ethical to induce negative emotions, participants were informed in the experimental instructions that the material also included unpleasant pictures. Moreover, unpleasant pictures were included in the practice picture set.
In a study about ethical contracts, Reynolds and Picard (2004) showed that people who had signed a contract about the use of their physiological data felt their privacy was respected more than people without an ethical contract. All subjects were informed in the consent form that their data would be anonymized and used for research purposes only.
There are many ways to misuse emotional data, such as lie detection or systematic emotion manipulation. Nevertheless, the only purpose of this research is to develop a system that is able to detect a person's emotions in order to improve interaction with computers and support users in their tasks, not to invade a person's privacy against his or her will.
3.4 Problems
There were several problems regarding the emotion induction procedure that can be
ascribed to the International Affective Picture System.
According to Lang et al. (1997), all pictures in the IAPS were selected "that are easy to resolve, have clear figure - ground relationships, and communicate affective quality relatively quickly". Nevertheless, the point of action was not always in the center of the picture. Therefore, for some pictures it was hard to grasp the content at first view. One subject even stated that he found eight seconds too short to perceive an emotion.
One also has to take into account that the same picture may affect different people differently. For example, a picture of a boy playing chess will have a positive influence on a person who likes playing chess, whereas a person who is not interested in chess might perceive this picture as neutral or even negative. Beyond that, the more often we encounter a similar situation, the less intense our reaction to the stimulus becomes. As the unpleasant pictures included several pictures of mutilation, one could argue that the first time a person sees one of these pictures he or she reacts more intensely than later on when seeing a similar picture.
For the reasons listed above, all subjects were asked to rate the pictures after the experiment.
As mentioned in section 3.2.2, two different systems are used for the recognition task; they are described in the following. In section 4.1, we give an overview of the processing steps of the system based on support vector machines. Subsequently, we describe a system based on hidden Markov models (4.2).
where w[n] is the window function, which determines the window length. The result of the STFT is complex-valued, containing information about amplitude and phase shift. As we are only interested in amplitude values, the phase is eliminated by taking the absolute value of the STFT result.
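As an illustration of this step, the following sketch computes an STFT magnitude spectrogram with scipy; the sampling rate and window length are assumed values (a 2-second window with a shift of half the window size, matching the setting chosen later in section 5.2.2).

```python
# A sketch of the magnitude computation described above, using scipy's STFT.
import numpy as np
from scipy.signal import stft

fs = 300.0                          # assumed sampling rate in Hz
x = np.random.randn(int(8 * fs))    # stand-in for one EEG channel
nper = int(2 * fs)                  # 2-second analysis window
f, t, Z = stft(x, fs=fs, nperseg=nper, noverlap=nper // 2)
magnitudes = np.abs(Z)              # the phase of the complex STFT is discarded
```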
After computing the STFT, a bandpass filter is applied to the transformed signal to eliminate all frequencies that are not of particular interest. The number of frequency components after applying a bandpass filter with an upper frequency of $f_u$ and a lower frequency of $f_l$ to a window of size $n_{ft}$ can be computed as follows:

$\frac{f_u - f_l}{f_s/2} \cdot \frac{n_{ft}}{2} + 1$    (4.2)
GlobalNorm: For each feature vector, the mean $\bar{x}_i$ and standard deviation $\sigma_i$ are calculated. With these values, mean subtraction and variance normalization are performed. The normalized feature value $x_i^{GN}$ for a given feature value $x_i$ can be computed according to the following equation:

$x_i^{GN} = \frac{x_i - \bar{x}_i}{\sigma_i}$    (4.3)

After this normalization, all feature vectors have an average value of zero and a standard deviation of one, which makes it easier to compare feature vectors.
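As a minimal sketch of this step, assuming a feature vector stored as a NumPy array:

```python
# A minimal sketch of GlobalNorm (equation 4.3): each feature vector is
# shifted to zero mean and scaled to unit standard deviation.
import numpy as np

def global_norm(x):
    """Normalize one feature vector (a 1-D NumPy array)."""
    return (x - x.mean()) / x.std()

print(global_norm(np.array([3.0, 5.0, 7.0])))  # mean 0, standard deviation 1
```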
For further processing of the data, the feature vectors from the different electrode channels are concatenated to form one large feature vector for each time segment. The dimensionality of the resulting feature vectors is the number of electrodes multiplied by the result of equation 4.2. For instance, if we use a window size of two seconds (i.e. 600 frames) and a bandpass filter from 5 to 45 Hz, equation 4.2 yields feature vectors with a length of 16 · 80 = 1280.
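The dimensionality computation can be sketched as follows. Note that equation 4.2 evaluates to 81 components for the example values, whereas the worked example above uses 80 per electrode, so the boundary bin may be handled slightly differently in the original implementation.

```python
# A sketch of the dimensionality computation: equation (4.2) per electrode,
# then concatenation over 16 electrodes. fs = 300 Hz is inferred from the
# statement that a 2-second window corresponds to 600 frames.
def n_band_features(f_u, f_l, fs, n_ft):
    """Frequency components kept after the bandpass filter (equation 4.2)."""
    return int((f_u - f_l) / (fs / 2.0) * (n_ft / 2.0) + 1)

n_per_channel = n_band_features(45, 5, 300, 600)  # close to the 80 used in the text
total_dim = 16 * n_per_channel                    # one concatenated vector per segment
print(n_per_channel, total_dim)
```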
• Frequency bands are grouped into physiologically relevant groups (e.g. the α-, β-, and γ-bands, or sub-bands of these bands) and an average is calculated for each group, as sketched below.
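As a sketch of this grouping, assuming the per-electrode features are magnitudes at known bin frequencies (the band edges here are illustrative assumptions):

```python
# A sketch of the band grouping described in the item above: frequency bins
# are collected into named band groups and each group is replaced by its
# average, separately per electrode.
import numpy as np

GROUPS = {"theta": (5, 7), "alpha": (8, 13), "beta": (14, 30), "gamma": (30, 45)}

def band_group_average(spectrum, freqs, groups=GROUPS):
    """spectrum: (n_bins,) features of one electrode; freqs: bin centers in Hz."""
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].mean()
                     for lo, hi in groups.values()])
```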
where $x_i^{(j)}$ is the $i$th feature of the $j$th feature vector and $\bar{x}_i$ the mean of the $i$th feature over all feature vectors. $y^{(j)}$ is the reference value belonging to the $j$th feature vector and $\bar{y}$ the mean over all reference values. $R$ denotes the total number of feature vectors. When the dimensionality is reduced to $k$, we keep only the $k$ features with the highest ranking.
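The ranking itself can be sketched as follows; this uses the standard Pearson correlation as the score, which matches the description above but is an assumption insofar as the exact form of the thesis' correlation equations is not reproduced here.

```python
# A sketch of the correlation-based ranking: each feature is scored by the
# absolute correlation between its values over all feature vectors and the
# reference values, and the k best-ranked features are kept.
import numpy as np

def top_k_by_correlation(X, y, k):
    """X: (R, d) matrix of feature vectors, y: (R,) reference values."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()) + 1e-12
    r = (Xc * yc[:, None]).sum(axis=0) / denom
    return np.argsort(-np.abs(r))[:k]   # indices of the k highest-ranked features
```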
For a given training set of instance-label pairs $(x_i, y_i)$, $i = 1, \ldots, l$, with $x_i \in \mathbb{R}^n$ and $y \in \{1, -1\}^l$, the following optimization problem has to be solved:

$\min_{w,b,\xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i$    (4.6)
subject to $y_i (w^T \phi(x_i) + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
The slack variable ξi measures the degree of misclassification of xi , C > 0 specifies
the penalty parameter of the error term. w is the normal vector of the separating
hyperplane. The function φ maps the training vectors xi to a higher dimensional
space. In this higher dimensional space the SVM finds a linear separating hyperplane
with the maximal margin.
There are different kernel functions K(xi , xj ) ≡ φ(xi )T φ(xj ) which are used for
support vector classification. In this study, two different kernels are investigated:
For the linear kernel, feature space and input space are exactly the same. In con-
trast to the linear kernel, the RBF kernel nonlinearly maps samples into a higher
dimensional space. This means that it can handle situations with a nonlinear rela-
tion between class labels and attributes. The differences between the linear and the
RBF-kernel are illustrated in Figure 4.1.
Figure 4.1: Comparison between classification with linear and RBF kernel. The left figure shows the original classification problem, the figure in the middle shows classification results achieved with a linear kernel, and the figure on the right shows results achieved with an RBF kernel (parameter C was set to 1000 for both kernels).
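A small sketch contrasting the two kernels on synthetic data, using scikit-learn's SVC (which wraps LIBSVM and likewise applies the one-against-one strategy to multiclass problems); data and parameters here are placeholders, not the experimental setup of this thesis.

```python
# An illustrative comparison of the linear and RBF kernels on synthetic data.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
linear = SVC(kernel="linear", C=1.0).fit(X, y)
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(linear.score(X, y), rbf.score(X, y))  # training accuracy of each kernel
```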
Originally, SVMs were designed to solve binary classification problems. For multiclass classification, LIBSVM uses the one-against-one approach, in which k(k − 1)/2 classifiers are constructed. Each classifier is trained on data from two different classes according to the classification problem given in equation 4.6.
A more detailed introduction to classification with SVMs can be found in Burges (1998).
The between-class scatter matrix $S_B$ and the within-class scatter matrix $S_W$ are defined as

$S_B = \sum_{i=1}^{C} (\mu_i - \mu)(\mu_i - \mu)^T$
$S_W = \sum_{i=1}^{C} \sum_{j=1}^{N_i} (x_{j,i} - \mu_i)(x_{j,i} - \mu_i)^T$    (4.7)
where $\mu = \frac{1}{C} \sum_{i=1}^{C} \mu_i$

The LDA transformation maximizes the ratio of between-class to within-class scatter:

$T_{LDA} = \arg\max_T \frac{|T^T S_B T|}{|T^T S_W T|}$    (4.8)

The columns of the transformation matrix are the eigenvectors of $S_W^{-1} S_B$, which are solutions of the generalized eigenvalue problem:

$S_W^{-1} S_B T = \lambda T$    (4.9)
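A sketch of this transformation, assuming labeled feature vectors stored in NumPy arrays; the small ridge term added to $S_W$ is a numerical-stability choice of this sketch, not part of the formulation above.

```python
# A sketch of the LDA transformation in equations (4.7)-(4.9): the
# transformation matrix consists of the leading eigenvectors of inv(S_W) @ S_B.
import numpy as np

def lda_transform(X, y, n_dims):
    """X: (N, d) feature vectors, y: (N,) class labels."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    mu = np.mean(list(means.values()), axis=0)        # mean of the class means
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for c in classes:
        diff = (means[c] - mu)[:, None]
        S_B += diff @ diff.T                          # between-class scatter
        centered = X[y == c] - means[c]
        S_W += centered.T @ centered                  # within-class scatter
    M = np.linalg.solve(S_W + 1e-6 * np.eye(d), S_B)  # inv(S_W) @ S_B
    eigvals, eigvecs = np.linalg.eig(M)
    order = np.argsort(-eigvals.real)
    T = eigvecs[:, order[:n_dims]].real               # leading eigenvectors
    return X @ T
```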
The emission probabilities are modeled by Gaussian mixture models (GMMs). For more information about HMMs, please refer to Rabiner (1989).
During training, the individual states are associated with the input feature vectors. In four iterations of the expectation-maximization algorithm - an iterative optimization method for estimating unknown (or hidden) parameters from given measurement data (see Dellaert (2002) for a more detailed description) - the HMM parameters are updated such that the likelihood of the new HMM model $\lambda'$ on the training data is larger than the likelihood of the old model $\lambda$, i.e. $P(x|\lambda') > P(x|\lambda)$. To classify a given utterance $x$, a Viterbi path - the most likely sequence of hidden states within the HMM - is computed for each trained HMM corresponding to one of the three emotional states, i.e. we calculate $P(x|\lambda_i)$, $i \in \{$pleasant, neutral, unpleasant$\}$. The $\lambda_i$ with the highest likelihood is reported as the classification result.
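The training and classification scheme can be sketched with the hmmlearn package standing in for the thesis' own HMM implementation; state and mixture counts are illustrative, and `sequences_by_class` is a hypothetical mapping from emotion label to a list of feature-vector sequences.

```python
# A sketch of the scheme above: one GMM-emission HMM per emotional state,
# trained with a few EM iterations; a test sequence is assigned to the model
# with the highest Viterbi log-likelihood.
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_models(sequences_by_class, n_states=5, n_mix=2):
    models = {}
    for label, seqs in sequences_by_class.items():
        X = np.concatenate(seqs)                 # stacked (n_frames, n_features) arrays
        lengths = [len(s) for s in seqs]
        m = GMMHMM(n_components=n_states, n_mix=n_mix, n_iter=4)  # 4 EM passes
        models[label] = m.fit(X, lengths)
    return models

def classify(models, x):
    """Return the label whose model yields the highest Viterbi log-likelihood."""
    return max(models, key=lambda lbl: models[lbl].decode(x, algorithm="viterbi")[0])
```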
5. Experiments and Results
This chapter presents the analysis of the emotional data collected as described in chapter 3. First, a short comparison of the subjective user ratings with the ratings according to the International Affective Picture System is conducted (5.1). Afterwards, we describe an emotion recognition system based on support vector machines and the optimization of this system (5.2). Next, we compare the results from section 5.2 to those of a sequence modeling system based on hidden Markov models (5.3). Beyond that, section 5.4 investigates the influence on accuracy when only a subset of electrodes is used. Section 5.5 finally investigates the temporal structure of the signal. Throughout all experiments, optimization of the preprocessing is performed subject-independently, whereas optimization of training and classification is done separately for each subject.
Data and tables corresponding to the figures in this chapter can be found in Appendix D.
IR corpus: Using the IR corpus, the original ratings of the International Affective Picture System are used to analyze the data. This has the distinct advantage that all classes include the same number of samples.
PS NS US Σ
PI 574 109 7 690
NI 43 614 33 690
UI 3 46 641 690
Σ 620 769 681 2070
[%] (29.95) (37.15) (32.90)
Table 5.1: Cumulated subject ratings of IAPS pictures. Columns show subject
ratings (S), rows contain ratings according to the IAPS (I).
• Frequency band: 5 - 45 Hz
• No feature reduction
• SVM with RBF kernel is used with the default values for C and γ (C = 1, γ = 1/k, where k is the number of attributes in the input data)
The mean recognition rate over all 15 subjects for the above setting is 36.60 percent
with a standard deviation of 6.54 percent.
Table 5.2: Confusion matrix of IR corpus (absolute values)

P N U Σ
P 158 150 142 450
N 159 158 133 450
U 146 130 174 450
Σ 463 438 449 1350
[%] (34.30) (32.44) (33.26)

Table 5.3: Confusion matrix of SR corpus (absolute values)

P N U Σ
P 44 307 33 384
N 38 464 18 520
U 40 356 50 446
Σ 122 1127 101 1350
[%] (9.04) (83.48) (7.48)
For a comparison of the SR and IR corpora, confusion matrices were computed for both corpora and summed over all subjects. The results are displayed in Tables 5.2 and 5.3.
Although the SR corpus produces a much better recognition rate (41.33 percent) than the IR corpus (36.30 percent), the classification is biased towards the class neutral in that more than half of the pictures are assigned to this class. This can be ascribed to the fact that the data is unbalanced, such that more samples (i.e. more training data) are available for neutral pictures than for pleasant or unpleasant pictures.
As the classification results of the SR corpus are clearly biased towards the class neutral, the IR corpus is used for further analysis.
When the STFT is performed, the signal recorded for each picture is split into overlapping windows. To determine the influence of the window size on the recognition rate, different window sizes are applied to the raw data, with a window shift of half the window size. Given a presentation time of $t_{pic}$ for each picture, the number of feature vectors $n_f$ per picture after an STFT with a window size of $t_{win}$ seconds is

$n_f = 2 \cdot \frac{t_{pic}}{t_{win}} - 1$    (5.1)

For example, a presentation time of eight seconds and a window size of two seconds yield $n_f = 7$ feature vectors per picture. As the window size directly determines the number of feature vectors, and therefore the number of feature vectors $n_{avg}$ that can be used for averaging, both were optimized in the same step. The results are shown in Figure 5.1.
Figure 5.1: Mean recognition rates subject to window size and average size
The ratio on the x-axis is computed according to equation 5.2, which returns values in the range [0, 1]:

$\mathrm{Ratio} = \frac{1}{2} \cdot \frac{n_{avg}}{t_{pic}/t_{win}}$    (5.2)
Figure 5.1 shows that the combination of a window size of two seconds and averaging over two adjacent feature vectors performs slightly better than the other combinations. In most cases the recognition rate deteriorates if no averaging is performed (i.e. averaging over one feature vector) or if averaging is performed over all feature vectors. Based on these results, a window size of two seconds combined with averaging over two adjacent feature vectors is used for further optimization.
Figure 5.2: Mean recognition rate subject to frequency band (whiskers indicate
standard deviation)
5.2.3.3 Normalization
In the next optimization step, the influence of the normalization method is investigated. Two different normalization methods (see 4.1.1.1) are applied during data preprocessing and compared with the recognition accuracy achieved when no normalization is performed. The results are depicted in Table 5.5. As one can see, normalization mode RelPower performs better than GlobalNorm. This can be explained by the fact that within a feature vector, the relations between frequency bands are preserved when RelPower is used. Table 5.5 also shows that normalization is an essential preprocessing step: without normalization, the recognition rate is 33.33 percent (i.e. chance level) with a standard deviation of zero. The reason for this result is that the recognition process does not work at all, as all data is assigned to the same class.
According to the results explained above, normalization mode RelPower is used for further processing.
Figure 5.3: Mean recognition rate subject to average size (whiskers indicate standard deviation)
First, we study the influence of averaging over a fixed number of adjacent frequency bands. The averaging is performed separately for each electrode. The reduced number of features is obtained by dividing the original number of features by the size of the average (reduction factor). Figure 5.3 shows that this kind of feature reduction generally deteriorates the results. The only exception is averaging over four adjacent frequency bands, which causes a small increase from 39.18 percent to 39.41 percent. The decrease in recognition rate is likely due to the loss of information caused by the averaging. The only advantage of this method is a decrease in computation time due to the reduced dimensionality.
The second averaging method, in contrast to the first, respects physiologically meaningful groups of frequency bands. Table 5.6 compares the recognition rates when (a) no averaging (i.e. no feature reduction) is performed, (b) an average is calculated for the α-, β-, γ-, and θ-bands, and (c) the relative portion of each frequency band is calculated and used for the recognition task.
Methods (b) and (c) reduce the number of features per electrode to three, which corresponds to a reduction factor of 50. Again, the reduction of features decreases the accuracy. However, if feature reduction is performed by simply averaging over adjacent frequency bands, accuracy is higher (36.41 percent) than if we use the relative portions of each frequency band (35.56 percent).
Figure 5.4: Mean recognition rate subject to dimensionality (whiskers indicate stan-
dard deviation)
Finally, we investigate a method for feature reduction which tries to find the most discriminative features by performing a correlation analysis. The feature reduction is calculated for the whole feature vector, i.e. over all electrodes. To find out which number of dimensions performs best, we compare the accuracy for reductions to $2^0, 2^1, \ldots, 2^{10}$ dimensions. Figure 5.4 shows the mean recognition rate subject to the number of features kept after the correlation analysis.
The correlation-based method clearly outperforms the other methods for dimensionality reduction. The shape of the curve shows that it is useful to keep only those features that contain much discriminative information: due to the reduced dimensionality of the feature vectors, the classification system can be trained better, which explains why the recognition rate increases as the dimensionality decreases. The curve also illustrates that when too many features are discarded, too little information is left for classification. The best recognition rate (52.74 percent) is achieved with a reduction to 128 features; this dimensionality is therefore selected for further processing.
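The thesis does not restate the correlation analysis in detail; one plausible variant, sketched below under that assumption, ranks each feature by the absolute Pearson correlation between the feature column and the numeric class labels and keeps the top k:

```python
import numpy as np

def select_by_correlation(X, y, n_keep):
    """Keep the n_keep features correlating most strongly with the labels.

    X: feature matrix of shape (n_samples, n_features)
    y: numeric class labels of shape (n_samples,)
    Returns the column indices of the selected features.
    """
    # Pearson correlation of every feature column with the label vector.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    )
    return np.argsort(np.abs(corr))[::-1][:n_keep]
```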
• Frequency band: 8 - 45 Hz
For the optimization, the results of an SVM with a linear kernel are compared to the results of an SVM with an RBF kernel. As the choice of the kernel parameters (i.e. the penalty parameter C and the kernel parameter γ for the RBF kernel, and C for the linear kernel) has an important influence on the accuracy of the SVM, a parameter optimization is performed separately for both kernels using grid search. Following the suggestions by Hsu et al. (2003), the parameters C and γ are varied over exponentially growing sequences (e.g. C = 2⁻⁵, 2⁻³, …, 2¹⁵ and γ = 2⁻¹⁵, 2⁻¹³, …, 2³).
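The sketch below illustrates such a grid search in Python with scikit-learn (whose SVC is built on LIBSVM, the library used in this work); it is only an illustration under assumed settings, and the random placeholder data merely stands in for one subject's feature vectors.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data: 90 samples of 128-dimensional feature vectors and
# three class labels. Hypothetical stand-in, not the thesis data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(90, 128))
y_train = rng.integers(0, 3, size=90)

# Exponentially growing parameter grids as suggested by Hsu et al. (2003).
param_grid = [
    {"kernel": ["linear"], "C": [2.0 ** k for k in range(-5, 16, 2)]},
    {"kernel": ["rbf"],
     "C": [2.0 ** k for k in range(-5, 16, 2)],
     "gamma": [2.0 ** k for k in range(-15, 4, 2)]},
]

# Cross-validated grid search, performed separately for each kernel.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```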
Table 5.7 shows the optimal values of C and γ for each subject. With the linear kernel we achieve a mean recognition rate over all subjects of 60.81 percent with a standard deviation of 6.13 percent. By optimizing C and γ for the RBF kernel we achieve a mean recognition rate of 62.07 percent with a standard deviation of 5.88 percent.
Table 5.7: Optimal values of C and γ for RBF and linear kernel
As the recognition rates with the two different kernels show, the RBF kernel seems to be better suited for our recognition task. However, the high accuracy of the linear kernel shows that it is also largely possible to separate the data linearly.
An inspection of the optimal values of C and γ for the RBF kernel shows that these values are highly correlated (correlation coefficient r = −0.9601), i.e. when C increases, the value of γ decreases. A high value of the penalty parameter C in combination with a low γ, which determines the RBF width, can be seen as an indicator that the model is overtrained and therefore does not generalize enough. Thus, for later studies it should be considered to limit the values of C and γ in order to avoid overtraining, even though this means a decrease in recognition rate.
The mean recognition rate for all subjects in the baseline setting is 33.85 percent
(standard deviation: 4.24 percent).
Figure 5.5: Mean recognition rates for different frequency bands (whiskers indicate
standard deviation)
To improve the recognition rate, different bandpass filters were applied to the raw EEG signal. The results are displayed in Figure 5.5.
The figure shows that applying a bandpass filter to the data can help to improve the recognition rate. A bandpass filter of 5 - 45 Hz, which includes the α-, β-, γ-, and θ-bands, performs best on the given data. Therefore, this filter is used for further optimization.
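For illustration, such a bandpass filter could be realized as follows; the Butterworth design and the filter order are assumptions, as the filter type is not specified here:

```python
from scipy.signal import butter, filtfilt

def bandpass(signal, fs, low=5.0, high=45.0, order=4):
    """Zero-phase Butterworth bandpass filter for one EEG channel.

    signal: 1-D array of raw EEG samples; fs: sampling rate in Hz.
    The 5 - 45 Hz passband covers the theta, alpha, beta, and gamma bands.
    """
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, signal)
```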
Figure 5.7: Recognition rate subject to number of HMM states (whiskers indicate
standard deviation). Bars show the relative portion of subjects whose recognition
rate was best at this number of HMM states.
The HMM topology can have an influence on the recognition accuracy. Thus, the number of HMM states and the number of Gaussian mixture models (GMMs) per state were varied. The optimization was done separately for each subject.
First, the number of HMM states was varied, investigating values between one and 20. Figure 5.7 shows the influence of the number of HMM states on the mean recognition accuracy.
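The following Python sketch shows the basic scheme of such a classifier, assuming hmmlearn's GMMHMM and a hypothetical data layout (one list of observation sequences per emotion class): one HMM is trained per class, and a test sequence is assigned to the class whose model yields the highest log-likelihood. The per-subject search then simply repeats this for each candidate number of states.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_models(train_data, n_states, n_mix=1):
    """Train one HMM per emotion class.

    train_data: dict mapping class label -> list of observation
    sequences, each an array of shape (n_frames, n_features).
    """
    models = {}
    for label, seqs in train_data.items():
        m = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag")
        m.fit(np.concatenate(seqs), lengths=[len(s) for s in seqs])
        models[label] = m
    return models

def classify(models, seq):
    # Pick the class whose HMM assigns the highest log-likelihood.
    return max(models, key=lambda label: models[label].score(seq))
```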
The unsteady shape of the curve can be explained by the fact that it is computed as a mean over the recognition rates of 15 subjects. Although the same preprocessing steps were performed for all subjects, there can still be large differences in the structure of the data, e.g. caused by a different physical and mental state on the day of the experiment. Therefore, bars were added to Figure 5.7 indicating the number of subjects who achieved their maximum recognition rate at a certain number of HMM states. As this number correlates with the peaks in the curve, it explains the shape of the curve.
After identifying the optimal number of HMM states for each subject, we investigate the number of GMMs that yields the best recognition results. For this purpose, we varied the number of GMMs over the values 2⁰, 2¹, 2², 2³, 2⁴, 2⁵, and 2⁶. The best combinations of the number of HMM states (n_HMM) and the number of GMMs (n_GMM) are shown in Table 5.9.
Table 5.9: Best combination of number of HMM-states and number of GMMs for
each subject
The table shows that for most subjects the recognition rate is best with one GMM, and in most cases the optimal number of HMM states is quite low. This can be explained by the small number of samples per emotion and subject that is available for training and classification.
Figure 5.8: Recognition rate depending on number of dimensions after feature reduction using the frontal electrodes (whiskers indicate standard deviation)
Table 5.10: Optimal values of C and γ for the frontal electrodes for RBF and linear kernel. Recordings for subjects with IDs 42 - 46 were done with the headband.
Figure 5.9: Recognition rate depending on number of dimensions after feature reduction using only midline electrodes (whiskers indicate standard deviation)
Figure 5.9 shows that, again, a dimensionality of 128 performs best after feature reduction; therefore we select this dimensionality for the SVM optimization. Surprisingly, the recognition rate drops when the dimensionality is reduced to 16. A possible explanation is the comparatively high standard deviation.
Similar to the previous analyses, we compare the recognition rates of the linear and the RBF kernel.
Table 5.11 shows that for the midline electrodes, too, the RBF kernel performs better (51.74 percent) than the linear kernel (50.44 percent). Although only three electrodes are used instead of four, the recognition rates for the midline electrodes are quite similar to those of the frontal electrodes.
Table 5.11: Optimal values for C and γ for midline electrodes for RBF and linear
kernel
for each electrode set. Table 5.12 summarizes the recognition rates for all electrode settings.
As one can see, the best recognition rate is achieved when all recorded electrodes are used. The recognition rate for the frontal electrode subset differs only slightly from that of the midline subset. The lower standard deviation for the frontal and midline electrode subsets can be seen as an indicator that the signals at these positions behave more similarly across users than the signals of all electrodes.
• Window size: 1 second (→ one feature vector per picture, no averaging over
adjacent feature vectors possible)
• Frequency band: 8 - 45 Hz
Figure 5.10: Recognition rate for each time segment (whiskers indicate standard
deviation)
6. Conclusion and Future Work
6.1 Conclusion
In this thesis we developed an emotion recognition system based on EEG signals. For this purpose, a data corpus first had to be built. This was done by eliciting emotions with pictures from the IAPS. For each subject, 30 utterances of the three emotional states pleasant, neutral, and unpleasant were recorded.
For emotion recognition, two different recognition systems were compared. With an SVM-based system, a mean recognition rate of 62.07 percent could be achieved, which is significantly better than random guessing. Moreover, a sequence modeling system based on HMMs, as commonly used in speech recognition, was investigated. With this system we attained a mean recognition rate of 46.15 percent. Although the two systems are not easy to compare, as their processing steps are quite different, the SVM-based system clearly performs much better than the HMM-based system. This can be seen as an indicator that the nature of emotions is better modeled with a non-sequential system.
Besides that, we analyzed the performance of two electrode subsets. Although the recognition rate turned out to be best when all recorded electrodes are included in the analysis, the accuracy achieved with both electrode subsets was still significantly above chance level.
Finally, a time analysis showed that the accuracy is best for time segments from the middle of the presentation time, whereas it decreases for segments at the beginning and at the end of the picture presentation.
6.2 Future Work
One major principle from speech recognition is also applicable to emotion recognition: "There is no better data than more data." Therefore, the data corpus should be extended. This would lead to more reliable results, especially when doing parameter optimization. Moreover, an increased number of feature vectors can help to reduce the curse of dimensionality, which can cause undertraining.
In section 2.4.7 we presented studies using combined biosignals for emotion recognition. For future work, other biosignals could be investigated as well and combined with the EEG signals for emotion recognition.
Multimodal Scenario
Besides the possibility to combine EEG signals with other biosignals, biosignals could also be used in a multimodal scenario, i.e. a combination of biosensors with cameras and microphones. In such a scenario the disadvantages of both methods could be mitigated. For instance, if a person moves out of range of a camera, biosignals could still be used for emotion recognition, as the sensors are attached directly to the body.
Robustness
The EEG signals used for this study were all recorded in a laboratory setting under very stable conditions. As such stable conditions are very unlikely in everyday life, a more robust system is needed. There are two ways to make the system more robust. First, the electrodes could be improved so that they are less susceptible to external electrical influences, for instance by using electrodes that are better shielded against such influences. Moreover, the amplifier could be integrated directly into the electrodes, which would avoid the artifacts that occur along the wires between electrodes and amplifier. Second, better methods for artifact removal could be implemented in the preprocessing in order to eliminate artifacts resulting, for example, from muscle activity.
Online Recognition
The system in this study was built for offline recognition only. For future research
an online recognition system would be helpful.
Spontaneous Emotion
As explained in section 2.3.3, there are three different kinds of emotion that can be analyzed: spontaneous emotion, controlled elicited emotion, and acted emotion. Spontaneous emotion is certainly the most natural kind of emotion and should therefore be considered in further studies. The major problem with spontaneous emotion is that it is hard to elicit, especially in a laboratory environment. Moreover, labeling the emotional data is quite complicated.
Improvement of Usability
In the future, emotion recognition could be used for various applications in everyday life. Therefore, the inconvenience for the user of wearing EEG recording devices should be as small as possible. The headband used in our study is a first step in this direction. Making the sensors smaller and wireless could also help to improve usability. One major challenge lies in a trade-off: the fewer sensors we use, the more comfortable the system is for the user, but the more complicated it becomes to extract the desired information from the signals.
Categorical Emotions
For our investigations we took the dimensional emotion model as a basis. Although this model has many advantages compared to the categorical model, one major disadvantage is that some emotions of quite different character are located close to each other. For instance, anger and frustration lie very close together on the dimensional arousal-valence scale. However, it is not certain to what extent the EEG signals differ for emotions that have a different character but are very close in the dimensional model. Therefore, differences in EEG signals for categorical emotions should be investigated.
A. Prototype Model
Although Russell and Fehr (1984) introduced the prototype model of emotion, they did not provide a description of the tree-like structure proposed in this approach. For this reason, Shaver et al. (1987) conducted a study to find an appropriate description of the hierarchical structure of emotions. In order to organize emotions according to this structure, 100 participants were asked to sort cards with 135 emotion names into categories according to which emotions are similar to each other and which are different. Figure A.1 shows the outcome of a hierarchical cluster analysis of these sorting results.
Node A was labeled joy, node B cheerfulness, and nodes C and D were both labeled sadness.
Figure A.1: Results of a hierarchical cluster analysis of 135 emotion names (A = joy, B = cheerfulness, C and D = sadness). The scale at the left indicates the cluster strength; asterisks indicate empirically selected subcluster names.
B. IAPS-picturesets
Description  Slide No.  Men Valence  Men Arousal  Women Valence  Women Arousal
(all values: Mean (SD))
Seal 1440 7.96 (1.59) 4.76 (2.25) 8.43 (1.44) 4.47 (2.82)
Family 2340 7.65 (1.36) 5.35 (2.03) 8.34 (1.10) 4.53 (2.29)
Mountains 5700 7.70 (1.36) 5.94 (2.28) 7.54 (1.56) 5.44 (2.38)
Brownie 7200 7.50 (1.78) 4.90 (2.67) 7.77 (1.71) 4.85 (2.55)
Sailing 8080 7.73 (1.25) 7.12 (1.95) 7.73 (1.43) 6.25 (2.34)
PolarBears 1441 7.71 (1.17) 3.84 (2.10) 8.14 (1.33) 4.00 (2.55)
Skier 8190 8.13 (1.29) 6.41 (2.60) 8.08 (1.48) 6.16 (2.57)
Kitten 1460 7.80 (1.47) 4.20 (2.69) 8.58 (0.76) 4.42 (2.60)
Rafting 8370 7.67 (1.19) 6.46 (2.22) 7.86 (1.37) 6.98 (2.25)
Puppies 1710 8.02 (1.21) 5.53 (2.07) 8.59 (0.99) 5.31 (2.54)
Bunnies 1750 7.89 (1.26) 4.21 (2.22) 8.59 (0.75) 4.02 (2.40)
Tubing 8420 7.61 (1.61) 5.71 (2.42) 7.90 (1.50) 5.41 (2.34)
Rollercoaster 8499 7.51 (1.47) 6.69 (1.71) 7.70 (1.36) 5.56 (2.61)
Porpoise 1920 7.83 (1.29) 4.21 (2.49) 7.94 (1.61) 4.31 (2.57)
Beach 5833 8.15 (1.19) 6.37 (2.37) 8.27 (0.99) 5.14 (2.79)
Money 8501 8.14 (1.24) 6.86 (2.00) 7.67 (1.97) 6.02 (2.50)
Fireworks 5910 7.41 (1.20) 5.37 (2.32) 8.16 (1.15) 5.80 (2.75)
Baby 2150 7.46 (1.60) 4.66 (2.37) 8.31 (1.49) 5.29 (2.83)
Baby 2070 7.69 (1.59) 4.02 (2.30) 8.50 (1.28) 4.84 (2.97)
Baby 2040 7.63 (2.01) 4.33 (2.19) 8.74 (0.64) 4.97 (2.85)
Description  Slide No.  Men Valence  Men Arousal  Women Valence  Women Arousal
(all values: Mean (SD))
Couple 2530 7.25 (1.84) 4.23 (2.03) 8.25 (1.10) 3.80 (2.17)
Couple 2550 7.37 (1.20) 4.15 (2.03) 8.14 (1.53) 5.16 (2.67)
Sunset 5830 7.37 (1.80) 4.98 (2.40) 8.54 (0.82) 4.88 (2.86)
IceCream 7330 7.29 (2.21) 4.54 (2.55) 7.96 (1.49) 5.54 (2.53)
Baby 2660 7.28 (1.59) 4.09 (2.20) 8.18 (1.24) 4.76 (2.56)
Seagulls 5831 7.07 (1.10) 3.93 (2.28) 8.05 (1.00) 4.79 (2.59)
Father 2160 6.87 (1.87) 5.31 (2.10) 8.16 (1.28) 5.03 (2.25)
Father 2165 6.74 (1.39) 3.89 (2.24) 8.29 (1.17) 5.05 (2.67)
Father 2057 7.16 (1.31) 4.32 (1.98) 8.39 (0.94) 4.73 (2.75)
Family 2360 6.98 (1.76) 3.65 (2.02) 8.20 (1.59) 3.67 (2.52)
Description  Slide No.  Men Valence  Men Arousal  Women Valence  Women Arousal
(all values: Mean (SD))
EroticFemale 4002 7.69 (1.48) 7.15 (1.81) 4.14 (1.82) 3.72 (2.30)
AttractiveFem 4150 7.80 (1.36) 6.41 (2.18) 5.36 (1.44) 3.44 (1.98)
EroticFemale 4210 8.25 (1.30) 7.80 (1.90) 3.13 (1.66) 4.31 (2.47)
EroticFemale 4220 7.81 (1.74) 6.64 (1.90) 5.61 (1.31) 4.00 (1.95)
EroticFemale 4225 7.57 (1.43) 6.94 (1.83) 5.15 (1.37) 4.40 (2.16)
EroticFemale 4250 8.39 (0.93) 7.02 (2.02) 5.18 (1.55) 3.31 (2.07)
EroticFemale 4311 7.56 (1.38) 7.35 (1.81) 5.89 (1.68) 6.08 (2.32)
EroticCouple 4659 7.70 (1.64) 7.43 (1.80) 6.15 (2.01) 6.47 (2.18)
EroticCouple 4660 7.63 (1.30) 6.92 (1.74) 7.22 (1.40) 6.31 (1.95)
EroticCouple 4680 7.73 (1.61) 5.94 (2.30) 6.91 (1.92) 6.07 (2.26)
Description  Slide No.  Men Valence  Men Arousal  Women Valence  Women Arousal
(all values: Mean (SD))
Man 2190 4.73 (1.25) 2.27 (1.72) 4.90 (1.31) 2.50 (1.86)
Secretary 2383 4.62 (1.24) 3.49 (1.90) 4.79 (1.44) 3.36 (1.79)
Chess 2840 4.92 (1.79) 2.31 (1.88) 4.90 (1.23) 2.55 (1.76)
Mushroom 5500 5.49 (1.67) 2.82 (2.58) 5.34 (1.49) 3.18 (2.25)
RollingPin 7000 4.93 (0.35) 2.73 (1.86) 5.06 (1.10) 2.15 (1.70)
Plate 7233 5.01 (1.21) 2.51 (1.74) 5.15 (1.66) 2.96 (2.05)
Building 7491 4.87 (0.94) 2.60 (1.95) 4.79 (1.09) 2.24 (1.87)
Rain 9210 4.41 (1.85) 2.89 (2.05) 4.64 (1.82) 3.26 (2.20)
Farmer 2191 5.49 (1.49) 3.63 (2.10) 5.14 (1.71) 3.60 (2.17)
Tourist 2850 4.69 (1.40) 2.58 (1.79) 5.69 (1.22) 3.38 (2.01)
Mushroom 5530 5.33 (1.64) 2.87 (2.47) 5.44 (1.57) 2.87 (2.12)
Spoon 7004 4.89 (0.60) 2.09 (1.75) 5.14 (0.59) 1.94 (1.60)
NeutFace 2210 4.41 (1.33) 2.72 (1.92) 4.60 (0.98) 3.44 (1.74)
Factoryworker 2393 4.82 (1.08) 2.90 (1.80) 4.92 (1.05) 2.95 (1.95)
Mug 7009 4.96 (1.05) 2.69 (1.95) 4.89 (0.96) 3.26 (1.96)
Basket 7010 4.95 (1.43) 1.55 (1.36) 4.92 (0.48) 1.97 (1.58)
Fan 7020 5.02 (1.22) 2.15 (1.71) 4.94 (0.88) 2.19 (1.72)
Shipyard 7036 5.08 (1.02) 3.47 (2.09) 4.71 (1.10) 3.18 (1.98)
DustPan 7040 4.72 (1.19) 2.46 (1.86) 4.66 (1.00) 2.90 (1.99)
Baskets 7041 4.96 (1.14) 2.68 (1.76) 5.02 (1.11) 2.53 (1.79)
HairDryer 7050 4.81 (0.71) 2.59 (1.79) 5.04 (0.87) 2.90 (1.82)
Fork 7080 5.43 (1.26) 1.98 (1.63) 5.10 (0.88) 2.67 (1.99)
Book 7090 4.95 (1.54) 2.30 (1.90) 5.44 (1.35) 2.92 (2.15)
Umbrella 7150 4.76 (0.73) 2.66 (1.68) 4.69 (1.19) 2.56 (1.83)
Fabric 7160 4.98 (0.97) 3.06 (2.08) 5.05 (1.19) 3.08 (2.09)
Pole 7161 4.99 (0.86) 2.79 (1.81) 4.97 (1.16) 3.15 (2.14)
Lamp 7175 4.78 (1.18) 1.55 (0.96) 4.95 (0.80) 1.87 (1.48)
IroningBoard 7234 4.36 (1.41) 2.83 (1.79) 4.12 (1.73) 3.05 (1.99)
Building 7500 5.44 (1.36) 3.46 (2.23) 5.23 (1.50) 3.08 (2.15)
Tissue 7950 4.62 (1.26) 2.30 (1.89) 5.17 (1.12) 2.27 (1.77)
Description  Slide No.  Men Valence  Men Arousal  Women Valence  Women Arousal
(all values: Mean (SD))
SadChildren 2703 2.33 (1.53) 5.73 (1.99) 1.59 (0.87) 5.81 (2.47)
SadChild 2800 2.31 (1.36) 4.94 (1.97) 1.41 (0.79) 5.87 (2.13)
Mutilation 3000 1.69 (1.47) 6.74 (2.37) 1.17 (0.54) 7.63 (2.11)
Mutilation 3010 2.19 (1.42) 7.12 (1.75) 1.29 (0.82) 7.44 (2.21)
Mutilation 3060 1.94 (1.39) 6.89 (2.08) 1.66 (1.71) 7.34 (2.10)
Mutilation 3064 1.78 (1.26) 5.44 (2.70) 1.15 (0.44) 7.30 (2.22)
Mutilation 3068 2.47 (1.92) 6.44 (2.46) 1.18 (0.70) 7.09 (2.49)
Mutilation 3069 2.10 (1.66) 6.70 (2.60) 1.32 (1.01) 7.33 (2.20)
Mutilation 3071 2.06 (1.59) 6.61 (2.13) 1.69 (1.14) 7.10 (1.95)
Mutilation 3080 1.63 (1.11) 6.84 (2.06) 1.33 (0.75) 7.61 (1.81)
BurnVictim 3100 1.88 (1.14) 5.88 (2.34) 1.35 (0.96) 7.02 (2.02)
BurnVictim 3110 2.10 (1.56) 6.43 (2.26) 1.47 (0.89) 6.98 (2.04)
DeadBody 3120 1.80 (1.32) 6.20 (2.55) 1.33 (0.74) 7.49 (1.96)
Mutilation 3130 1.90 (1.57) 6.56 (2.11) 1.26 (0.68) 7.39 (1.97)
BatteredFem 3180 2.27 (1.33) 5.17 (2.05) 1.67 (0.90) 6.19 (2.24)
Mutilation 3225 2.06 (1.24) 5.39 (2.41) 1.66 (1.20) 6.32 (2.43)
DyingMan 3230 2.44 (1.50) 5.00 (2.35) 1.67 (0.99) 5.75 (2.04)
Tumor 3261 1.98 (1.19) 5.51 (2.70) 1.70 (1.43) 5.92 (2.60)
Attack 3530 2.10 (1.53) 6.85 (2.13) 1.51 (1.00) 6.80 (2.07)
Soldier 6212 2.59 (1.47) 5.47 (2.44) 1.81 (1.41) 6.53 (2.35)
Attack 6313 2.43 (1.42) 6.54 (2.11) 1.61 (1.22) 7.27 (2.29)
Attack 6540 2.53 (1.84) 6.51 (2.27) 1.86 (1.14) 7.14 (1.98)
Attack 6560 2.57 (1.49) 6.17 (2.28) 1.78 (1.23) 6.86 (2.52)
StarvingChild 9040 1.88 (1.17) 5.10 (2.11) 1.50 (0.97) 6.44 (2.00)
Cow 9140 2.56 (1.42) 4.90 (2.29) 1.88 (1.26) 5.79 (2.04)
Cemetery 9220 2.27 (1.61) 3.83 (2.33) 1.86 (1.46) 4.16 (1.84)
Assault 9254 2.28 (1.51) 5.57 (2.45) 1.88 (1.22) 6.33 (2.26)
Soldier 9410 1.96 (1.56) 6.38 (2.26) 1.20 (0.58) 7.54 (1.78)
DeadMan 9433 2.39 (1.38) 5.00 (2.65) 1.35 (0.71) 6.71 (2.27)
Dog 9570 1.90 (1.40) 5.84 (2.41) 1.47 (1.00) 6.45 (2.19)
As described in section 3.2.7, the experimental instructions were handed out to the subjects as a paper copy to ensure that all subjects received the same information about the experiment. The instructions are shown below.
Dear participant,
If you have any further questions about the experiment, please contact the experimenter.
In this part of the appendix we give a detailed report of the recognition rates from the experiments in chapter 5. The best recognition rates are marked in bold.
Table D.4: Mean recognition rate subject to number of dimensions after correlation-
based feature reduction
Table D.6: Mean recognition rate subject to number of dimensions after LDA
Number of HMM states    Number of subjects (relative)    Number of subjects (absolute)
1                       0.07                             1
2                       0.13                             2
5                       0.27                             4
8                       0.20                             3
12                      0.07                             1
16                      0.20                             3
20                      0.07                             1
Table D.9: Mean recognition rate depending on number of dimensions after feature
reduction for frontal electrodes
Table D.10: Mean recognition rate depending on number of dimensions after feature
reduction for midline electrodes
Anttonen, J. & Surakka, V. (2005). Emotions and Heart Rate while Sitting on a
Chair. In: CHI ’05: Proceedings of the SIGCHI conference on Human factors in
computing systems, pages 491–499. ACM.
Axelrod, L. (2004). The Affective Connection: How and When Users Communicate
Emotion. In: CHI Extended Abstracts, pages 1033–1034.
Baumgartner, T., Esslen, M., & Jancke, L. (2006). From Emotion Perception to Emotion Experience: Emotions Evoked by Pictures and Classical Music. International Journal of Psychophysiology, 60:34–43.
Becker, K. (2003). Varioport™. http://www.becker-meditec.com/.
Beedie, C., Terry, P., & Lane, A. (2005). Distinctions between Emotion and Mood.
Cognition and Emotion, 19:847–878.
Cacioppo, J. T., Losch, M. L., Tassinary, L. G., & Petty, R. E. (1986). The Role of Affect in Consumer Behavior: Emerging Theories and Applications, chapter Properties of affect and affect-laden information processing as viewed through the facial response system, pages 87–118. Lexington, MA: D. C. Heath.
Chang, C.-C. & Lin, C.-J. (2001). LIBSVM: a Library for Support Vector Machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Cowie, R., Douglas-Cowie, E., Apolloni, B., Taylor, J., Romano, A., & Fellenz, W.
(1999). What a Neural Net needs to know about Emotion Words. In: CSCC’99
Proceedings, pages 5311–5316.
Davidson, R. J., Ekman, P., Saron, C. D., Senulis, J. A., & Friesen, W. V. (1990). Approach/Withdrawal and Cerebral Asymmetry: Emotional Expression and Brain Physiology. Journal of Personality and Social Psychology, 58(2):330–341.
Downey, G., Mougios, V., Ayduk, O., London, B. E., & Shoda, Y. (2004). Rejection
Sensitivity and the Defensive Motivational System: Insights From the Startle
Response to Rejection Cues. Psychological Science, 15(10):668–673.
Ekman, P., Campos, J. J., Davidson, R. J., & De Waal, F. B. M. (2003). Emotions Inside Out: 130 Years After Darwin's The Expression of the Emotions in Man and Animals, chapter Darwin, Deception, and Facial Expression, pages 205–221. New York Academy of Sciences.
Ekman, P., Levenson, R., & Friesen, W. (1983). Autonomic Nervous System Activity Distinguishes among Emotions. Science, 221:1208–1210.
Haag, A., Goronzy, S., Schaich, P., & Williams, J. (2004). Emotion Recognition
Using Bio-Sensors: First Steps Towards an Automatic System. Lecture Notes in
Computer Science, 3068:33–48.
Healey, J., Picard, R., & Dabek, F. (1998). A New Affect-Perceiving Interface and
Its Application to Personalized Music Selection. Proceedings of the 1998 Workshop
on Perceptual User Interfaces.
Honal, M. (2005). Determining User State and Mental Task Demand from Elec-
troencephalographic Data, Diplomarbeit, Universität Karlsruhe (TH), Karlsruhe,
Germany.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2003). A Practical Guide to Support Vector Classification. Technical report, Department of Computer Science, National Taiwan University. Last updated: May 21, 2008.
Izard, C. E. (1994). Die Emotionen des Menschen (3. Aufl.). Beltz, Psychologie-Verl.-Union.
James, W. (1950). The Principles of Psychology, Vol. 1 (reprint ed.). Dover Publications.
Jasper, H. H. (1958). The Ten-Twenty Electrode System of the International Fed-
eration in Electroencephalography and Clinical Neurophysiology. EEG Journal,
10:371–375.
Kim, K. H., Bang, S. W., & Kim, S. R. (2004). Emotion Recognition System using
Short-term Monitoring of Physiological Signals. Medical and Biological Engineer-
ing and Computing, 42:419–427.
Klein, J., Moon, Y., & Picard, R. W. (2002). This Computer Responds to User Frus-
tration - Theory, Design, Results and Implications. Interacting with Computers,
14:119–140.
Kleinginna, P. R., Jr. & Kleinginna, A. M. (1981). A Categorized List of Emotion Definitions, with Suggestions for a Consensual Definition. Motivation and Emotion, 5(4):345–379.
Lang, P., Bradley, M., & Cuthbert, B. (1997). International Affective Picture System
(IAPS): Technical Manual and Affective Ratings.
Lang, P., Bradley, M., & Cuthbert, B. (2005). International Affective Picture System
(IAPS): Technical Manual and Affective Ratings. Technical report, Gainesville,
Fl: NIMH Center for the Study of Emotion and Attention (CSEA), University of
Florida.
Lang, P. J. (1995). The Emotion Probe: Studies of Motivation and Attention.
American Psychologist, 50:372–385.
Leng, H., Lin, Y., & Zanzi, L. A. (2007). An Experimental Study on Physiological
Parameters Toward Driver Emotion Recognition. Lecture Notes in Computer
Science, 4566:237–246.
Levenson, R. W. (1992). Autonomic Nervous System Differences among Emotions.
Psychological Science, 3(1):23–27.
Levenson, R. W. (1999). The Intrapersonal Functions of Emotion. Cognition and
Emotion, 13(5):481–504.
Levenson, R. W., Ekman, P., & Friesen, W. V. (1990). Voluntary Facial Action Gen-
erates Emotion-specific Autonomic Nervous System Activity. Psychophysiology,
27(4):363–384.
Luria, A. R. (1973). The Working Brain. New York: Basic Books.
Mayer, C. (2005). UKA EMG/EEG Studio v2.0.
McFarland, R. A. (1985). Relationship of Skin Temperature Changes to the Emo-
tions Accompanying Music. Applied Psychophysiology and Biofeedback, 10:255–
267.
Mowrer, O. (1960). Learning Theory and Behavior. New York: Wiley.
Murphy, F., Nimmo-Smith, I., & Lawrence, A. (2003). Functional Neuroanatomy of Emotions: A Meta-Analysis. Cognitive, Affective, and Behavioral Neuroscience, 3(3):207–233.
Nass, C., Fogg, B. J., & Moon, Y. (1996). Can Computers be Teammates? Inter-
national Journal of Human-Computer Studies, 49(6):669–678.
Nass, C. & Moon, Y. (2000). Machines and Mindlessness: Social Responses to Computers. Journal of Social Issues, 56(1):81–103.
Nass, C., Steuer, J., & Tauber, E. R. (1994). Computers are Social Actors. In:
CHI ’94: Proceedings of the SIGCHI conference on Human factors in computing
systems, pages 72–78. ACM.
Ortony, A. & Turner, T. J. (1990). What’s Basic about Basic Emotions? Psycho-
logical Review, 97(3):315–331.
Osgood, C. E. (1952). The Nature and Measurement of Meaning. Psychological
Bulletin, 49:197–237.
Picard, R. (1995). Affective Computing. Technical Report 321, MIT Media Labo-
ratory, Perceptual Computing Section.
Picard, R. W. (1997). Affective Computing. The MIT Press.
Picard, R. W. & Healey, J. (1997). Affective Wearables. In: ISWC, pages 90–97.
Picard, R. W., Vyzas, E., & Healey, J. (2001). Toward Machine Emotional Intelligence: Analysis of Affective Physiological State. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23:1175–1191.
Porbadnigk, A. K. (2008). EEG-based Speech Recognition: Impact of Experimental
Design on Performance, Studienarbeit, Universität Karlsruhe (TH), Karlsruhe,
Germany.
Rabiner, L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applica-
tions in Speech Recognition. Proceedings of the IEEE, 77(2):257–286.
Reeves, B. & Nass, C. (1995). The Media Equation: How People Treat Computers,
Televisions, and New Media as Real People and Places. Cambridge University
Press.
Reynolds, C. & Picard, R. (2004). Affective Sensors, Privacy, and Ethical Contracts.
In: CHI ’04: CHI ’04 extended abstracts on Human factors in computing systems,
pages 1103–1106. ACM.
Russell, J. & Fehr, B. (1984). Concept of Emotion Viewed from a Prototype Per-
spective. Journal of Experimental Psychology: General, 113:464–486.
Schandry, R. (1989). Lehrbuch der Psychophysiologie (2., überarbeitete und erweiterte Aufl.). Psychologie-Verl.-Union.
Schwartz, G. E., Fair, P. L., Salt, P., Mandel, M. R., & Klerman, G. L. (1976). Facial Muscle Patterning to Affective Imagery in Depressed and Nondepressed Subjects. Science, 192:489–491.
Selesnick, I. W., Baraniuk, R. G., & Kingsbury, N. G. (2005). The Dual-Tree Complex Wavelet Transform. IEEE Signal Processing Magazine, 22(6):123–151.
Shaver, P., Schwartz, J., Kirson, D., & O’Connor, C. (1987). Emotion Knowledge:
Further Exploration of a Prototype Approach. Journal of Personality and Social
Psychology, 52:1061–1086.
Sobotka, S. S., Davidson, R. J., & Senulis, J. A. (1997). Anterior Brain Electrical Asymmetries in Response to Reward and Punishment. Electroencephalography and Clinical Neurophysiology, 83(4):236–247.
Spencer, H. (1890). The Principles of Psychology, Vol. 1. New York: Appleton.
Stickel, C., Fink, J., & Holzinger, A. (2007). Enhancing Universal Access - EEG
Based Learnability Assessment. Lecture Notes in Computer Science, 4556:813–
822.
Vrana, S. R., Cuthbert, B. N., & Lang, P. J. (1986). Fear Imagery and Text Processing. Psychophysiology, 23:247–253.
Westermann, R., Spies, K., Stahl, G., & Hesse, F. W. (1996). Relative Effectiveness
and Validity of Mood Induction Procedures: a Metaanalysis. European Journal
of Social Psychology, 26:557–580.
Winton, W. M., Putnam, L. E., & Krauss, R. M. (1984). Facial and Autonomic Manifestations of the Dimensional Structure of Emotion. Journal of Experimental Social Psychology, 20:195–216.
Yerkes, R. & Dodson, J. (1908). The Relation of Strength of Stimulus to the Rapidity
of Habit Formation. Journal of Comparative Neurology and Psychology, 18:459–
482.