Multimed Tools Appl

DOI 10.1007/s11042-016-3637-2

Stress in interactive applications: analysis of the valence-arousal space based on physiological signals and self-reported data
Alexandros Liapis 1 & Christos Katsanos 1,2 &
Dimitris G. Sotiropoulos 1 & Nikos Karousos 1,2 &
Michalis Xenos 1

Received: 10 December 2015 / Revised: 22 May 2016 / Accepted: 24 May 2016

© Springer Science+Business Media New York 2016

Abstract Measuring users' emotional reaction to interactive multimedia and hypermedia is
important. One particularly popular self-reported method for emotion assessment is the
Valence-Arousal (VA) scale: a 9 × 9 affective grid. This paper aims to identify specific stress region(s) in
the VA space by combining self-reported ratings (pairs of VA) and physiological signals (skin
conductance). To this end, 31 healthy volunteers participated in an experiment by performing five
stressful interaction tasks while their skin conductance was monitored. The selected interaction
tasks were most frequently listed as stressful by a separate group of 15 interviewees. After each
task, participants expressed their perceived emotional experience using the VA rating space. Our
findings show which regions in the VA rating space may reliably indicate self-reported stress that
is in alignment with one's measured skin conductance while using interactive applications. One
additional important contribution of this work is the proposed approach for the empirical
identification of affect regions in the VA space based on physiological signals.
Keywords Human computer interaction · Emotional experience evaluation · Interactive
multimedia environments · Galvanic skin response · Affect grid · Valence · Arousal

* Alexandros Liapis

School of Science and Technology, Hellenic Open University, Parodos Aristotelous 18, Patras 26335,

Technological Educational Institute of Western Greece, M. Alexandrou 1, Patras 26334, Greece

1 Introduction
E-learning, Ubiquitous Computing, Ambient Intelligence and the Internet of Things are becoming
widespread. As a result, innovative interactive multimedia applications are emerging, exposing
users to an unprecedented multitude of stimuli. Stimuli can trigger emotions,
negative or positive, which influence the overall User Experience (UX) [32]. Rosalind Picard
pioneered writing about computers that could understand and adapt to users' emotions,
thereby laying the foundations of a new branch of computer science, Affective
Computing [47]. In recent years, the study of users' emotional state has attracted
increased interest in the scientific communities of Multimedia, Software Development and
Human-Computer Interaction (HCI).
So far, researchers and practitioners have developed many subjective methods (e.g. specific
questionnaires, interviews and observation) to cope with Users' Emotional Experience
(UEX) evaluation. Such subjective methods are delineated in the following section on the
research background of this paper. Despite their usefulness, they all provide only
indirect access to users' emotional state. Furthermore, such methods are often associated with
memory recall problems, because the assessment takes place after the session [37].
Direct access to emotion can be achieved by recording the activity of the Autonomic Nervous System
(ANS), which regulates users' bodily functions. Regarding ANS activity, studies [8,
11, 30] have shown that emotions are associated with changes in physiological signals, such as
heart rate, skin conductivity, skin temperature and respiration. In addition, physiological
signals enable the continuous monitoring of users' emotional state. This was one of the main
reasons why innovative evaluation methods relied on the measurement of users' physiological
signals [20, 35, 55, 56], creating new perspectives in UEX evaluation. Characteristics such as
objectivity, multidimensionality, unobtrusiveness and continuity [9, 29] have reinforced the
use of physiological signals. Regarding the sensors that record physiological signals, factors such
as low cost, small size and new data acquisition techniques (e.g. using Bluetooth or Wi-Fi) can
further promote the use of such signals in both research and practice. However, one important
downside of physiological signals is that they are prone to various sources of noise, such as users'
health and environmental conditions (e.g. humidity levels and temperature).
The purpose of this paper is to combine self-reported Valence-Arousal (VA) ratings and
physiological signals (Skin Conductance) in order to investigate how multiple data sources
contribute to the identification of specific stress region(s) in the VA space. Such region(s) in
the VA space would include one's self-reported stress rating that is in alignment with one's
measured skin conductance. The results of such studies may enable both HCI and interactive
multimedia researchers and practitioners to use a multi-emotional assessment tool, such as the
Affect Grid, in their UEX evaluation studies, knowing a priori that a specific region is
associated with a specific emotion; in this paper we investigate stress.
This paper significantly extends our previous work [36] in the following directions: a) the
VA space is segmented into non-rectangular regions with the use of a greedy algorithm
(Incremental Stress Region Construction; ISRC), and b) a new preprocessing technique is
applied to the GSR dataset. In order to collect VA ratings, the Affect Grid tool [51] was used.
To this end, 31 healthy typical computer users were asked to perform five stressful interaction
tasks while their skin conductance was monitored. Tasks were carefully selected based on the
responses of 15 typical computer users who were involved in a pre-experiment interview. At
the end of each task, participants assessed their interaction experience in terms of valence and
arousal.
To the best of our knowledge, this is the first experimental approach that exclusively uses
typical interaction tasks as stimuli, such as information seeking in a website. The recorded
GSR dataset will be made publicly available to the research community so that it can be used
for stress recognition in subtle interaction events. The preprocessing method applied to the
GSR signal, along with the performance of various classifiers (e.g. SVM, k-NN etc.) are also
The rest of the paper is structured as follows. Section 2 presents the research background.
Section 3 presents the experimental details, specifically the stimuli selection, the general
experimental set-up and the protocol. In Section 4, the proposed algorithm for segmenting
the VA space into non-rectangular regions is described. Section 5 presents the results. The
paper concludes with a discussion of the main findings, limitations of the presented work and
directions for future research.

2 Background
Multimedia components and features bridge the gap between traditional computer interfaces
and new innovative systems (e.g. multipurpose public interactive displays [27]) that support
multimodal interaction. The growing number of software applications, along with the wide use
of interactive multimedia, generates a need for methods to evaluate them systematically.
In terms of user experience evaluation, one is mostly interested in the identification of system
flaws [33]. A multimedia application or a system with such flaws might cause undesirable
activation of the ANS, which is associated with behavioral effects known as the "fight or flight"
response or stress [24, 60]. Stress is defined as a state transition from calmness to high arousal,
accompanied by biochemical, physiological and behavioral changes for reasons of preserving an
organism's integrity [3]. Although stress is often related to negative experience, it may also be
beneficial in some cases by providing an appropriate boost to someone (e.g. to meet an important
deadline for a report submission or to solve an exercise during an exam). However,
frequent or daily exposure to stressors is a precursor of chronic stress, which can badly affect
people's health [1]. Beyond health issues, stress may also affect users' performance [16], and its
presence in interactive computer environments is likely to be perceived as a user experience issue.
Thus, assessing stress is particularly important in this context, and this is the goal of this paper.
Approaches for measuring stress can be distinguished into non-intrusive and intrusive
techniques. Non-intrusive techniques are mainly based on self-reporting instruments, such as
the Daily Stress Inventory [7], the NASA Task Load Index (NASA-TLX) [21], the Stroop
color test [26], and the Situation Awareness Global Assessment Technique (SAGAT) [19].
Although such techniques are straightforward and do not require the use of any special
equipment, they have been criticized for lack of objectivity, extra cognitive load and memory
recall problems [24, 39]. In an attempt to overcome these problems, other stress measurement
approaches capture and analyze users' observed behavior in a non-invasive and continuous
way by means of pressure-sensitive keyboards [24, 41].
One particularly popular non-intrusive tool for emotion assessment is the two-dimensional
(2D) Affect Grid [51]. The Affect Grid is an effective and easy-to-use tool for measuring emotion
[28], which is based on the theoretical circumplex model of affect introduced by Russell
[49]. This tool requires participants to select the point on a 9 × 9 grid (see Fig. 1) that best
indicates their emotional state at a specific moment. The grid consists of two dimensions: the
horizontal valence axis (displeasure-pleasure) and the vertical arousal axis (sleepiness-arousal).
For example, someone who feels neutral is expected to select the middle square of the grid
(coordinates = 5, 5). The Affect Grid tool has been used in a variety of research fields.
Regarding the multimedia field, the Affect Grid has been used in the emotional evaluation of
both multimedia environments and content. Specifically, [12, 54] used the 9 × 9 tool in order to
investigate the impact of emotions on learners' performance within multimedia environments. Also,
in [13], learners' levels of valence and arousal while interacting with multimedia learning
content were measured.

Fig. 1 The Affect Grid tool
Although the Affect Grid tool has been used in several studies [2, 34, 42, 45, 48, 58], an
open research question is whether (and to what extent) it can capture the alignment between
what users report about their emotions and what they actually feel [46]. Many studies that asked
participants to report their emotional state more than once [10, 15, 18] mention extreme
discrepancies among the answers of the same participant. Reasons such as mood variability
among participants, their "obligation" to over-report mood shifts and different interpretations
of the scale's magnitude probably cause such high variability in emotional state assessment
when using the Affect Grid tool [50].
Beyond non-intrusive methods, an alternative approach to measure stress is through the
direct measurement of physiological signals. The "James-Lange theory of emotion" [25]
considers emotions as an outcome of a psychological state that can be identified through
physiological signals. According to the James-Lange theory, a stimulus leads to a physiological
reaction. For instance, a walk in the woods and an unexpected encounter with a wild animal
can increase one's heart rate and trigger trembling. According to this theory,
interpretation of the physical reactions would conclude that the person is afraid: "I am trembling,
therefore I am afraid". The James-Lange theory of emotion was critiqued by the "two-factor theory
of emotion" introduced by [52]. According to this theory, emotions are neither purely physical
nor purely cognitive reactions, but a combination of both. The theory posits that physical
reactions must be interpreted in context, i.e. considering the situation that someone is facing.
Therefore, a fast pounding heart could be interpreted as stress if someone is taking
an exam, and as fear if someone encounters a wild animal.
Although there are many physiological signals that can be recorded by available technology
(e.g. heart rate, temperature, respiration and electromyography), research [14, 22, 23, 40] has
shown that skin conductance, also known as Galvanic Skin Response (GSR) or Electro
Dermal Response (EDR), is a reliable indicator of stress. Skin conductance provides reliable
insight into the activation levels of the sympathetic branch of the ANS [6]. Additional
advantages of GSR, such as the specificity of the measure, the ease of setup and the sensors' low cost,
have made it particularly popular. Skin conductance is the physiological signal that was
selected and measured in this paper.
GSR and other physiological signals have been used in order to measure stress in users of
multimedia applications and entertainment technologies [43, 44, 57, 60]. Such studies are
designed to induce intense reactions in users. However, recognizing stress in subtle interaction
events [59], which are typically expected in most interactive computer environments, remains
challenging. This paper studies stress measurement in such subtle interaction events.

3 Experimental details
Inducing emotion in a laboratory environment is particularly challenging. The stimuli must be
carefully designed or selected in order to trigger the appropriate arousal levels of the ANS. In
addition, stimuli should be realistic enough and free from any researcher bias. To this end,
many methods [30, 38, 44, 53] have been used in the emotion induction process. However, they
all rely on intense contexts, such as viewing movie clips, listening to songs, experiencing
major hardware/software failures, viewing images from specific databases and playing games.
Thus, recognizing emotions in subtle interaction events [12], which are typically expected in
most interactive computer environments, remains rather unexplored.
To the best of our knowledge, a stimuli dataset that relies on such tasks does not exist.
Hence, this paper also presents the stimuli selection approach that was followed. Stimuli
selection was based on a face-to-face interview process and extensive pilot-testing. To this end,
15 typical computer users (University employees, students, and friends) participated on a
voluntary basis and were asked to report stressful computer tasks. Interviews took place
at the premises of the Hellenic Open University, and each session lasted from 15 to 20 min per
interviewee. First, demographic information (e.g. age, skills in computer usage, profession and
education) was recorded. Next, interviewees were asked to describe at least five stressful
computer tasks. Interviewees were neither informed about nor participated in the main stress
monitoring experiment, which is described below. The scenarios provided by the
interviewees did not require any special skills or experience in computer usage.
Subsequently, the collected interview data were analyzed. First, similar answers from participants
were grouped and a frequency table was created. Frequency analysis did not reveal any
significant differences due to demographic parameters. Next, starting from the most
frequently mentioned answer, appropriate interaction scenarios were designed and pilot-tested.
Although interaction scenarios involving financial transactions and viruses were commonly
reported by interviewees, pilot-testing led us to exclude them in order to ensure the
ecological validity of the study. For instance, pilot-testing participants
did not find a financial transaction with a wrong charge on a provided credit card
stressful. In addition, the interaction scenarios had to require minimum typing
effort in order to avoid any noise during signal recording. In the end, the five most commonly
reported scenarios that also met the aforementioned criteria were selected. The final
interaction scenarios are elaborated in the following section.

3.1 Research-based stimuli scenarios

3.1.1 Missing a file
This scenario simulates a condition in which someone has lost a file and is trying to find it.
Specifically, participants were asked to visit the website of the internal assessment and training unit
of the Hellenic Open University. This website was selected in order to
avoid any previous interaction experience. In a three-step scenario participants were asked to:
a) navigate in the website and find a specific file, b) download and save the file to a specific
network folder and c) log in to a provided Google email account and send the file to a provided
email address. Login credentials were already saved on the testing PC in order to avoid extra
motor and mental effort. While participants were busy creating the email, experiment
facilitators deleted the downloaded file remotely.

3.1.2 Hardware failure

This scenario simulates a hardware failure related to the user's pointing device (i.e. mouse).
Specifically, participants were asked to visit the website of our research group (http://quality.eap.gr).
Again, this website was selected because it was expected to be unfamiliar to participants
and thus the effect of any previous interaction experience was minimized. Participants were
asked to navigate in the website in order to find and copy the consortium list from a specific
research project and then paste it in a text file. While participants were trying to accomplish the
task, the experiment facilitators set the mouse cursor speed to slow. The speed was remotely
adjusted using a custom-made software tool that had been previously installed on the testing
computer.

3.1.3 Slow internet connection

This scenario simulates a condition in which someone is trying to accomplish an internet-based
task over a slow network connection. Participants were asked to visit a web portal that is popular
in our country. The specific website was selected due to its vast content,
which affects website loading speed. Participants were asked to find information about a
specific movie (e.g. premiere date and cast). During participants' website navigation, the
network connection was set at 56 Kbps in order to make the interaction experience much slower
than usual. The speed on the testing computer was modified remotely through the Fiddler
software.

3.1.4 Web advertising (Popups)

This scenario simulates a condition in which someone is facing unexpected advertising
information while trying to accomplish a task. Specifically, participants were asked to visit
one popular online booking website in order to make a reservation
for a predefined destination. Appropriately designed popup windows appeared on users' screens
every 15 s while they were trying to navigate in the website and complete the reservation. The
popup window was relevant to both the website's content and visual appearance. The popup
appearance was controlled remotely from the observation-control room. To this end, a custom-made
software tool, which had been previously installed on the testing computer, was used.

3.1.5 Information seeking in websites

This scenario simulates a bad interaction experience related to website information architecture
issues. Specifically, participants were instructed to visit the website of our University's library
in order to find the authors of a specific book. The specific website was
chosen due to complaints about its information architecture that had been collected in a
previous study.
Figure 2 illustrates the differences in skin conductance levels of a randomly selected
participant during baseline and the first interaction scenario. An increase in the participant's
arousal, as measured by skin conductance, is evident.

3.2 Experimental protocol and equipment

The main experiment took place in the facilities of our fully-equipped usability lab. Skin
conductance was recorded using the Mindfield eSense portable sensor
with a sampling rate of 5 Hz. The Tobii eye-tracker environment (i.e. Tobii Studio) was used in
order to present the selected stimuli scenarios (stressors) to each participant. Tobii Studio was
also used by experiment facilitators in order to monitor participants' eye activity in real time
and successfully perform the facilitators' actions (e.g. delete a participant's downloaded file in the
first scenario while they were not looking at it). Room temperature was continuously monitored
to minimize its effect on the collected skin conductance signals.
Thirty-one healthy volunteers (18 female), aged between 21 and 38 (Mean = 30.8, SD = 4.7)
participated in the experiment. Each user session lasted approximately 40 min, including short
breaks between scenarios. At the end of the experiment, participants were debriefed about the
purpose of the study. They were also provided with access to their recording (i.e. eye-activity
and GSR). The experiment lasted for 6 days.
In each user session, participants were first given a brief tour of the lab area and were
informed that they would be asked to interact with some websites in order to perform some
typical computer tasks. Next, they completed an appropriate consent form along with some
demographic information. Subsequently, the Affect Grid tool was explained to participants.
Afterwards, the skin conductance sensor was placed on participants' non-dominant hand,
specifically on the middle and ring fingers. Participants were allowed to relax for
approximately 5 min, while experiment facilitators checked signal transmission quality and
participants' body posture in front of the eye-tracker. During this time, one facilitator was
available to answer any of the participants' questions.
The experimental process began with a 1:30 min baseline recording [17, 30], during which
participants were asked to simply relax. Subsequently, the five stress-inducing scenarios were
presented to participants in a random order. At the end of each scenario, participants were
asked to provide subjective ratings of their emotional experience using the Affect Grid tool.
The Google Forms service was used to implement the Affect Grid tool and collect participants'
responses (Fig. 3). Skin conductance was not recorded during the breaks and the self-assessment process.

Fig. 2 Skin conductance: Baseline vs. Scenario 1 (Missing a file)

Fig. 3 The Affect Grid rating scales. Numbers represent a pair of VA values. In neutral state, participant is expected to select the middle square 41, which corresponds to V = 5 and A = 5

Fig. 4 Participants' ratings in the Affect Grid tool for all stressors. Numbers inside bubbles correspond to the
total number of participants that selected the specific pair of VA values. The minimum and maximum value for
each axis is 1 and 9 respectively

Fig. 5 Participants' ratings in the VA space for all stressors. Two representative examples of regions defined by
Eq. 1 are shown. Numbers inside bubbles represent how many participants selected the specific pair of VA
values. The minimum and maximum value for each axis is 1 and 9 respectively

4 Defining stress regions in the valence-arousal space

At the end of the experiment delineated in the previous section, 151 records of VA ratings
had been correctly collected; in four cases no response was recorded. Figure 4 provides an
overview of the users' cross-scenario ratings recorded using the Affect Grid tool.

4.1 Approach for rectangular stress region construction

In the work presented in [36], the previously mentioned VA ratings were used in order to identify
stress regions in the VA space. First, nine rectangular regions were constructed starting from the
upper left corner (v, a) = (1, 9) of the VA space. Then, we explored the alignment between
participants' self-reported stress and skin conductance in each of these regions. To this end, each
rectangular region was constructed as a two-dimensional function of the valence (v) and arousal (a):

$$R(v, a) = \{ (P_i, T_j) \in S : \mathrm{Valence}(P_i, T_j) \le v \ \text{and}\ \mathrm{Arousal}(P_i, T_j) \ge a \} \tag{1}$$

where P_i denotes participant i (i = 1, ..., 31), T_j is task j (j = 1, ..., 5) and S is the sample space.
Given a pair of values (v, a) in the VA coordinate system, a rectangular region R(v, a)
is defined and the associated participants' ratings are included. To formulate a
classification problem we denote by R the stress region (group 1: stress) and by C the
complement of R in the VA space (group 2: other emotion).
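
As a rough illustration of how a rectangular region splits the ratings into the two groups, the following Python sketch may help. It assumes that a region anchored at the upper left corner contains every rating with valence at most v and arousal at least a; the function names and the example ratings are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

Rating = Tuple[int, int]  # (valence, arousal), each in 1..9
Key = Tuple[int, int]     # (participant, task)


def in_rect_region(rating: Rating, v: int, a: int) -> bool:
    """True when the rating lies in R(v, a): valence <= v and arousal >= a (assumed)."""
    valence, arousal = rating
    return valence <= v and arousal >= a


def split_by_region(ratings: Dict[Key, Rating], v: int, a: int) -> Tuple[List[Key], List[Key]]:
    """Split (participant, task) keys into the stress group R and its complement C."""
    stress = [key for key, r in ratings.items() if in_rect_region(r, v, a)]
    other = [key for key, r in ratings.items() if not in_rect_region(r, v, a)]
    return stress, other


# Hypothetical ratings keyed by (participant, task); not data from the study.
example = {(1, 1): (2, 8), (1, 2): (5, 5), (2, 1): (3, 6), (2, 2): (7, 7)}
stress_keys, other_keys = split_by_region(example, v=3, a=6)
```

With these made-up ratings, the two low-valence, high-arousal ratings fall into group 1 and the remaining two into group 2.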

The exploration of the VA space started by defining R(3, 6), a rather small region in the
upper left corner of the VA space, which was iteratively expanded horizontally, vertically and diagonally
as far as R(5, 4) (see Fig. 5). Next, each region was associated with the corresponding
participants' physiological signals. Afterwards, six popular classifiers were used in order to test
the classification accuracy in each region. Previous results [36] showed that the regions R(3,
6), R(3, 5) and R(3, 4) achieved the best classification accuracies. However, stress region(s) in the
VA space may not be rectangular and this was a reported limitation of our previous work. In
this paper we explore a new algorithmic approach for segmenting the VA space into
non-rectangular regions in order to refine the stress region.

4.2 Proposed approach for non-rectangular stress region construction

In this paper, a new approach is presented: the Incremental Stress Region Construction (ISRC)
algorithm, which defines non-rectangular stress regions in the VA space. First, using Eq. (1),
each region is initially defined as a two-dimensional function of the valence (v) and arousal (a).
Specifically, starting from the upper left point (v, a) = (1, 9) of the Affect Grid tool, the first
region Ri(v, a) is defined, whereas the remaining points constitute the complementary region Ci. Next,
the Ri region expands horizontally and vertically with step one (the instrument's smallest unit
of analysis) in each direction. In this way, N blocks (1 × 1) contribute to the construction of a
bigger region, which will probably not be rectangular (see Fig. 6). Subsequently, each region
Ri(v, a) and Ci is associated with participants' corresponding physiological signals. Afterwards,
six popular classifiers, offered in the MATLAB R2015a Statistics and Machine
Learning Toolbox v10.0, are used to test the classification accuracy between regions.
Classification results between Ri(v, a) and Ci are used to determine whether or not a specific Ri(v, a)
will be part of the final region. To this end, we have to predefine a classification threshold as
the termination criterion of the ISRC algorithm.

Fig. 6 A hypothetical stress region in the VA space produced by a non-rectangular approach. Numbers inside
bubbles represent how many participants selected the specific pair of VA values. The minimum and maximum
value for each axis is 1 and 9 respectively

The following algorithm presents the final region construction process. The algorithm was
tested for six classification thresholds (i.e., 60, 65, 70, 75, 80 and 85 %) and the results are presented
in Section 5.2.
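
The sketch below is only an illustration of the region-growing idea described in this section, written in Python; it is not the authors' listing and should not be read as the exact ISRC procedure. It assumes a breadth-first expansion to the right and downwards from block (1, 9), and it assumes a caller-supplied `accuracy` callback (hypothetical) that returns the classification accuracy of a candidate region versus its complement. The toy callback is invented purely so that the example runs.

```python
from collections import deque
from typing import Callable, FrozenSet, Set, Tuple

Block = Tuple[int, int]  # (valence, arousal) of one 1 x 1 cell, each in 1..9


def isrc_sketch(accuracy: Callable[[FrozenSet[Block]], float],
                threshold: float,
                start: Block = (1, 9)) -> Set[Block]:
    """Greedy region growing from the upper-left block of the grid (assumed scheme).

    `accuracy(region)` is a hypothetical callback returning the classification
    accuracy (in percent) of the region versus its complement.
    """
    region: Set[Block] = set()
    seen: Set[Block] = {start}
    frontier = deque([start])
    while frontier:
        v, a = frontier.popleft()
        candidate = frozenset(region | {(v, a)})
        if accuracy(candidate) < threshold:
            continue  # reject the block and do not expand through it
        region.add((v, a))
        # Expand one step to the right (valence + 1) and down (arousal - 1).
        for nxt in ((v + 1, a), (v, a - 1)):
            if 1 <= nxt[0] <= 9 and 1 <= nxt[1] <= 9 and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return region


def toy_accuracy(region: FrozenSet[Block]) -> float:
    """Invented accuracy that drops as the region leaves the upper-left corner."""
    worst = max(v + (9 - a) for v, a in region)
    return 90.0 - 5.0 * (worst - 1)


stress_region = isrc_sketch(toy_accuracy, threshold=75.0)
```

With the toy callback, the accepted blocks form a staircase-shaped region in the low-valence, high-arousal corner, which shows how block-wise acceptance can yield a non-rectangular result. In a real setting the callback would wrap feature extraction, classifier training and cross-validation.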

4.3 Preprocessing of physiological data and assignment to regions of VA ratings

For each VA rating there is an associated GSR signal that has been recorded. All in all, 151
skin conductance signals were recorded from 31 participants involved in five interaction
scenarios. In four cases (once in task 1, once in task 2 and twice in task 4), no signal was
recorded due to sensor malfunction or experimenter error.
First, GSR signals were normalized using a formula proposed in [43]:

$$\text{Normalized GSR}(i) = \frac{GSR(i) - GSR(\min)}{GSR(\max) - GSR(\min)}$$

where GSR(i) is the raw data, GSR(max) is the global maximum and GSR(min) is the global
minimum of the raw GSR per participant. Next, the normalized GSR signals were
smoothed using the Gauss adaptive smoothing function offered in Ledalab V3.4.8, a
MATLAB application [4, 5] that supports electrodermal activity analysis. The final
dataset consists of 149 signals, which were used in the classification process of the
proposed ISRC algorithm; for two signals the corresponding participants' VA ratings
were not recorded.
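
A minimal Python sketch of the per-participant normalization step follows, assuming plain min-max scaling to the range [0, 1] with no additional scaling factor. The function name and the sample values are hypothetical; the smoothing step performed in Ledalab is not reproduced here.

```python
from typing import List, Sequence


def normalize_gsr(raw: Sequence[float]) -> List[float]:
    """Min-max scale all raw GSR samples of one participant to [0, 1]."""
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0 for _ in raw]  # flat signal: avoid division by zero
    return [(x - lo) / (hi - lo) for x in raw]


samples = [2.0, 2.5, 4.0, 3.0]  # hypothetical values, not data from the study
normalized = normalize_gsr(samples)
```

Because the minimum and maximum are taken per participant, `raw` would need to hold all of that participant's samples across tasks, not the samples of a single task.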

From each smoothed GSR signal, 7 statistical features (i.e., mean, median, min, max,
standard deviation, minRatio and maxRatio) were extracted. The same statistics were extracted
from the first and the second differences of each signal. Thus, 21 statistical features were
extracted from each smoothed GSR signal. The extracted features were used to train six
classifiers offered in the MATLAB R2015a Statistics and Machine Learning Toolbox v10.0: a)
Linear Discriminant Analysis (LDA), b) Quadratic Discriminant Analysis (QDA), c) Linear
Support Vector Machine (L-SVM), d) Quadratic Support Vector Machine (Q-SVM), e) Cubic
Support Vector Machine (C-SVM), and f) k-Nearest Neighbors (k-NN). The classification
phase used a 3-fold cross-validation [31] approach with 100 simulations (i.e., 300
runs = 3 × 100).
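
The 7 × 3 = 21 feature layout can be sketched in Python as follows. The text does not define minRatio and maxRatio, so this sketch assumes they are the number of strict local minima or maxima divided by the number of samples; the use of the sample standard deviation is likewise an assumption, and all names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def diff(x: Sequence[float]) -> List[float]:
    """First difference of a sequence."""
    return [b - a for a, b in zip(x, x[1:])]


def extrema_ratio(x: Sequence[float], kind: str) -> float:
    """Share of samples that are strict local minima or maxima (assumed definition)."""
    count = 0
    for prev, cur, nxt in zip(x, x[1:], x[2:]):
        if kind == "min" and cur < prev and cur < nxt:
            count += 1
        elif kind == "max" and cur > prev and cur > nxt:
            count += 1
    return count / len(x)


def seven_stats(x: Sequence[float], prefix: str) -> Dict[str, float]:
    """The seven statistics named in the text for one series."""
    return {
        prefix + "mean": statistics.fmean(x),
        prefix + "median": statistics.median(x),
        prefix + "min": min(x),
        prefix + "max": max(x),
        prefix + "std": statistics.stdev(x),  # sample SD (an assumption)
        prefix + "minRatio": extrema_ratio(x, "min"),
        prefix + "maxRatio": extrema_ratio(x, "max"),
    }


def gsr_features(signal: Sequence[float]) -> Dict[str, float]:
    """Seven statistics of the signal and of its first and second differences."""
    d1 = diff(signal)
    d2 = diff(d1)
    features: Dict[str, float] = {}
    features.update(seven_stats(signal, "raw_"))
    features.update(seven_stats(d1, "d1_"))
    features.update(seven_stats(d2, "d2_"))
    return features


features = gsr_features([0.1, 0.3, 0.2, 0.6, 0.5, 0.4, 0.7])  # made-up signal
```

The resulting dictionary has 21 entries, one per feature, and could be flattened into a row of a feature matrix for any classifier.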

5 Results
The main objective of the present study is to identify non-rectangular stress regions in the VA
space by combining self-reported and physiological data. To this end, the following approach
was applied: First, 149 pairs of VA ratings were associated with participants' corresponding
GSR signals. Next, 21 statistical features were extracted from the preprocessed GSR signals and
were used in the classification process. In contrast to the rectangular approach that was
followed in [36], here the VA regions were algorithmically constructed based on the Affect
Grid's smallest unit of analysis: blocks of 1 × 1.
Results of the proposed ISRC algorithm with a specific classification threshold (i.e., 75 %)
are presented in Section 5.1, in addition to a comparison with the previously proposed [36]
rectangular approach. Section 5.2 evaluates the effect of the classification accuracy
threshold on the VA stress regions produced by all ISRC instances (classifiers). In Section
5.3, we employ the proposed ISRC algorithm on the regions R(3, 6), R(3, 5) and R(3, 4) that
achieved the best classification accuracies in [36] and compare the obtained results.

5.1 Results for the ISRC algorithm

In this section, we present the results obtained from all ISRC and classifier combinations,
hereafter denoted as ISRC:<name of classifier> and called an ISRC instance. To evaluate the
classification accuracy of the ISRC instances we used 3-fold cross-validation repeated 100
times (300 runs in total). The classification accuracy threshold was set to 75 %.
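
The repeated cross-validation scheme (3 folds × 100 repetitions = 300 runs) can be sketched in Python as below. This is an illustrative sketch only: the `evaluate` callback is hypothetical and stands in for training and testing a classifier, the folds are not stratified, and the dummy callback exists solely so that the example runs.

```python
import random
import statistics
from typing import Callable, List, Tuple


def repeated_kfold_accuracy(
    n_samples: int,
    evaluate: Callable[[List[int], List[int]], float],
    k: int = 3,
    repeats: int = 100,
    seed: int = 0,
) -> Tuple[float, float, int]:
    """Return mean accuracy, its standard deviation and the number of runs.

    `evaluate(train_idx, test_idx)` is a hypothetical callback that trains a
    classifier on train_idx and returns its accuracy on test_idx.
    """
    rng = random.Random(seed)
    scores: List[float] = []
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # new random partition for every repetition
        folds = [order[i::k] for i in range(k)]
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(evaluate(train_idx, test_idx))
    return statistics.fmean(scores), statistics.stdev(scores), len(scores)


def dummy_evaluate(train_idx: List[int], test_idx: List[int]) -> float:
    """Placeholder score: share of even indices in the test fold, in percent."""
    return 100.0 * sum(1 for j in test_idx if j % 2 == 0) / len(test_idx)


mean_acc, sd_acc, runs = repeated_kfold_accuracy(149, dummy_evaluate)
```

Averaging over the 300 runs yields one mean accuracy per ISRC instance, which is the quantity compared against the threshold.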
Figure 7 illustrates the non-rectangular stress regions produced per ISRC instance, whereas
Table 1 includes the corresponding numerical results. Table 1 also presents comparative results
between all six ISRC instances and the corresponding rectangular approach. To this end, the
rectangular region was defined as the convex hull of the corresponding ISRC region, i.e., the
smallest rectangle containing the output of the ISRC instance (see red borders in Fig. 7). Results
show that the ISRC algorithm refines the corresponding rectangular region (convex hull) in
terms of the VA blocks selected as stressful, which in turn resulted in higher classification
accuracy: the improvement ranged from 5.4 % for C-SVM to 24.9 % for k-NN.
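
For illustration, the smallest rectangle containing a set of blocks can be computed as in the Python sketch below. The function name and the L-shaped example region are hypothetical; the example is not one of the regions reported in Fig. 7.

```python
from typing import Iterable, Set, Tuple

Block = Tuple[int, int]  # (valence, arousal)


def bounding_rectangle(region: Iterable[Block]) -> Set[Block]:
    """All blocks of the smallest axis-aligned rectangle containing the region."""
    blocks = list(region)
    v_lo, v_hi = min(v for v, _ in blocks), max(v for v, _ in blocks)
    a_lo, a_hi = min(a for _, a in blocks), max(a for _, a in blocks)
    return {(v, a) for v in range(v_lo, v_hi + 1) for a in range(a_lo, a_hi + 1)}


# Hypothetical L-shaped region, not taken from the paper's results.
l_shape = {(1, 9), (2, 9), (3, 9), (1, 8), (1, 7)}
rectangle = bounding_rectangle(l_shape)
extra_blocks = rectangle - l_shape
```

The blocks in `extra_blocks` are those that a rectangular approach would label as stressful even though the block-wise procedure did not select them.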
Figure 7 shows that each ISRC instance produced a slightly different region, comprising
8 to 12 blocks in the VA space, for the same classification threshold. This means that different
VA regions achieve the same alignment (at least 75 %) with the associated physiological data,
depending on the classifier used. To address this issue, we chose to report the common blocks
(intersection) of the stressful VA regions produced by all six ISRC instances (see Fig. 8d).

Fig. 7 VA stress region(s) identified per ISRC instance based on participants' skin conductance
(one panel per ISRC instance; panel (f): ISRC:k-NN). Regions colored green represent the output
of ISRC with classification accuracy at least 75 %. There were no available ratings or
signals for the red hatched blocks. Red frames represent the corresponding convex hull regions

5.2 Effect of classification threshold on VA stress regions produced

This section investigates the influence of the classification threshold on the VA stress regions
produced by the ISRC algorithm. To this end, we defined the intersection of the stress regions
produced by all six ISRC instances as the output of the ISRC algorithm. Each ISRC instance
was run using 3-fold cross-validation repeated 100 times, and the classification threshold
ranged from 60 to 85 % in steps of 5 %. Figure 8 illustrates the obtained results, in which orange
blocks constitute the intersection of the corresponding stress regions of each ISRC instance. In
these regions, participants' self-reported ratings are in high alignment (from 60 to 85 %) with
their measured skin conductance.

Table 1 Comparative results between ISRC instances with threshold 75 % and the corresponding convex hull
(classification accuracy in %; one row per ISRC instance)

Non-rectangular approach (ISRC)    Rectangular approach (convex hull)
Mean    SD                         Mean    SD
75.4    2.5                        53.8    3.5
75.6    1.0                        56.5    2.9
75.5    1.0                        65.6    2.1
75.9    2.0                        54.6    3.3
75.1    2.6                        69.7    2.6
75.2    2.5                        50.3    3.4

5.3 Comparative results between ISRC and previous work

In this section, we focus on a comparison between the proposed ISRC algorithm and
our previous work [36], in which rectangular stress regions were identified in
the VA space. The rectangular approach showed that regions R(3, 6),
R(3, 5) and R(3, 4) achieved the best performance in terms of alignment between
self-reported ratings and skin conductance signals. Thus, these three regions were
selected to be tested with all ISRC instances. To this end, each ISRC instance was
run using 3-fold cross-validation repeated 100 times, and the classification threshold
was set to 75 %, a value nearest to the best accuracy achieved in the rectangular
approach.
Tables 2, 3 and 4 present comparative results between the rectangular approach and the
ISRC algorithm for R(3, 6), R(3, 5) and R(3, 4), respectively. For each approach, the number of
blocks, the number of signals and the classification accuracies (%) are presented. Results show
that the ISRC algorithm improves the classification accuracy for all three rectangular regions.
Specifically, the improvement ranged from 0.9 to 15.3 percentage points for R(3, 6), from 4.3
to 12.4 percentage points for R(3, 5) and from 7.7 to 20.3 percentage points for R(3, 4). In
addition, R(3, 4) demonstrates the highest cross-classifier improvement, which might be
attributed to the additional degrees of freedom (number of blocks) made available to the ISRC
algorithm for refining the provided initial rectangular region: 15 blocks in R(3, 4), 12 blocks
in R(3, 5) and 10 blocks in R(3, 6).
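The improvement ranges quoted above follow directly from the mean accuracies reported in
Tables 2, 3 and 4. The short check below recomputes them; it assumes that the rectangular and
ISRC accuracies are paired row by row in the order in which they appear in the tables, and the
variable names are ours.

```python
# Mean accuracies (%) from Tables 2-4: (rectangular approach [36], ISRC) per region.
accuracies = {
    "R(3, 6)": ([72.4, 74.1, 74.1, 71.1, 63.3, 64.4], [78.1, 75.0, 77.0, 79.1, 75.2, 79.7]),
    "R(3, 5)": ([70.4, 70.7, 72.1, 69.5, 62.8, 64.1], [75.7, 75.1, 76.4, 76.7, 75.2, 75.9]),
    "R(3, 4)": ([62.8, 63.1, 67.3, 63.1, 54.9, 57.6], [75.7, 75.1, 75.0, 75.1, 75.2, 75.0]),
}

improvement_range = {}
for region, (rectangular, isrc) in accuracies.items():
    # Increase in percentage points, one value per ISRC instance.
    gains = [round(new - old, 1) for old, new in zip(rectangular, isrc)]
    improvement_range[region] = (min(gains), max(gains))
```

Under this pairing, the recomputed minimum and maximum for each region agree with the ranges
stated in the text.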

6 Conclusions and future work

The work presented in this paper significantly extends findings from our previous study
[36] in the following directions: a) the VA space is segmented into non-rectangular regions
with the proposed Incremental Stress Region Construction (ISRC) algorithm, and b) a new
preprocessing technique is applied to the GSR dataset. Specifically, a new approach was
proposed and tested in order to define a specific stress region, possibly non-rectangular, in
the VA space. The proposed approach was based on the associations between self-reported
data and skin conductance signals. To the best of our knowledge, this is the first study that
defines affect regions (e.g. stress) in the VA space using physiological signals.

Fig. 8 Orange blocks constitute the intersection of the corresponding stress regions that were produced from six
(6) different ISRC instances for various threshold accuracies: (a) 60 %, (b) 65 %, (c) 70 %, (d) 75 %,
(e) 80 %, (f) 85 %

To this end, an experiment was conducted and 31 participants were asked to perform five
carefully selected stress-inducing interaction tasks. The Affect Grid tool was used in order to
collect self-reported data from participants. Participants' skin conductance was also recorded.
The stressful interaction scenarios were produced through a research-based approach:
interviews with 15 typical computer users and extensive pilot-testing with participants who were
not involved in the main experiment.
Table 2 Results for R(3, 6): rectangular vs. ISRC with classification accuracy threshold 75 %
(classification accuracy in %; one row per ISRC instance)

Rectangular approach [36]    Non-rectangular approach (ISRC)    Increase in
Accuracy Mean    SD          Accuracy Mean    SD                percentage points
72.4             2.4         78.1             2.0               5.7
74.1             1.2         75.0             1.0               0.9
74.1             1.1         77.0             0.7               2.9
71.1             2.5         79.1             1.8               8.0
63.3             3.0         75.2             2.2               11.9
64.4             2.8         79.7             2.1               15.3

Starting from the upper left point, (v, a) = (1, 9), of the Affect Grid tool, the first region
Ri(v, a) was defined and labeled as "stress". The remaining points in the VA space were defined
as the complementary region Ci, labeled as "other emotion". Next, the Ri region was expanded
horizontally and vertically with a step of one block, the Affect Grid's smallest unit of analysis,
in each direction. In this way, each unit of analysis (1 × 1) contributes to the construction of a
bigger region, which will probably not be rectangular. Afterwards, each region Ri(v, a) and Ci
was associated with the participants' corresponding physiological signals. Subsequently, six
popular classifiers, which constitute the core of the ISRC algorithm and are offered in the
MATLAB R2015a Statistics and Machine Learning Toolbox v10.0, were used to test the
classification accuracy between Ri and Ci.
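The incremental construction summarized above can be sketched as a simple region-growing loop.
The sketch below is only one possible reading of that summary and not the published ISRC
procedure: the 4-neighbour expansion, the rule that a block is kept when the accuracy with the
block included stays at or above the threshold, and the `accuracy` callback (a stand-in for a
cross-validated classifier separating Ri from Ci) are all assumptions.

```python
from collections import deque


def grow_stress_region(accuracy, threshold=75.0, start=(1, 9), size=9):
    """Grow a region block by block from `start` on a size x size VA grid.

    `accuracy(region)` is a hypothetical callback returning the classification
    accuracy (%) obtained when `region` is labeled "stress" and the rest
    of the grid is labeled "other emotion".
    """
    region = {start}
    frontier = deque([start])
    rejected = set()
    while frontier:
        v, a = frontier.popleft()
        # Try the horizontal and vertical neighbours, one block at a time.
        for dv, da in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            block = (v + dv, a + da)
            on_grid = 1 <= block[0] <= size and 1 <= block[1] <= size
            if not on_grid or block in region or block in rejected:
                continue
            if accuracy(region | {block}) >= threshold:
                region.add(block)        # block keeps the region above the threshold
                frontier.append(block)
            else:
                rejected.add(block)      # assumption: rejected blocks are not retried
    return region


# Toy stand-in for the classifier: accuracy is high only while the candidate
# region stays inside a hypothetical set of truly stressful blocks.
stressful_blocks = {(1, 9), (2, 9), (3, 9), (1, 8)}


def toy_accuracy(region):
    return 100.0 if region <= stressful_blocks else 50.0


stress_region = grow_stress_region(toy_accuracy)
```

Because blocks are accepted one at a time, the resulting region is not constrained to be a
rectangle, which is the property that distinguishes this construction from the rectangular
approach.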
Our findings show which regions in the VA rating space may reliably indicate (from 60 to
85 %) self-reported stress that is in alignment with one's measured skin conductance in the
context of typical interactive applications. As a result, HCI and interactive multimedia
researchers and practitioners can employ the Affect Grid in their UEX evaluation studies,
knowing a priori that a specific VA region is associated with both perceived and
physiologically experienced stress. One additional important contribution of this work is the
proposed approach for the empirical identification of affect regions in the VA space, which may
also be used for other emotions in the future.
One limitation of this work is that we did not employ any feature selection techniques,
which might improve the classification accuracies. Furthermore, our dataset included VA
blocks with no ratings, which constitutes an additional limitation of this study. Further studies
are also required to ensure the generalizability of our findings. One of our immediate future
aims is to enlarge our dataset in order to investigate the effect (if any) of gender on the
identified stress region(s) in the VA space. Future work also includes investigating the reported
stress regions using additional physiological signals, such as blood volume pressure,
respiration and temperature. Finally, VA regions for other emotions might also be investigated
following the methodology described in this paper.

Table 3 Results for R(3, 5): rectangular vs. ISRC with classification accuracy threshold 75 %
(classification accuracy in %; one row per ISRC instance)

Rectangular approach [36]    Non-rectangular approach (ISRC)    Increase in
Accuracy Mean    SD          Accuracy Mean    SD                percentage points
70.4             2.5         75.7             2.0               5.3
70.7             2.0         75.1             1.1               4.4
72.1             1.3         76.4             0.7               4.3
69.5             2.6         76.7             2.0               7.2
62.8             3.4         75.2             2.4               12.4
64.1             2.8         75.9             2.4               11.8

Table 4 Results for R(3, 4): rectangular vs. ISRC with classification accuracy threshold 75 %
(classification accuracy in %; one row per ISRC instance)

Rectangular approach [36]    Non-rectangular approach (ISRC)    Increase in
Accuracy Mean    SD          Accuracy Mean    SD                percentage points
62.8             2.9         75.7             2.0               12.9
63.1             2.5         75.1             1.0               12.0
67.3             1.6         75.0             0.8               7.7
63.1             3.4         75.1             1.9               12.0
54.9             3.1         75.2             2.5               20.3
57.6             3.1         75.0             2.6               17.4

References

1. Anderson NB (1998) Levels of analysis in health science. A framework for integrating sociobehavioral and
biomedical research. Ann N Y Acad Sci 840:563-576
2. Barrett LF (1998) Discrete emotions or dimensions? The role of valence focus and arousal focus. Cogn Emot
12:579-599. doi:10.1080/026999398379574
3. Baum A (1990) Stress, intrusive imagery, and chronic distress. Health Psychol Off J Div Health Psychol Am
Psychol Assoc 9:653-675
4. Benedek M, Kaernbach C (2010) Decomposition of skin conductance data by means of nonnegative
deconvolution. Psychophysiology 47:647-658. doi:10.1111/j.1469-8986.2009.00972.x
5. Benedek M, Kaernbach C (2010) A continuous measure of phasic electrodermal activity. J Neurosci
Methods 190:80-91. doi:10.1016/j.jneumeth.2010.04.028
6. Boucsein W (1992) Electrodermal activity. Plenum University Press, New York
7. Brantley PJ, Waggoner CD, Jones GN, Rappaport NB (1987) A daily stress inventory: development,
reliability, and validity. J Behav Med 10:61-74
8. Cacioppo JT, Tassinary LG (1990) Inferring psychological significance from physiological signals. Am
Psychol 45:16-28
9. Calhoun BH, Lach J, Stankovic J et al (2012) Body sensor networks: a holistic approach from silicon to
users. Proc IEEE 100:91-106. doi:10.1109/JPROC.2011.2161240
10. Campbell JD, Chew B, Scratchley LS (1991) Cognitive and emotional reactions to daily events: the effects of
self-esteem and self-complexity. J Pers 59:473-505
11. Chanel G, Rebetez C, Bétrancourt M, Pun T (2011) Emotion assessment from physiological signals for
adaptation of game difficulty. IEEE Trans Syst Man Cybern Part Syst Hum 41:1052-1063. doi:10.1109/
12. Chauncey A, Azevedo R (2010) Emotions and motivation on performance during multimedia learning: how
do I feel and why do I care? In: Aleven V, Kay J, Mostow J (eds) Intell. Tutoring Syst. Springer, Berlin, pp
13. Chung S, Cheon J, Lee K-W (2015) Emotion and multimedia learning: an investigation of the effects of
valence and arousal on different modalities in an instructional animation. Instr Sci 43:545-559. doi:10.1007/
14. de Santos Sierra A, Avila CS, Guerra Casanova J et al (2010) Two stress detection schemes based on
physiological signals for real-time applications. In: 2010 Sixth Int. Conf. Intell. Inf. Hiding Multimed. Signal
Process. IIH-MSP. pp 364-367
15. Deaver CM, Miltenberger RG, Smyth J et al (2003) An evaluation of affect and binge eating. Behav Modif
16. Diamond DM, Campbell AM, Park CR et al (2007) The temporal dynamics model of emotional memory
processing: a synthesis on the neurobiological basis of stress-induced amnesia, flashbulb and traumatic
memories, and the Yerkes-Dodson law. Neural Plast 2007:60803. doi:10.1155/2007/60803


17. Drachen A, Nacke LE, Yannakakis G, Pedersen AL (2010) Correlation between heart rate, electrodermal
activity and player experience in first-person shooter games. Proc. 5th ACM SIGGRAPH Symp. Video
Games. ACM, New York, pp 49-54
18. Eich E, Macaulay D, Ryan L (1994) Mood dependent memory for events of the personal past. J Exp Psychol
Gen 123:201-215
19. Endsley MR (1988) Situation awareness global assessment technique (SAGAT). In: Aerosp. Electron. Conf.
1988 NAECON 1988 Proc. IEEE 1988 Natl. pp 789-795 vol 3
20. Ganglbauer E, Schrammel J, Tscheligi M (2009) Possibilities of psychophysiological methods for measuring
emotional aspects in mobile contexts. Proc. Mob. HCI
21. Hart SG, Staveland LE (1988) Development of NASA-TLX (Task Load Index): results of empirical and
theoretical research. In: Meshkati PAH and N (ed) Adv. Psychol. North-Holland, pp 139-183
22. Healey JA, Picard RW (2005) Detecting stress during real-world driving tasks using physiological sensors.
IEEE Trans Intell Transp Syst 6:156-166. doi:10.1109/TITS.2005.848368
23. Hernandez J, Morris RR, Picard RW (2011) Call center stress recognition with person-specific models. In:
D'Mello S, Graesser A, Schuller B, Martin J-C (eds) Affect. Comput. Intell. Interact. Springer, Berlin, pp 125-134
24. Hernandez J, Paredes P, Roseway A, Czerwinski M (2014) Under pressure: sensing stress of computer users.
Proc. SIGCHI Conf. Hum. Factors Comput. Syst. ACM, New York, pp 51-60
25. James W (1994) The physical basis of emotion. Psychol Rev 101:205-210. doi:10.1037/0033-295X.101.2.205
26. Jensen AR, Rohwer WD (1966) The Stroop color-word test: a review. Acta Psychol (Amst) 25:36-93
27. Katsanos C, Tselios N, Goncalves J et al (2014) Multipurpose public displays: can automated grouping of
applications and services enhance user experience? Int J Hum-Comput Interact 30:237-249. doi:10.1080/
28. Killgore WD (1998) The Affect Grid: a moderately valid, nonspecific measure of pleasure and arousal.
Psychol Rep 83:639-642. doi:10.2466/pr0.1998.83.2.639
29. Kivikangas JM, Chanel G, Cowley B et al (2011) A review of the use of psychophysiological methods in
game research. J Gaming Virtual Worlds 3:181-199. doi:10.1386/jgvw.3.3.181_1
30. Koelstra S, Muhl C, Soleymani M et al (2012) DEAP: a database for emotion analysis; using physiological
signals. IEEE Trans Affect Comput 3:18-31. doi:10.1109/T-AFFC.2011.15
31. Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proc.
14th Int. Jt. Conf. Artif. Intell. - Vol. 2. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 1137-1143
32. Law EL-C, Roto V, Hassenzahl M et al (2009) Understanding, scoping and defining user experience: a
survey approach. Proc. SIGCHI Conf. Hum. Factors Comput. Syst. ACM, New York, pp 719-728
33. Lazar DJ, Feng DJH, Hochheiser DH (2010) Research methods in human-computer interaction. John Wiley
& Sons
34. Lee M-F, Chen G-S, Hung JC et al (2014) Data mining in emotion color with affective computing.
Multimed Tools Appl 1-14. doi:10.1007/s11042-014-2231-8
35. Liapis A, Katsanos C, Sotiropoulos D et al (2015) Recognizing emotions in human computer interaction:
studying stress using skin conductance. In: Abascal J, Barbosa S, Fetter M et al (eds) Hum.-Comput.
Interact. INTERACT 2015. Springer International Publishing, pp 255-262
36. Liapis A, Katsanos C, Sotiropoulos D et al (2015) Subjective assessment of stress in HCI: a study of the
valence-arousal scale using skin conductance. Proc. 11th Biannu. Conf. Ital. SIGCHI Chapter. ACM, New
York, pp 174-177
37. Lin T, Imamiya A, Mao X (2008) Using multiple data sources to get closer insights into user cost and task
performance. Interact Comput 20:364-374. doi:10.1016/j.intcom.2007.12.002
38. Lin T, Omata M, Hu W, Imamiya A (2005) Do physiological data relate to traditional usability indexes? In:
Proc. 17th Aust. Conf. Comput.-Hum. Interact. Citiz. Online Consid. Today future. Computer-Human
Interaction Special Interest Group (CHISIG) of Australia, Narrabundah, Australia, Australia, pp 1-10
39. Lopatovska I, Arapakis I (2011) Theories, methods and current research on emotions in library and
information science, information retrieval and human-computer interaction. Inf Process Manag
47:575-592. doi:10.1016/j.ipm.2010.09.001
40. Lunn D, Harper S (2010) Using galvanic skin response measures to identify areas of frustration for older
Web 2.0 users. In: Proc. 2010 Int. Cross Discip. Conf. Web Access. W4A. ACM, New York, NY, USA, p 34:
41. Lv H-R, Lin Z-L, Yin W-J, Dong J (2008) Emotion recognition based on pressure sensor keyboards. In: 2008
IEEE Int Conf Multimed Expo. pp 1089-1092
42. Mahlke S, Minge M (2008) Consideration of multiple components of emotions in human-technology
interaction. In: Peter C, Beale R (eds) Affect Emot. Hum.-Comput. Interact. Springer, Berlin, pp 51-62
43. Mandryk RL, Atkins MS (2007) A fuzzy physiological approach for continuously modeling emotion during
interaction with play technologies. Int J Hum-Comput Stud 65:329-347. doi:10.1016/j.ijhcs.2006.11.011


44. Mandryk RL, Atkins MS, Inkpen KM (2006) A continuous and objective evaluation of emotional experience
with interactive play environments. Proc. SIGCHI Conf. Hum. Factors Comput. Syst. ACM, New York, pp
45. Mead KML, Ball LJ (2007) Music tonality and context-dependent recall: the influence of key change and
mood mediation. Eur J Cogn Psychol 19:59-79. doi:10.1080/09541440600591999
46. Peter C, Herbon A (2006) Emotion representation and physiology assignments in digital systems. Interact
Comput 18:139-170. doi:10.1016/j.intcom.2005.10.006
47. Picard RW (2000) Affective computing, 1st edition. The MIT Press
48. Ritz T, Thöns M, Fahrenkrug S, Dahme B (2005) Airways, respiration, and respiratory sinus arrhythmia
during picture viewing. Psychophysiology 42:568-578. doi:10.1111/j.1469-8986.2005.00312.x
49. Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39:1161-1178. doi:10.1037/h0077714
50. Russell YI, Gobet F (2012) Sinuosity and the affect grid: a method for adjusting repeated mood scores.
Percept Mot Skills 114:125-136. doi:10.2466/03.28.PMS.114.1.125-136
51. Russell JA, Weiss A, Mendelsohn GA (1989) Affect grid: a single-item scale of pleasure and arousal. J Pers
Soc Psychol 57:493-502. doi:10.1037/0022-3514.57.3.493
52. Schachter S, Singer J (1962) Cognitive, social, and physiological determinants of emotional state. Psychol
Rev 69:379-399. doi:10.1037/h0046234
53. Scheirer J, Fernandez R, Klein J, Picard RW (2001) Frustrating the user on purpose: a step toward building
an affective computer
54. Strain AC, Azevedo R, D'Mello S (2012) Exploring relationships between learners' affective states,
metacognitive processes, and learning outcomes. In: Cerri SA, Clancey WJ, Papadourakis G, Panourgia K
(eds) Intell. Tutoring Syst. Springer, Berlin, pp 59-64
55. Tsui W-H, Lee P, Hsiao T-C (2013) The effect of emotion on keystroke: an experimental study using facial
feedback hypothesis. Conf Proc Annu Int Conf IEEE Eng Med Biol Soc IEEE Eng Med Biol Soc Annu
Conf 2013:2870-2873. doi:10.1109/EMBC.2013.6610139
56. Vermeeren APOS, Law EL-C, Roto V et al (2010) User experience evaluation methods: current state and development
needs. Proc. 6th Nord. Conf. Hum.-Comput. Interact. Extending Boundaries. ACM, New York, pp 521-530
57. Wang S, Liu Z, Zhu Y et al (2014) Implicit video emotion tagging from audiences' facial expression.
Multimed Tools Appl 74:4679-4706. doi:10.1007/s11042-013-1830-0
58. Wang S, Zhu Y, Wu G, Ji Q (2013) Hybrid video emotional tagging using users' EEG and video content.
Multimed Tools Appl 72:1257-1283. doi:10.1007/s11042-013-1450-8
59. Ward RD, Marsden PH (2004) Affective computing: problems, reactions and intentions. Interact Comput
16:707-713. doi:10.1016/j.intcom.2004.06.002
60. Wilson GM, Sasse MA (2000) Do users always know what's good for them? Utilising physiological
responses to assess media quality. In: CPsychol SMB (Hons) MSc, Waern Y, FRSA GCM (Cantab)
PGCE (eds) People Comput. XIV Usability Else. Springer London, pp 327-339

Alexandros Liapis is a Ph.D. candidate in the Hellenic Open University's School of Science and Technology.
His research interests include Human-Computer Interaction, Usability Evaluation and Physiological Signal
Analysis. Liapis graduated from the Department of Financial Applications at the Technological Institute of
Western Macedonia, and received his M.Sc. from the Department of Applied Informatics at the University of
Macedonia. Contact him at


Christos Katsanos is a post-doctoral researcher in the Hellenic Open University's School of Science and
Technology and an adjunct professor at the Business Administration Department of the Technological
Educational Institute of Western Greece. His research interests include Human-Computer Interaction, Web
Accessibility, Information Architecture, Educational Technology and Human-Robot Interaction. Katsanos
received his Dipl.-Ing. and Ph.D. from the Department of Electrical and Computer Engineering at the
University of Patras. Contact him at

Dimitris Sotiropoulos is an adjunct professor and post-doctoral researcher in the Hellenic Open University's
School of Science and Technology. His research interests include Machine Learning, Global Optimization,
Artificial Neural Networks, Interval Methods and Physiological Signal Analysis. Sotiropoulos received his B.Sc.
and Ph.D. from the Department of Mathematics at the University of Patras. Contact him at


Nikos Karousos is a post-doctoral researcher in the Hellenic Open University's School of Science and
Technology and an adjunct professor at the Technological Educational Institute of Western Greece. His research
interests include Hypertext, Service Oriented Architecture, Application Development, Design of Knowledge
Management Systems and Software Evaluation. Karousos graduated from the Computer Engineering &
Informatics Department of the University of Patras, and holds an M.Sc. and a Ph.D. from the same
department. Contact him at

Michalis Xenos is a professor in the Computer Science Department of the School of Science and Technology of
the Hellenic Open University, Director of the Computer Science Course, Director of the Internal Assessment and
Education Unit, Director of the Software Quality Research Group and Director of the Software Quality
Laboratory. His current research interests include, inter alia, Software Quality, Human Computer Interaction
and Educational Technologies. Xenos received his B.Eng., M.Sc. and Ph.D. from the Department of Computer
Engineering & Informatics at the University of Patras. Contact him at
