Arousal and Validation in 2023
4, 1997
1Preparation of this article was supported in part by National Institute of Mental Health
(NIMH) grant MH18931 to Paul Ekman and Robert Levenson for the NIMH Postdoctoral
Training Program in Emotion Research. I thank Paul Ekman for permitting access to the
data analyzed here. I also thank Jerome Kagan and several anonymous reviewers for their
helpful comments on this manuscript.
2Address all correspondence concerning this article to Nancy Alvarado, who is now at the
Department of Psychology (0109), University of California at San Diego, 9500 Gilman Drive,
La Jolla, California 92093-0109.
0146-7239/97/1200-0323$12.50/0 © 1997 Plenum Publishing Corporation
324 Alvarado
affect changes with the focus of attention. She speculated that valence focus
"may be associated with the tendency to attend to environmental, particu-
larly social cues" (p. 163), whereas arousal focus may be related to internal
(somesthetic) cues, citing Blascovich (1990; Blascovich et al., 1992). This
paper presents support for Feldman's views, in a direct-scaling self-report
context where valence and arousal are reported independently and the en-
vironmental cues are held constant, using data originally collected by Ek-
man, Friesen, and Ancoli (1980).
gorilla film evoking the greatest duration and intensity of facial activity,
the puppy film showing the greatest frequency of facial activity, and only
seven subjects showing any facial response to the ocean film. From this,
Fridlund argued that the gorilla and puppy films were somehow more social
in nature, evoking more facial expression because such expressions only
arise from social antecedents. However, this is only true if the films did in
fact evoke the same emotional responses. As will be argued later, I believe
they did not.
Consensus Modeling
3According to Hays (1988), these assumptions can be violated without greatly affecting results
when a fixed-effects model is used to test inferences about specific means. Violating
assumptions of normality and equal variance has serious consequences for a random-effects
model used to test inferences about the variance of the population effects.
METHOD
This analysis was performed upon the original self-report data col-
lected by Ekman et al. (1980), rather than the summaries provided by the
resulting article. Additional details about the data collection procedures
Scaling Emotional Response to Film Clips 329
were provided in that article and are omitted here, except where relevant
to the arguments presented.
Subjects
Stimuli
Procedure
Subjects rated their emotional responses for two baseline periods and
five film-viewing periods using a series of nine unipolar 9-point scales, la-
beled with the following terms: Interest, Anger, Disgust, Fear, Happiness,
Pain, Sadness, Surprise, and Arousal. Pain was defined for subjects as "the
experience of empathetic pain" and Arousal was explained as applying to
the total emotional state rather than to any one of the other scales pre-
sented. The other terms were not explained to subjects. Scales ranged from
0 (no emotion) to 8 (strongest feeling). Instructions explained how the ratings
were to be made (Ekman et al., 1980): "... strength of a feeling should
be viewed as a combination of (a) the number of times you felt the emo-
tion—its frequency; (b) the length of time you felt the emotion—its dura-
tion; and (c) how intense or extreme the emotions [sic] was—its intensity"
(p. 1127).
The first baseline occurred during a 20-min period in which the subject
was instructed to relax. The presentation of pleasant or unpleasant films
first was counterbalanced. Ratings for all three pleasant films were made
after viewing all three films. Similarly, ratings for the two unpleasant films
were made after viewing both films. A second baseline rating was made
after rating of the first set of films, during a 5-min interval before starting
the second series of films.
RESULTS
Consensus Analysis
lyze the data using an ordinal consensus model, but such a model has not
yet been developed. The categorical, multiple-choice model used here as-
sumes an equal probability of guessing the alternatives in its correction for
guessing. The analysis of normality (presented later) suggests that this as-
sumption is appropriate for some but not all of the rating scales. With or-
dinal data, it is more likely that guessing biases differ among the rating
alternatives (e.g., the probability of guessing 5 may be different than the
probability of guessing 0). A model incorporating such biases had not been
developed at the time this analysis was performed, but now exists (see
Klauer and Batchelder, 1996). In general, the application of a categorical
model to what we suspect is ordinal data tends to work against a finding
of consensus because subjects must agree on the exact rating number given
to each stimulus out of nine alternatives (0 to 8).
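The effect of exact-match scoring can be illustrated with a minimal sketch. It assumes the standard correction for guessing in the multiple-choice consensus model, in which the observed proportion of matches M between two subjects is rescaled as (L × M − 1) / (L − 1) for L equally likely alternatives. The function name and the rating vectors below are hypothetical and are not taken from the data set; the software used for the original analysis may differ in detail.

```python
def corrected_match(ratings_i, ratings_j, n_alternatives=9):
    """Proportion of exact matches between two subjects, corrected for
    guessing under the assumption that all alternatives are equally likely."""
    if len(ratings_i) != len(ratings_j) or not ratings_i:
        raise ValueError("rating vectors must be non-empty and equal in length")
    # Only identical rating numbers count as agreement (categorical scoring).
    matches = sum(a == b for a, b in zip(ratings_i, ratings_j))
    observed = matches / len(ratings_i)
    # Rescale so that chance-level matching maps to 0 and perfect matching to 1.
    return (n_alternatives * observed - 1) / (n_alternatives - 1)


# Hypothetical ratings on one scale across the seven rating periods.
subject_a = [0, 4, 0, 0, 0, 0, 0]
subject_b = [0, 4, 0, 0, 0, 0, 0]  # identical to subject_a
subject_c = [0, 5, 0, 0, 0, 0, 0]  # differs by a single scale point once

same = corrected_match(subject_a, subject_b)
near = corrected_match(subject_a, subject_c)
```

In this sketch a rating of 5 against a rating of 4 is scored as a complete mismatch, exactly as 8 against 0 would be, which is why a categorical model is conservative when the underlying data are ordinal.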
The measures used to evaluate results are (1) individual competence
scores, (2) mean competence, (3) eigenvalues produced during the principal
component analysis used to estimate the solution to the model's equations,
and (4) answer key confidence estimates. Competence scores range from
-1.00 to 1.00 and are maximum-likelihood parameter estimates. They are
best understood as estimated probabilities rather than correlation coeffi-
cients. A negative competence score indicates extreme and consistent dis-
agreement with the group across rating periods.
Batchelder and Romney (1988, 1989) established three criteria for
judging whether consensus exists in subject responses to questions about a
domain: (1) eigenvalues showing a single dominant factor (a ratio greater
than 3:1 between the first and second factors), (2) a mean competence
greater than .500, and (3) absence of negative competence scores in the
group of subjects. While failure to meet these criteria does not necessarily
rule out consensus, it can indicate a poor fit between the data and the
model.
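The three criteria can be expressed as a simple check. The sketch below is illustrative only: the function name, the dictionary keys, and the example eigenvalues and competence scores are hypothetical, and the thresholds are those stated in the text (a ratio above 3:1, a mean competence above .500, and no negative competence scores).

```python
def check_consensus_criteria(eigenvalues, competences,
                             min_ratio=3.0, min_mean=0.5):
    """Evaluate the three consensus criteria for one rating scale."""
    if len(eigenvalues) < 2 or not competences:
        raise ValueError("need at least two eigenvalues and one competence score")
    ordered = sorted(eigenvalues, reverse=True)
    first, second = ordered[0], ordered[1]
    mean_competence = sum(competences) / len(competences)
    result = {
        # (1) a single dominant factor: first eigenvalue more than 3x the second
        "dominant_factor": first > min_ratio * second,
        # (2) mean competence greater than .500
        "mean_competence": mean_competence > min_mean,
        # (3) no subject with a negative competence score
        "no_negative_scores": all(c >= 0 for c in competences),
    }
    result["all_met"] = all(result.values())
    return result


# Hypothetical values for a scale that meets the criteria and one that does not.
consensual = check_consensus_criteria([6.2, 1.1, 0.8], [0.81, 0.77, 0.90, 0.66])
divergent = check_consensus_criteria([2.0, 1.5], [0.6, -0.4, 0.7])
```

Reporting each criterion separately, rather than a single pass/fail flag, matches the way the results are discussed here: a scale can show a single negative score while its mean competence and eigenvalue ratio still point to consensus.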
Consensus analysis results for the nine scales across the seven rating
periods are summarized in Table I. All scales except those labeled Interest
and Arousal met the criteria for consensus. In contrast, the scales for In-
terest and Arousal showed nearly half the group with negative consensus
scores, indicating severe disagreement about the correct responses on those
scales. The scales for Anger, Disgust, and Pain showed the greatest con-
sensus, with the highest mean consensus scores and with eigenvalue ratios
indicating a single dominant factor in the data. While the scales for Sadness
and Surprise each showed a single negative consensus score, the otherwise
high mean consensus scores and ratios between the eigenvalues suggest that
consensus also existed for those scales.
This finding of consensus for seven of the nine scales suggests that
subjects agreed strongly in their emotional responses to the stimuli pre-
sented, particularly with respect to the scales labeled Anger, Disgust, and
Pain. Lesser agreement existed for Surprise and Fear, and for Happiness
and Sadness. Based upon the measures provided by this model, consensual
emotional response did not exist for the two scales labeled Arousal and
Interest. The importance of this finding will be discussed later.
Answer key confidence levels were high (M = .95), even when emo-
tional response was reported, but consensus appeared to be largely gov-
erned by agreement about the absence of negative emotion during the
pleasant film clips, and the absence of positive emotion during the unpleas-
ant film clips.4 The scales showing lower consensus (but nevertheless meet-
ing the criteria for consensus), Sadness, Happiness, and Surprise, showed
minor violations of this pattern. Because the presentation of films was
counterbalanced, half of the subjects saw pleasant films and half saw un-
pleasant films before the second baseline. From the ratings, several subjects
appeared to have carried residual negative emotional response into this
second baseline period, producing mixed ratings. They may also have car-
ried such response into the pleasant film ratings, as Ekman et al. (1980)
4This is far from a trivial finding, as several emotion theorists have hypothesized that complex
emotional responses may be blends of basic emotions and thus have insisted that multiple
scales be provided to permit subjects to express such complexity. A lack of response is thus
as meaningful as positive response on each single scale with respect to each rating context.
Anger        0  0  0  0  0  0  0
Disgust      0  0  0  0  0  0  5
Fear         0  0  0  0  0  1  8
Happiness    0  4  0  0  0  0  0
Pain         0  0  0  0  0  8  8
Sadness      0  0  0  0  0  0  0
Surprise     0  0  0  0  0  8  6
Interest     0  1  1  1  0  3  5
  Low        0  1  1  1  0  3  5
  High       4  6  7  6  2  5  7
Arousal      0  1  2  1  0  1  3
  Low        0  1  2  1  0  1  3
  Medium     1  4  1  4  1  6  5
  High       2  3  6  3  4  8  8
noted in their discussion. Nor were the pleasant films unambiguously pleas-
ant. Five subjects responded to the gorilla film with mild anger, and four
responded to the puppy film with even stronger anger (e.g., 6, 7, or 8).
Similarly, several subjects reported sadness when watching the gorilla film,
and several reported disgust while watching the puppy film. These re-
sponses may be partly explained by the content of the films. The puppy
ultimately chewed up and spit out the flower with which it was playing,
evoking disgust in some subjects. The gorilla may have aroused sadness
because it resided in a zoo. The lower consensus for the Fear and Surprise
ratings results from several subjects who claimed to have felt no surprise
or fear in response to the second workshop accident.
Model-predicted answer key responses for each of the scales during
each of the viewing periods are shown in Table II. Examination of the an-
swer key for the Happiness rating scale shows a clear difference in the
level of enjoyment among subjects for the three film clips. The gorilla film
was rated as 4, the ocean film as 0, and the puppy film as 0. The consensus
model makes these predictions by weighting each subject's response by that
subject's overall agreement with the group (the estimated probability of
correctness). Even without the model's weighting, these responses were the
modal responses among subjects for these films. It is only when all re-
sponses are averaged that higher numbers emerge for the ocean and puppy
films. To see why this occurs, consider a group in which equal numbers of
subjects give ratings of 0 and 8 and no other ratings. When these are av-
eraged to obtain a mean of 4.0, it should be evident that this rating is an
accurate portrayal of emotional response for no single subject in that group.
Nor will it be a good predictor of the response of the next subject who
views the film. The actual distribution of scores generally raises an alarm
about using the mean as an indicator of central tendency (see the analysis
of normality below).
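The averaging example above can be worked through directly. This is a minimal sketch using a hypothetical group of 20 subjects, half rating 0 and half rating 8; the function and variable names are illustrative and the numbers are not taken from the data set.

```python
from statistics import multimode


def describe_ratings(ratings):
    """Summarize 0-8 ratings by mean, mode(s), and how well the mean fits."""
    group_mean = sum(ratings) / len(ratings)
    # multimode returns every most-frequent value, so a split group shows both.
    modes = multimode(ratings)
    # Share of subjects whose own rating equals the rounded group mean.
    share_at_mean = sum(r == round(group_mean) for r in ratings) / len(ratings)
    # Average distance between the group mean and each subject's actual rating.
    mean_abs_error = sum(abs(r - group_mean) for r in ratings) / len(ratings)
    return {
        "mean": group_mean,
        "modes": modes,
        "share_at_mean": share_at_mean,
        "mean_abs_error": mean_abs_error,
    }


# Hypothetical group: equal numbers of subjects rating 0 and 8, nothing else.
split_group = [0] * 10 + [8] * 10
summary = describe_ratings(split_group)
```

For this group the mean is 4.0, yet no subject gave a rating of 4 and every subject's rating lies four scale points from the mean, whereas the modes (0 and 8) each describe half of the group exactly.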
During subsequent research, Ekman and Friesen edited the puppy film
to remove the portion where the puppy eats the flower, and thereby ob-
tained higher enjoyment ratings. Examination of the disgust and anger
scales provided important clues to the differing emotions evoked in indi-
vidual subjects by this particular film. The difference in content may ac-
count for the puppy film's higher frequency of smiling but lower duration
and intensity of smiling, compared to the gorilla film (Ekman et al., 1980).
Differing emotions were not reported across the nine scales for the ocean
film. This analysis shows that the ocean film was simply not as enjoyable
as the gorilla film. The finding that few subjects smiled while viewing it is
entirely consistent with the self-report ratings obtained for the ocean film.
Although responses are typically distributed across a range of response
options in any data set, even one showing strong consensus, the process of
consensus modeling permits identification of those subjects with consistently
divergent response patterns across the set of questions. These divergent sub-
jects obtain negative consensus scores during analysis. By partitioning the
data set based upon the sign of the consensus score (negative or positive),
Fig. 1. Examples of used and unused valenced rating scales: Ratings of Surprise for the
cut finger film clip (top) and ratings of Anger for the puppy film clip (bottom). Std. Dev.
= standard deviation.
of the data for the Arousal and Interest scales yielded no coherent sub-
groups because the resulting partitioned data sets also failed to meet the
criteria for consensus (see Table III). Instead, responses seemed to be dis-
tributed across the range of possible responses. However, subjects with nega-
tive consensus scores on the Arousal scale tended to obtain negative
consensus scores on the Interest scale as well (Goodman and Kruskal's
gamma = .54). This suggests several conclusions: (1) the scales for Arousal
and Interest do not lend themselves to this type of categorical analysis; (2)
subjects are idiosyncratic but consistent in their response using these two
scales; and (3) there is no single correct (i.e., consensual) rating response
for arousal or interest with respect to these stimuli. This suggests a quali-
tative difference in behavior among subjects when using the Arousal and
Interest scales compared to the remaining scales.
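For readers unfamiliar with Goodman and Kruskal's gamma, the statistic is the difference between the numbers of concordant and discordant pairs divided by their sum, with tied pairs ignored. The sketch below is a plain implementation of that definition. The coding of score signs as 0 (negative) and 1 (positive) and the eight subjects shown are assumptions made for illustration; they are not the study data and do not reproduce the reported value of .54.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Goodman and Kruskal's gamma for two paired ordinal variables."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # the pair is ordered the same way on both scales
        elif product < 0:
            discordant += 1   # the pair is ordered in opposite ways
        # pairs tied on either variable are ignored
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical sign of each subject's score on two scales (0 = negative, 1 = positive).
arousal_sign = [0, 0, 0, 1, 1, 1, 0, 1]
interest_sign = [0, 0, 1, 1, 1, 0, 0, 1]
gamma = goodman_kruskal_gamma(arousal_sign, interest_sign)
```

With two dichotomous variables, as in this example, gamma reduces to Yule's Q computed from the 2 × 2 table of signs.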
Analysis of Normality
Frequency histograms were produced for each of the nine rating scales,
by stimulus rating period. In general, a given scale was either used or un-
used (mostly 0 ratings) for a given stimulus, consistent with the consensus
analysis results described above and shown in Table II. When a scale was
used, the distribution was frequently bimodal and generally included a sub-
stantial minority reporting no affect (0 ratings), as shown in Fig. 1. In con-
trast, ratings of arousal and interest were distributed across the entire range
of scores for each rating period, as shown in Fig. 2.
Happiness ratings were spread across the entire scale for all three
pleasant films, as shown in Fig. 3. However, none of the distributions was
normal. A representative comparison of observed versus expected scores,
and detrended deviation from an expected normal distribution, are plotted
in Fig. 4. Consistent with consensus analysis, the modal response for both
the puppy and ocean films was 0. Note that although the means for the
three pleasant film clips were equal, the distributions were clearly different.
These differences, especially with respect to those reporting no affect (0
ratings), are fully consistent with the differences in smiling noted by Ekman
et al. (1980) and do not support Fridlund's interpretation that little smiling
occurs because the ocean film evoked equal happiness but was asocial in
content.
Patterns of Correlation
Fig. 2. Ratings of Arousal for the gorilla and puppy film clips. Std. Dev. = standard deviation.
Fig. 3. Ratings of Happiness for the three pleasant film clips. Std. Dev. = standard deviation.
Correlations between happiness and arousal for the seven rating pe-
riods are shown in Table IV. A significant Spearman rank order correlation
(p < .01) was found between Arousal and Happiness for each of the three
Fig. 4. Detrended normal Q-Q plot of Happiness ratings for the gorilla film clip.
tent and served as a less affective interval between the other two pleasant
stimuli.
That most subjects reported arousal even when they reported no va-
lenced emotion (e.g., during baseline periods) supports the consensus
analysis evidence that valence is experienced differently than arousal, that
it varies with the stimulus, and that it is only related to arousal when the
magnitude of the rating is considered. In other words, arousal appears to
be related to the selection of a particular value on the Happiness rating
scale, but unrelated to whether that scale was used. The strong correlation
between arousal and valence scales when a valenced emotion was reported
suggests that subjects were using the arousal and valence scales in a con-
sistent manner, on an individual basis. As a group, however, they clearly
used the two kinds of scales differently, because consensus emerged for
valence but not for arousal.
DISCUSSION
This reanalysis suggests that (1) the mean ratings used as norms were
a misleading assessment of the happiness evoked by the film clips; (2) sub-
ject ratings were consensual, varying with stimulus properties for the rating
scales labeled using valenced emotion terms, but were idiosyncratic, varying
from a personal baseline, for the scales labeled using arousal terms; (3)
subjects appeared to use the valence-related scales differently than the
arousal-related scales across the rating contexts; and (4) the magnitude of
ratings of valence appeared related to the magnitude of arousal reported
when valenced emotion was reported (but not vice versa).
When emotional response ratings were treated as discrete, categorical
data rather than as interval-scaled continuous data, results showed strong
agreement among subjects with respect to scales labeled using emotion
terms, including those labeled with the terms Anger, Disgust, Sadness, Hap-
piness, Fear, and Surprise. Strong agreement was also found with respect
to the scale labeled Pain. Strong disagreement among subjects was shown
with respect to the scales labeled Interest and Arousal, across the spectrum
of rating contexts. Further, stimuli considered equal in their ability to evoke
Happiness ratings when responses were analyzed as interval-scaled data
were found to be quite different in their enjoyment potential when analyzed
discretely. This may account for the previously reported failure to find
equal facial expressivity in response to equally rated film clips.
The analysis of normality suggests that mean ratings do not ac-
curately characterize group response for this data set. Further, substantial
servations by Larsen and Diener (1992) that the practice of labeling scales
using adjectives from different octants of the emotion circumplex will pro-
duce different rating behavior, and that the dimensions of pleasantness or
unpleasantness versus activation seem to vary independently of each other.
To support this, Larsen and Diener (1992) described findings that the
Velten mood induction techniques tend to change hedonic tone (evalu-
ation) without affecting activation. My reanalysis confirms this.
While self-reported arousal does vary with external circumstances, and
appears to have the characteristics of a state rather than a trait measurement
(Matthews, Davies, & Lees, 1990), Matthews et al. noted the following:
. . .Revelle (personal communication [to Matthews et al.], July 11, 1988) pointed
out that individuals' self-ratings of arousal may be affected by individual differences
in characteristic baseline levels of arousal, so that arousal ratings are not directly
comparable across subjects . . . . Thus, only a part of the interindividual variance
in arousal scores will reflect absolute arousal values; a second part will reflect
interindividual variation in baseline. (pp. 151-152)
The .54 gamma correlation between Arousal and Interest scores for the
same subject may exist because both scales vary from the same baseline,
not because they both measure the same construct.
The scales analyzed in this study drew their terms from different
quadrants of Larsen and Diener's (1992) two-dimensional self-report
space. The Interest, Arousal, and Surprise scales were labeled with terms
from the activation dimension. The Happiness, Sadness, Anger, and Dis-
gust scales were labeled with terms from the hedonic (pleasant/unpleas-
ant) dimension. Pain did not appear in the circumplex because it is not
usually considered an affect term, but it seems closest to terms like mis-
erable or distressed in the hedonic dimension. Fear appears midway be-
tween the activation and hedonic dimensions, in a quadrant for activated
unpleasant affect.
The analysis reported here supports Larsen and Diener's (1992)
contention that the dimensions of activation and pleasantness/unpleas-
antness are orthogonal, at least with respect to introspective monitoring
and self-report. In these results, the activation reported on the Arousal
and Interest scales appears to vary differently than the remaining rating
scales for the stimuli presented. Even the scales combining hedonic affect
and activation, i.e., the Surprise and Fear scales, show considerable con-
sensual response with strong ratings by subjects in response to the second
unpleasant film clip (where an accidental death is shown). Although con-
sidered to be located in the high activation quadrant of Larsen and Di-
ener's self-report affect circumplex, these scales nevertheless show
consensual response. Surprise and fear typically involve strong autonomic
APPENDIX
If the answer key is not known, the parameters are estimated using the
following equations:
REFERENCES
Batchelder, W., & Romney, A. (1988). Test theory without an answer key. Psychometrika, 53,
71-92.
Batchelder, W., & Romney, A. (1989). New results in test theory without an answer key. In
E. Roskam (Ed.), Mathematical psychology in progress (pp. 229-248). Heidelberg,
Germany: Springer-Verlag.
Blascovich, J. (1990). Individual differences in physiological arousal and perception of arousal:
Missing links in Jamesian notions of arousal-based behaviors. Personality and Social
Psychology Bulletin, 16, 665-675.
Blascovich, J., Tomaka, J., Brennan, K., Kelsey, R., Hughes, P., Coad, M. L., & Adlin, R.
(1992). Affect intensity and cardiac arousal. Journal of Personality and Social Psychology,
63, 164-174.
Borgatti, S. (1993). Anthropac 4.0. Columbia, SC: Analytic Technologies, Inc.
Clore, G. (1992). Cognitive phenomenology: Feelings and the construction of judgment. In
L. Martin & A. Tesser (Eds.), The construction of social judgments (pp. 133-163). Hillsdale,
NJ: Erlbaum.
Clore, G., Ortony, A., & Foss, M. (1987). The psychological foundations of the affective
lexicon. Journal of Personality and Social Psychology, 53, 751-766.
Ekman, P., Friesen, W., & Ancoli, S. (1980). Facial signs of emotional experience. Journal of
Personality and Social Psychology, 39, 1125-1134.
Feldman, L. (1995). Valence focus and arousal focus: Individual differences in the structure
of affective experience. Journal of Personality and Social Psychology, 69, 153-166.