
Article

Does the Kuleshov Effect Really Exist? Revisiting a Classic Film Experiment on Facial Expressions and Emotional Contexts

Perception, 2016, Vol. 0(0), 1–28
© The Author(s) 2016
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0301006616638595
pec.sagepub.com
Daniel Barratt
Centre for Cognitive Semiotics, Lund University, Sweden; Department of International Business Communication, Copenhagen Business School, Denmark

Anna Cabak Rédei
Centre for Cognitive Semiotics, Lund University, Sweden

Åse Innes-Ker
Department of Psychology, Lund University, Sweden

Joost van de Weijer
Humanities Laboratory, Lund University, Sweden

Abstract
According to film mythology, the Soviet filmmaker Lev Kuleshov conducted an experiment in
which he combined a close-up of an actor’s neutral face with three different emotional
contexts: happiness, sadness, and hunger. The viewers of the three film sequences reportedly
perceived the actor’s face as expressing an emotion congruent with the given context. It is not
clear, however, whether or not the so-called ‘‘Kuleshov effect’’ really exists. The original film
footage is lost and recent attempts at replication have produced either conflicting or unreliable
results. The current paper describes an attempt to replicate Kuleshov’s original experiment using
an improved experimental design. In a behavioral and eye tracking study, 36 participants were each
presented with 24 film sequences of neutral faces across six emotional conditions. For each film
sequence, the participants were asked to evaluate the emotion of the target person in terms of
valence, arousal, and category. The participants’ eye movements were recorded throughout. The
results suggest that some sort of Kuleshov effect does in fact exist. For each emotional condition,
the participants tended to choose the appropriate category more frequently than the alternative
options, while the answers to the valence and arousal questions also went in the expected
directions. The eye tracking data showed how the participants attended to different regions of
the target person’s face (in light of the intermediate context), but did not reveal the expected
differences between the emotional conditions.

Keywords
Kuleshov effect, film editing, facial expressions, emotional contexts, eye tracking

Corresponding author:
Daniel Barratt, Department of International Business Communication, Copenhagen Business School, Denmark. Email: db.ibc@cbs.dk

Introduction
In the early days of cinema, the Soviet filmmaker Lev Kuleshov (1899–1970) conducted an
experiment that has become part of the mythology of film history. Legend has it that
Kuleshov combined an image of the Russian actor Ivan Mozhukin’s neutral face with a
variety of different emotional contexts, including a little girl playing with a doll, a dead
woman in a coffin, and a bowl of soup (e.g., Pudovkin, 1970, p. 168). The viewers of the
three film sequences were reported to have perceived Mozhukin’s face as expressing
happiness, sadness, and hunger (or thoughtfulness) respectively. Since then, the capacity
for emotional contexts to influence the viewer’s interpretation of a neutral face has been
called the Kuleshov effect (also known as the ‘‘Kuleshov experiment’’ or the ‘‘Mozhukin
effect’’) and has been cited by a number of filmmakers, film theorists, and psychologists. In the
world of film, perhaps the archetypal example of this is provided by the celebrated
discussions between the British director Alfred Hitchcock and the French director
François Truffaut in August 1962. Hitchcock states that the primary editing structure of
his film Rear Window (1954) is based on the Kuleshov effect, with James Stewart’s character
(Jeff) allegedly providing the emotionally ambiguous (though rarely neutral) face, and the
views out of the character’s apartment window providing the specific emotional contexts (see
Truffaut, 1984, pp. 213–223).
The numerous references to the Kuleshov effect within both film and psychology imply
that the effect is an established and uncontroversial phenomenon. Upon closer examination,
however, it is not clear whether the Kuleshov effect really exists at all. To start with,
Kuleshov’s original film footage is lost and the historical accounts of the corresponding
experiment are both vague and inconsistent (cf., Hill, 1967, p. 8; Pudovkin, 1970, p. 168).
Furthermore, there have been only two previous attempts at replicating the original
experiment and these attempts have produced either conflicting or unreliable results. The
current study should be regarded as an indirect replication in the sense that it attempts to
replicate Kuleshov’s original intention—investigating the impact of different emotional
contexts on the interpretation of a neutral facial expression—but uses an improved and
extended stimulus set in conjunction with an improved experimental design.1

Understanding a Kuleshov-type Sequence


A Kuleshov-type sequence can be understood as a crossover between two fundamental
modes of film editing. The sequence was originally conceived of as an example of Soviet
montage editing, where the basic goal is to combine two images in order to create a new idea
not present in either of the constituents. For example, combining an image of a neutral face
with an image of a bowl of soup may create the idea of hunger. This leaves open the question
of how the two images are actually connected. One interpretation is that the two images are
related in a purely cognitive sense: that is, Mozhukin was meant to be thinking about or
imagining a bowl of soup that was not physically present (an instance of what could be
described as mental image montage). Interestingly, there is some support for this view: in
Pudovkin’s description of the hunger condition, the Russian word Razmyshlenie translates
roughly as ‘‘thoughtfulness’’ or ‘‘pensiveness.’’ An alternative interpretation—and, one could
argue, the more plausible default interpretation—is that the two images are spatially
connected: that is, the actor Mozhukin was meant to be looking at, and thus in spatial
proximity to, the bowl of soup.
If the relationship between the images is a spatial one, then a Kuleshov-type sequence can
be regarded as an early example of classical continuity editing, where the basic goal is to create
a coherent space and time by using shot-to-shot transitions which follow natural patterns
of attention (see Anderson, 1996; Smith, 2012; Smith, Levin, & Cutting, 2012). More
specifically, the sequence can be regarded as an instance of point-of-view (POV) editing.
The formal properties of the POV structure have been described by Branigan (1984). The
‘‘point/glance’’ shot (hereafter glance shot) presents a character looking in the direction of an
object located offscreen, while the ‘‘point/object’’ shot (hereafter object shot) presents a view
of the object in question. The object can be presented from the optical perspective of the
character (resulting in a ‘‘true POV’’; Brewster, 1982) or from the optical perspective of a
third-party observer (resulting in a ‘‘semi-subjective’’ view; Mitry, 1967). Eyeline-match
editing can be regarded as a special instance of POV editing, where the object is a second
character returning the first character’s gaze (and the second character is usually presented
from a third-party/semi-subjective perspective, thereby avoiding a direct gaze into the
camera). Finally, the POV structure can be either ‘‘prospective’’ or ‘‘retrospective’’ in
nature, with the glance shot being shown either before or after the object shot.
From an explanatory standpoint, Carroll (1996) argues that the POV structure is easily
comprehended by viewers because it duplicates the natural human and primate tendency to
follow (from an egocentric position) the gaze of an intentional agent to an object in the
adjacent environment, or the mutual gaze between one intentional agent and another. The
fact that the corresponding head movement is replaced with an edit rather than a camera
movement does not matter because ‘‘it is the endpoints of the activity, and not the space
between, that command our attention’’ (Carroll, 1996, p. 128). Persson (2003, Chapter 2)
develops this theory by describing the POV structure as an instance of deictic gaze or joint
visual attention and by making a distinction between monitor and non-monitor sight links (see
Figure 1). In a monitor sight link, both the gazer and the object are presented from the same
optical perspective/camera position, albeit with a necessary ‘‘turn’’ of the head or camera. In
a non-monitor sight link, on the other hand, the presentation of the object involves an
unnatural ‘‘jump’’ from one optical perspective/camera position to another.
In humans, deictic gaze is thought to play an important role in the development and
operation of a number of psychological faculties, including language (via the naming of
objects; e.g., Bruner, 1983) and social cognition (via the process of ‘‘social referencing’’;
e.g., Klinnert, 1984). The influence of deictic gaze on two faculties in particular is of direct
relevance to the current discussion. The first of these, theory of mind, can be defined as the
capacity to attribute mental and emotional states to an intentional agent (e.g., Baron-Cohen,
1995; Dennett, 1987; Leslie, 1994). Some models of the mind-reading system explicitly include
both an ‘‘eye-direction detector’’ and a ‘‘shared-attention mechanism’’ as fundamental
components (e.g., Baron-Cohen, 1995). The second faculty, empathy, has been defined by
some theorists as the capacity to ‘‘feel with’’ an intentional agent; that is, to actually
experience some kind of congruent emotional state (e.g., Neill, 1996). Models of empathy
typically include a sensorimotor component based on facial mimicry and a cognitive
component described in terms such as imagination, simulation, or role-taking (e.g., Davis,
1996; Hoffman, 1984; Zillmann, 1991).
Figure 1. Types of POV structure and corresponding camera positions.

According to Persson (2003, Ch. 2), the likelihood for the viewer to make a ‘‘POV
inference’’ is increased if one or more of the following eight conditions are met: (1) the
gazer does not look directly into the camera (cf., the so-called ‘‘fourth wall’’ rule); (2) the
object shot is presented from the optical perspective of either the gazer or a third-party
observer (depending on the object depicted); (3) the glance shot and the object shot are
followed by another glance shot (creating a triadic structure); (4) the environment of the
glance shot matches the environment of the object shot (the need for consistent backgrounds
and lighting); (5) the gazer changes their behavior just before the cut in the first glance shot
(the role of behavioral and oculomotor cues); (6) the gazer shows some form of reaction in
the second glance shot (the role of reaction shots); (7) the soundtrack for the glance shot is
continuous with the soundtrack for the object shot (the role of sound bridges); and finally (8)
the spatial relation between the gazer and the object is established beforehand (the role of
narrative context). Given these conditions, one could argue that a contemporary filmmaker
would be likely to construct a Kuleshov-type sequence in such a way that would make it clear
that the glance shot and the object shot are spatially related, and, correspondingly, the
contemporary film viewer would be likely to make a POV inference. For these reasons, the
current study will regard a Kuleshov-type sequence as a (spatially defined) POV structure.
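To make Persson's checklist concrete, the eight conditions can be encoded as a simple scoring function. The sketch below (in Python) is purely illustrative: Persson (2003) does not propose a numeric score, and the condition names and equal weighting are our own shorthand.

    # Illustrative encoding of Persson's eight conditions as a checklist.
    POV_CONDITIONS = (
        "gazer_avoids_camera",            # (1) no direct gaze into the camera
        "plausible_optical_perspective",  # (2) object shot from gazer or third party
        "triadic_structure",              # (3) glance shot, object shot, glance shot
        "matching_environments",          # (4) consistent backgrounds and lighting
        "pre_cut_behavioral_cue",         # (5) gazer changes behavior before the cut
        "reaction_in_second_glance",      # (6) reaction shot
        "sound_bridge",                   # (7) continuous soundtrack across the cut
        "established_spatial_relation",   # (8) prior narrative context
    )

    def pov_inference_support(sequence: dict) -> float:
        """Fraction of the eight conditions that a given sequence satisfies."""
        met = sum(bool(sequence.get(name, False)) for name in POV_CONDITIONS)
        return met / len(POV_CONDITIONS)

    # The sequences constructed for the current study satisfy conditions (1)-(4):
    print(pov_inference_support({name: True for name in POV_CONDITIONS[:4]}))  # 0.5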
How, though, does a Kuleshov-type sequence (understood as a POV structure)
communicate and elicit emotion? The strong version of the Kuleshov thesis states that a
character’s face is ‘‘emotionally amorphous’’; that is, the viewer’s interpretation of a
character’s emotional state is shaped entirely by the situational context. Carroll (1996) is
skeptical about such a conclusion and argues that the close-up of a character’s face normally
functions to suggest some sort of emotional state. This does not imply that situational context
is unimportant, however. Instead, Carroll proposes a more nuanced position: the glance shot
in the POV structure establishes an emotional range, while the object shot provides an
emotional focus. For example, the close-up of a character’s face might suggest that the
character is experiencing some kind of negative emotion, while a subsequent view of either
a coffin or a snake might function to specify that that negative emotion is one of sadness or
fear respectively.
The weak version of the Kuleshov thesis proposed by Carroll partially reconciles two
competing accounts of facial expressions. According to the two-factor neurocultural model
(see Darwin, 1872/1998; Ekman, 1972; Ekman & Friesen, 1971), there are universal facial
expressions for each of the six ‘‘basic’’ or ‘‘primary’’ emotions: namely, happiness, sadness,
fear, anger, disgust, and surprise. The neural dimension of the model stipulates that a facial
expression is a ‘‘read-out’’ of a primary emotion, while the cultural dimension stipulates that
such expressions can be overridden by social display rules. In contrast, the behavioral ecology
view of facial expressions (e.g., Fridlund, 1994) stresses the importance of situational context
and (intentional) communication. The sight of a person screaming, for example, can be
interpreted as either a sign of fear in the context of a threatening situation or a sign of joy
in the case of an athlete winning an important race.

Related Studies in Emotion Research


The question of how our interpretations of facial expressions can be influenced by situational
context is also of great interest to emotion researchers (for a recent review, see Wieser &
Brosch, 2012). Interestingly, the Kuleshov effect is sometimes cited within the emotion
literature as an existence proof that situational context really does matter (e.g.,
Halberstadt, Winkielman, Niedenthal, & Dalle, 2009; Wallbott, 1988), the implicit
assumption, once again, being that the Kuleshov effect is an established and
uncontroversial phenomenon. None of the emotion research, however, has exactly
replicated the design of Kuleshov’s original experiment or obtained the corresponding
results. Instead, emotion researchers have developed a number of (more or less) related
experimental paradigms.
The first of these paradigms uses static images. In the ‘‘person-scenario’’ paradigm
(Goodenough & Tinker, 1931), the participant is presented with a static picture of a face
(either emotionally expressive or emotionally neutral) in conjunction with a verbal
description of an emotional situation, while in the ‘‘candid pictures’’ paradigm (Munn,
1940) the participant is presented with a photograph of a protagonist in a real-world
emotional situation taken from either a newspaper or a magazine. Each of these paradigms
is tested in the same way: that is, a first group of participants is presented with the person alone,
a second group is presented with the situation alone, while a third group is presented with a
combination of the two (in the ‘‘candid pictures’’ paradigm, the removal of the person and
situation information is achieved by masking out the corresponding parts of the photograph).
The results of these paradigms have been somewhat mixed: on some occasions, facial
expressions dominate over situational contexts, while on other occasions situational
contexts dominate over facial expressions. (For more recent studies using similar paradigms,
see Carrera-Levillain & Fernandez-Dols, 1994; Carroll & Russell, 1996; Trope, 1986.)
Meanwhile, paradigms using dynamic images have been relatively rare. Goldberg (1951)
constructed two film sequences each comprising an emotional situation followed by a final
image of an emotional person: the emotional person was the same in both sequences
(a woman screaming), while the emotional situation was different (a situation giving rise to
either a fearful or a joyful experience). Goldberg found some evidence of the Kuleshov effect,
with the emotional state of the woman presented in the joyful context being rated more
positively than the emotional state of the woman presented in the fearful context. More
recently, Wallbott (1988) selected 60 clips from film and television dramas each comprising
an emotional situation followed by a close-up of either an actor or an actress responding to
that emotional situation (for a summary, see Wallbott, 1990). A first group of participants
saw the emotional situation on its own, a second group saw only the emotional person, while
a third group saw both elements. The main finding was that the contextual information was
at least as important as, if not more important than, the person information in terms of making
emotion attributions. The person information became more important, however, when the
person’s facial expressions were incongruent with the contexts and when the person in
question was played by an actress rather than an actor.

Previous Attempts at Replication


To our knowledge, there have been only two previous attempts at replicating the original
Kuleshov experiment—one within the field of film studies and one within the field of
neuroscience. Each of these studies has a number of strengths but also a number of
weaknesses.
In the field of film studies, Prince and Hensley (1992) presented 137 communications
students with a close-up of an actor’s emotionally neutral face combined with one of three
different emotional contexts: a little girl smiling and playing with her teddy bear, a woman
lying in a coffin, or a bowl of steaming soup on a table. The authors chose to use a triadic
POV structure (cf., Condition 3): thus, each film sequence consisted of a close-up of the
actor’s face (glance shot), followed by the emotional context (object shot), followed by
another close-up of the actor’s face (glance shot). Each shot was presented for a duration
of 7 s; therefore, each sequence had a total running time of 21 s. The participant’s task was to
evaluate the actor’s performance by selecting from a list of seven emotion categories:
happiness, sadness, fear, anger, disgust, surprise, and hunger (see the categorical approach
to emotion, e.g., Ekman & Friesen, 1971). The participant could also select either a ‘‘no
emotion’’ or an ‘‘other’’ option. Significantly, Prince and Hensley failed to find any evidence
of a Kuleshov effect at all: when asked to identify the actor’s emotion, the majority of the
participants chose the ‘‘no emotion’’ option, while the remainder tended to choose an
emotion that was inappropriate to the particular context. A possible reason for the noise
in the data is that this was basically a single-trial experiment: each participant only saw one
film sequence.

More recently, in the field of neuroscience, Mobbs et al. (2006) presented 14 healthy
volunteers with 24 film sequences comprising eight positive, eight negative, and eight
neutral contexts. The eight positive faces were 25% happy and 75% neutral, while the
eight negative faces were 25% fearful and 75% neutral; these faces were created using
morphing software. This time, the authors used a retrospective POV structure: each film
sequence consisted of an object shot (4 s in duration) and a glance shot (750 ms),
separated by a jittered interstimulus interval of between 4 and 8 s. For the behavioral
measure, the participants were asked to rate the target faces for emotional expression
and mental state using a two-dimensional rating scale (see the dimensional approach to
emotion, e.g., Russell, 1980). Meanwhile, activity in related brain areas was measured
by means of functional magnetic resonance imaging (fMRI) with an epoch-based event-
related design.
Mobbs et al. found some evidence of a Kuleshov effect with the neutral faces presented in
emotional contexts resulting in higher ratings of valence and arousal, and increased BOLD
responses in brain regions such as the amygdala. There are at least four potential problems with
this study, however. To begin with, the object shot was presented before the glance shot, thus
reversing the order of the original Kuleshov sequence and not excluding the possibility of some
sort of priming effect. Second, the structure of the film sequences was of necessity distorted in
order to satisfy the temporal constraints of the fMRI method being employed: the jittered
interstimulus interval of between 4 and 8 s effectively removed the direct transition between the
object shot and the glance shot. Third, the authors chose to present the target face looking
directly into the camera, thus breaking the ‘‘fourth wall’’ rule described above. Finally, the
authors chose to employ a ‘‘pseudo-candid photograph manipulation’’ in which the participant
was told that the target face was taken from a webcam recording of a previous participant
looking at the context image on a computer monitor; this cover story essentially undermined
the spatial relationship between the two shots.
Current Study
The current paper describes an attempt to replicate Kuleshov’s original experiment using an
improved and extended stimulus set in conjunction with an improved experimental design. In
a behavioral and eye tracking study, 36 participants were tested using a stimulus set of 144
specially constructed film sequences (to be described in greater detail later).
One of the goals of the current study was to satisfy as many of the eight conditions
described by Persson (2003, Ch. 2) as possible in order to increase the likelihood that the
participant would infer that the glance shot and the object shot were spatially related (i.e.,
make a POV inference). In particular, we constructed film sequences ensuring that the gazer
did not look directly into the camera (Condition 1), that the object shot was presented from
an appropriate optical perspective (Condition 2), that the glance shot and the object shot
were followed by another glance shot (Condition 3), and that the environment of the glance
shot matched the environment of the object shot as far as possible (Condition 4). A related
goal was to present the individual shots for durations comparable to an estimated Average
Shot Length (ASL) in mainstream Hollywood films of between 3 and 4 s (see Salt, 1974;
Cutting, Brunick, DeLong, Iricinschi, & Candan, 2011). Against this benchmark, the 7-s
duration in the Prince and Hensley (1992) study was too long, while the 750-ms duration
in the Mobbs et al. (2006) study was too short.
Two additional goals concerned the methods for measuring the participants’ responses to
the film sequences. First, we wished to combine the categorical approach to emotion used in
the Prince and Hensley (1992) study with the dimensional approach to emotion used in the
Mobbs et al. (2006) study, thus obtaining as much information as possible about the
participant’s subjective interpretations of the target person’s emotional state. Second, we
decided to record the participants’ eye movements while they were watching the given film
sequences. The rationale for recording the participants’ eye movements was that certain
response tendencies might be apparent in the (more implicit) eye tracking data which were
not apparent in the (more explicit) questionnaire data.

Method
Participants
Thirty-six students from Copenhagen Business School, Denmark (18 female, 18 male; age
range 19–39 years; mean age 22.7 years) participated in the experiment in return for a cinema
ticket with a monetary value of 75 Danish kroner. All of the participants were fluent in
English and all had either normal or corrected-to-normal vision. The experiment was
conducted in accordance with the ethical guidelines and requirements of the Humanities
Laboratory, Lund University, Sweden.

Apparatus
The experiment was run on an iView X RED eye tracking system (SensoMotoric
Instruments, Germany) comprising an IBM-compatible desktop computer. The stimuli
(film sequences) were presented on an LCD monitor. The display size was 19’’ (48.3 cm)
measured diagonally, the aspect ratio was 5:4, and the resolution was 1280 × 1024 pixels.
The viewing distance was between 60 and 80 cm (the optimal range specified by the
manufacturer). The participants’ responses were registered using a mouse and keyboard.
Stimulus presentation and response registration were controlled by Experiment Center,
Version 2.0 (SMI). The participants’ eye movements were recorded at a sampling rate of
50 Hz.

Stimulus Materials
A total of 144 film sequences were specially constructed for the experiment. Each film
sequence consisted of three shots (re. Condition 3): a close-up of the target person’s
neutral face (glance shot), followed by a view of the object or event that the target
person was looking at (object shot), followed by another close-up of the target person’s
neutral face (glance shot). Each of the three shots was presented for a duration of 3 s,
corresponding to the lower limit of the estimated ASL cited earlier; therefore, each
sequence had a total running time of 9 s. The film sequences were constructed from a
stimulus set of 24 faces and 24 contexts (to be described later). Each participant was
presented with 24 film sequences; that is, 24 specific face-context combinations. The
participants were divided into six different groups, yielding the aforementioned total of
144 film sequences.
All of the film sequences were constructed digitally from both photographs and video
clips. The photographs and video clips were cropped to standard (4:3) aspect ratio and
converted to grayscale by reducing the level of saturation. The final film sequences were
presented in Audio Video Interleave (AVI) format; the resolution of the image was
1440 × 1080 pixels and the frame rate was 60 fps.

Glance Shots. The 24 faces consisted of 12 neutral female faces and 12 neutral male faces
selected from the Karolinska Directed Emotional Faces picture set (KDEF; Lundqvist,
Flykt, & Öhman, 1998). An attempt was made to select the most ‘‘average looking’’ faces
from the picture set by avoiding those models with, for example, extreme hair styles and
makeup. In addition, conspicuous freckles, moles, and blemishes were removed by means of
digital manipulation. All of the faces were presented in three-quarter profile in order to avoid
a direct gaze into the camera (re. Condition 1) and to facilitate the illusion that the person
was looking at an object in an offscreen space. The glance shots were created in two stages.
To begin with, a 6-s shot with a slow ‘‘zoom-in’’ effect was created. The ‘‘zoom in’’ effect was
added in order to give the static photograph a dynamic dimension (a technique popularly
known as the ‘‘Ken Burns effect’’). The shot was then divided into two 3-s shots. Thus, the
zoom-in of the opening glance shot was spatiotemporally continuous with the zoom-in of the
closing glance shot.
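As an illustration of this two-stage construction, the sketch below uses the moviepy library (assuming the 1.x API; the paper does not name the software it used, and the file name here is hypothetical) to render one continuous 6-s zoom and then split it into two 3-s glance shots:

    from moviepy.editor import ImageClip  # assumes moviepy 1.x
    from moviepy.video.fx.all import blackwhite

    # One continuous 6-s "Ken Burns" zoom on a still face, split into two
    # 3-s shots so that the closing glance shot continues the opening one.
    face = ImageClip("kdef_face.jpg", duration=6)  # hypothetical file name
    zoomed = blackwhite(face.resize(lambda t: 1.0 + 0.03 * t))  # slow zoom, grayscale

    # A production pipeline would also crop each frame back to a fixed
    # 1440 x 1080 canvas; that step is omitted here for brevity.
    zoomed.subclip(0, 3).write_videofile("glance_open.mp4", fps=60)
    zoomed.subclip(3, 6).write_videofile("glance_close.mp4", fps=60)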

Object Shots. The 24 contexts comprised six conditions with four examples in each. In line
with Kuleshov’s original experiment, we included a happiness condition, a sadness condition,
and a hunger condition. To increase the external and internal validity of the experiment, and
to increase the overall number of trials, we added a further three conditions: namely, a fear
condition, a desire condition, and a null (no context) condition.
When creating the object shots, we employed two basic rules-of-thumb. The first rule-of-
thumb concerned the distinction between static and dynamic objects. If the depicted object
was primarily static (e.g., an inanimate object or a person sleeping), then a photograph with a
slow zoom-in effect was used. If, however, the object was overtly dynamic (e.g., a human or
an animal moving), then a video clip was used instead. The photographs were obtained from
Google Image searches, online stock photography collections, and the International Affective
Pictures System (IAPS; Lang, Bradley, & Cuthbert, 2005), while the video clips were taken
from videos uploaded on YouTube. The second rule-of-thumb concerned the distinction
between POV editing and eyeline-match editing (re. Condition 2; Figure 1). If the object of
the target person’s gaze was an inanimate object, then the object shot was presented from an
ambiguous optical perspective which could be interpreted as a non-monitor sight link (‘‘true
POV’’). If, however, the object was an intentional agent (i.e., a human or an animal), then the
object shot was presented as a monitor sight link (‘‘semi-subjective’’), thereby avoiding a
direct gaze into the camera (re. Condition 1).
For the happiness condition, we chose stimuli that would be potentially capable of
inducing either happiness or a positively valenced nurturing tendency: namely, the human
examples of a baby smiling and a child playing as the first pair of stimuli, and the animal
examples of a puppy and a kitten falling asleep as the second pair (see Gould, 1980; Lorenz,
1970). For the sadness condition, on the other hand, we chose archetypal and instantly
recognizable symbols of death and mourning: namely, a coffin and a graveyard, and a
wreath and a cemetery. The hunger condition presented a different challenge. Following
Kuleshov’s intuition that the hunger condition should be partly neutral in nature, we
included a bowl of soup and a loaf of bread as the first pair of stimuli, and a bowl of rice
and a sack of potatoes (as two relatively uninteresting carbohydrate accompaniments to a
main meal) as the second pair.
For the fear condition, we selected a snake and a spider as archetypal examples of innate
or ‘‘biologically prepared’’ fear stimuli (see LeDoux, 1998; Öhman & Soares, 1994; Seligman,
1971), and a roaring lion and tiger as two additional examples of potentially dangerous
animals. The video clips were edited and, in some instances, horizontally flipped so that
the animal appeared to be either moving or looking in the direction of the target person,
thereby increasing the perceived level of threat. For the desire condition, we selected
photographs of two attractive female models and two attractive male models all posing in
the form of the ‘‘classical reclining nude.’’ All of the models had their backs to the camera,
and all were photographed in a realistic domestic setting. Our rationale here was that the
models should be recognized as attractive and desirable, and plausibly occupying the same
space as the target person, but that the images should not be pornographic in nature. Finally,
we included a null (no context) condition in order to establish the participants’ baseline
interpretation for each neutral face. This condition comprised a gray background with a
centrally presented number (Courier New, white, 36 pt.) ‘‘counting down’’ from three to
one at a rate of one number per second. The countdown served two functions: first, it
increased the level of visual interest; and second, it acted as a substitute fixation cross,
thereby ensuring that the participants fixated their eyes on the center of the display.
(A baseline study of the different object shots is described in Appendix A.)

Additional Considerations. Finally, we made an attempt to match the environment of the glance
shot with that of the object shot (re. Condition 4). One of the advantages of using an
established stimulus set for the glance shots was the fact that all of the models were
photographed against a neutral background which (after grayscaling) could be potentially
interpreted as either a blank wall in the case of an interior setting or a cloudless sky in the
case of an exterior. In addition, the cropping of the glance shot to a standard (4:3) aspect
ratio effectively functioned to minimize the size and potential incongruence of the
background relative to the object shot. Both the glance shots and the object shots were
converted to grayscale by reducing the level of saturation, and an attempt was made to
match the brightness and contrast levels of the varying object shots with the brightness
and contrast levels of the relatively constant glance shots.

Procedure
The experiment began with a verbal instruction delivered by the experimenter followed by a
written instruction presented on screen. The participant was told that they would be
presented with 24 film sequences, each 9 s in duration, grayscaled, and without sound.
The basic structure of the film sequences (glance shot, object shot, glance shot) was
described and the spatial relationship between the three shots was explicitly stated:
‘‘Each film sequence depicts a situation in which either a man or a woman looks at an
object or event in their environment.’’ It was explained that, for each film sequence, the
participant’s task would be to judge the emotional state of the target person in terms of
valence, arousal, and category. It was also stipulated that the participant should give their
spontaneous response and that there were no right or wrong answers. Finally, the
participant was told that their eye movements would be recorded throughout the
experiment.
The actual testing began with a 9-point eye tracking calibration. If the mean deviation
from each calibration point was less than 1° of visual arc in both the horizontal and vertical
planes, then the calibration was accepted. If not, the calibration was repeated until the
condition had been met. The calibration was also repeated halfway through the
experiment, after the 12th trial, in order to account for any slippage. The same acceptance
conditions applied.
Figure 2. Trial procedure and examples of stimuli.

An individual trial was structured as follows (see Figure 2). To begin with, a fixation cross
was presented in the center of the display for a minimum duration of 1500 ms. Continuation
to the next display was contingent on the participant fixating the cross for at least 200 ms.
The film sequence was then presented for 9 s. As soon as the film sequence had ended, the
participant was asked three questions, each presented on a separate display: first, to rate the
valence of the target person’s emotion on a 9-point scale from –4 (‘‘negative’’) to +4
(‘‘positive’’); second, to rate how aroused the target person appeared to be on a 9-point
scale from 1 (‘‘calm’’) to 9 (‘‘excited’’); and third, to identify the type of emotion that the
target person was feeling by selecting from a list of nine emotion categories. This list included
the six ‘‘basic’’ or ‘‘primary’’ emotions (happiness, sadness, fear, anger, disgust, and surprise),
two basic drives or motivations (hunger and desire), and an ‘‘other’’ option whereby the
participant was free to name or describe a more complex or mixed emotional state. We
deliberately worded the questions in such a way that the participant was asked to make a
judgement about the target person’s emotional state; that is, we did not refer directly to the
target person’s facial expression. By asking the participants two rating-based questions in
conjunction with one category-based question, we combined the dimensional approach to
emotion used by Mobbs et al. (2006) with the categorical approach used by Prince and
Hensley (1992).
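To make the response format concrete, one trial's answers could be stored in a record like the following (a hypothetical data structure, not taken from the authors' materials):

    from dataclasses import dataclass

    EMOTION_OPTIONS = ("happiness", "sadness", "fear", "anger", "disgust",
                       "surprise", "hunger", "desire", "other")

    @dataclass
    class TrialResponse:
        sequence_id: str
        valence: int           # Question 1: -4 ("negative") to +4 ("positive")
        arousal: int           # Question 2: 1 ("calm") to 9 ("excited")
        category: str          # Question 3: one of the nine options above
        other_label: str = ""  # free text when category == "other"

        def __post_init__(self):
            assert -4 <= self.valence <= 4
            assert 1 <= self.arousal <= 9
            assert self.category in EMOTION_OPTIONS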
The stimulus set of 24 faces and 24 contexts could be potentially combined in 24! ways.
For pragmatic reasons, we constructed six groups with different face-context combinations
(see Appendix B). These six groups were constructed according to three constraints. First,
across the six groups, a given model (e.g., Female 01) was presented with one contextual
example from each of the six conditions. Second, groups 1 to 3 saw the 12 female models
conjoined with the first pair of contextual examples from each condition and the 12 male
models paired with the second pair, while groups 4 to 6 saw the 12 female models conjoined
with the second pair of contextual examples from each condition and the 12 male models
paired with the first pair. The latter constraint ensured that, across the six groups, both
female and male models were paired with any given contextual example (e.g., the baby
smiling). Additionally, in the desire condition, female models were depicted looking at
both desirable females (same-sex) and desirable males (different-sex), and vice versa. Third
and finally, the six groups were balanced with respect to the gender of the participants, with
three female participants and three male participants being assigned to each group. For each
participant, the sequences were presented in random order.2
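The actual assignments are listed in Appendix B; the sketch below is our own reconstruction of the counterbalancing logic from the three constraints, with the particular rotation scheme chosen for illustration rather than copied from the appendix.

    # Conditions each have four contextual examples, treated as two pairs:
    # examples 0-1 ("first pair") and examples 2-3 ("second pair").
    CONDITIONS = ["happiness", "sadness", "hunger", "fear", "desire", "null"]

    def build_group(group: int):
        """Return the 24 face-context pairings shown to one of the six groups."""
        pairings = []
        for sex in ("F", "M"):
            # Groups 1-3 pair female models with the first pair of examples and
            # male models with the second pair; groups 4-6 swap the pairs.
            first_pair = (sex == "F") == (group <= 3)
            for m in range(12):
                # Rotating conditions ensures that, across the six groups, each
                # model is seen with one example from every condition.
                cond = CONDITIONS[(m + group) % 6]
                example = (m + group) % 2 + (0 if first_pair else 2)
                pairings.append((f"{sex}{m + 1:02d}", cond, example))
        return pairings

    for face, cond, example in build_group(1)[:3]:
        print(face, cond, example)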
After the experiment, the participants were asked to complete a short follow-up
questionnaire about their experience (before being fully debriefed). They were asked the
following five questions: (1) What do you think the experiment was about?; (2) Was there
anything confusing in the experiment?; (3) What was your impression of the different faces?;
(4) Have you heard of the Soviet filmmaker Lev Kuleshov and/or the ‘‘Kuleshov effect’’?; and
(5) Do you have any other comments?

Hypotheses
For the questionnaire data, we had a number of hypotheses. To begin with, if we assume that
the Kuleshov effect really does exist, then the emotional context should influence a
participant’s judgements of a target person’s emotional state in terms of both attributed
valence (re. Question 1) and attributed arousal (re. Question 2). A visualization of the
predicted valence and arousal ratings for the different conditions is provided by Figure 3.
Following the general consensus of emotion theory (e.g., Russell, 1980), our hypothesis
was that the happiness condition should be rated positively in terms of valence (almost by
definition) and somewhere in the middle of the scale in terms of arousal (a sudden burst of joy
can be thought of as a more intense version of happiness; Evans, 2001). The sadness and fear
conditions, on the other hand, should be rated negatively on the valence dimension but
dissociated from one another on the arousal dimension; sadness (a relatively passive
emotion involving a withdrawal from the environment) tends to be associated with a low
level of arousal, whereas fear (a relatively active emotion connected to the fight-and-flight
response) tends to be associated with a high level of arousal.
Following the research on drives and motivations (e.g., Pfaff, 1999), our hypothesis was
that the (sexual) desire condition should be rated positively in terms of valence and associated
with a high level of arousal. For the basic drive/motivation of hunger, meanwhile, we had less
specific hypotheses: for example, the hunger condition could be rated positively if interpreted
as a want that would be soon satisfied (given the immediate presence of the food), negatively
if interpreted as a lack of something, or neutrally if interpreted as an instance of
thoughtfulness/pensiveness. For the null (no context) condition, our default prediction was
that the allegedly ‘‘neutral’’ faces should be rated at the (‘‘neutral’’) midpoint of the valence
scale and towards the low (‘‘calm’’) end of the arousal scale.

Figure 3. Predictions for valence and arousal ratings by condition (valence on the horizontal axis, arousal on the vertical axis).
The Kuleshov thesis also predicts that the emotional context should influence a
participant’s judgements of a target person’s emotional state with respect to identifying the
appropriate emotion category (re. Question 3). In this case, our hypothesis was simply that a
given emotional context would increase the likelihood that the corresponding emotion
category was selected: for example, in the happiness condition the happiness category
would be selected more frequently than the other five emotion categories; in the sadness
condition the sadness category would be selected more frequently, and so forth. For the
null (no context) condition, we had no specific hypotheses regarding the eight specific
emotion categories cited in the list.
The rationale for recording the participants’ eye movements was that certain response
tendencies might be apparent in the (more implicit) eye tracking data which were not
apparent in the (more explicit) questionnaire data. Our working hypothesis was that the
object shot might influence the way in which the participants looked at the target person’s
face in the closing glance shot, in comparison with the opening glance shot: that is, when
participants saw a certain emotional context they might tend to look for evidence of that
emotion in the target person’s face by fixating certain regions of the face sooner, more
frequently, and for longer durations.
The facial expressions of different emotions recruit muscle units in different regions of the
face (see Darwin, 1872/1998; Ekman, 1972; Ekman & Friesen, 1971). In the facial expression
of happiness, for example, the most significant region seems to be the (smiling) mouth,
whereas in the facial expression of fear the most significant region seems to be the
(widening) eyes. In turn, various studies suggest that the observer’s capacity to recognize
facial expressions of emotion is dependent on attending to the most informative, or
‘‘diagnostic,’’ regions of the face. For example, the ability to recognize happy faces
(together with the detection advantage of happy faces) has been attributed to the visual
saliency of the upturned mouth (e.g., Calvo & Lundqvist, 2008; Calvo & Nummenmaa,
2008). Conversely, the inability to recognize fearful faces (in a patient with bilateral
amygdala damage) has been attributed to a failure to attend to the eye region, a failure
which can be temporarily corrected by explicit instruction (Adolphs et al., 2005; Adolphs,
Tranel, Damasio, & Damasio, 1994). Additional evidence comes from a behavioral study in
which participants had to decide whether the upper and lower halves of a target face were
emotionally congruent or incongruent: participants were faster and more accurate at
detecting incongruent mouths in otherwise happy faces and incongruent eyes in otherwise
fearful faces (cf., Calder, Young, Keane, & Dean, 2000; Innes-Ker, 2003).
Many of these key findings are both supported and strengthened by a recent study
(Schurgin et al., 2014) which uses eye tracking as a relatively direct measure of visual
attention and which investigates the role of eye movements in recognizing facial
expressions of emotion. This study found that participants fixated the mouth region
(upper lips) most when presented with happy/joyful faces and the eye region most when
presented with fearful (and also sad) faces. Significantly, this pattern of fixations was
preserved when the participants were presented with neutral faces (within a block of a
certain category of emotional faces). The latter result is of particular relevance to the
Kuleshov effect as it suggests that the surrounding context can influence attention in a
goal-driven, as opposed to stimulus-driven, fashion. (For a related eye tracking study on
scanning facial expressions of emotion, especially happiness and sadness, see Eisenbarth &
Alpers, 2011.)
In summary, for the eye tracking data, we divided our general working hypothesis into two
more specific hypotheses. Happiness > mouth hypothesis: When participants see a happy
context (e.g., a cute baby smiling), they will tend to look for evidence of happiness in the
target person’s face by fixating the mouth region sooner, more frequently, and for longer
durations. Fear > eyes hypothesis: When participants see a fear context (e.g., a dangerous
snake), on the other hand, they will tend to look for evidence of fear in the target person’s
face by attending more to the eye region. For the remaining emotional contexts—sadness,
desire, and hunger—we did not have specific hypotheses regarding the eye tracking data,
although it is plausible that a sad face is primarily associated with tearful eyes (e.g.,
Eisenbarth & Alpers, 2011), while folk wisdom suggests that the look of desire is
associated with sustained eye contact and dilated pupils.

Results and Analysis


Questionnaire Data
Preliminary evidence for the influence of emotional contexts came from the answers to
Questions 1 and 2 (valence and arousal). The mean ratings for valence and arousal for
each of the six conditions are presented in Figure 4, as single points in a two-dimensional
space. In this figure, the corresponding axes for valence and arousal have been rescaled so
that a value of zero corresponds to the mean rating across all six conditions (valence = –0.52;
arousal = 4.57). The positions of each of the six points thus indicate whether a condition
mean was higher (positive value) or lower (negative value) than the overall mean in terms of
valence and arousal. The rating data were formally analyzed using a mixed model regression
analysis. In this analysis, we included condition as the main predictor with six levels, and
participant and item as random effects. The coefficients indicate whether a condition mean
was significantly above or below the overall mean. The results are presented in Table 1.
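The paper does not state which statistics package was used. As a rough equivalent, the same model can be fitted in Python with statsmodels: sum coding makes each coefficient a condition's deviation from the grand mean (as reported in Table 1), and variance components capture the crossed participant and item effects (file and column names here are hypothetical).

    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per trial: valence rating, condition, participant, item.
    df = pd.read_csv("ratings.csv")  # hypothetical file
    df["all"] = 1  # a single group, so the two random effects below are crossed

    model = smf.mixedlm(
        "valence ~ C(condition, Sum)",  # sum coding: deviations from grand mean
        data=df,
        groups="all",
        re_formula="0",                 # no random intercept for the dummy group
        vc_formula={"participant": "0 + C(participant)", "item": "0 + C(item)"},
    )
    print(model.fit().summary())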
Figure 4. Questions 1 and 2: mean valence and arousal ratings by condition (valence on the horizontal axis, arousal on the vertical axis; both axes rescaled so that zero corresponds to the overall mean).

Table 1. Questions 1 and 2: mean valence and arousal ratings by condition.

             Question 1 (valence)                  Question 2 (arousal)
             Estimate  SE     df      t       p      Estimate  SE     df      t       p
Happiness     0.240   0.303  180.15   0.790  .430     0.003   0.219  121.57   0.016  .987
Desire        0.809   0.303  180.15   2.669  .008**   0.816   0.219  121.57   3.728  .000***
Sadness      –0.955   0.303  180.15  –3.150  .002**  –0.490   0.219  121.57  –2.237  .027*
Fear         –0.094   0.303  180.15  –0.309  .757     0.163   0.219  121.57   0.746  .457
Hunger        0.274   0.303  180.15   0.905  .367    –0.191   0.219  121.57  –0.872  .384
NULL         –0.274   0.303  180.15  –0.905  .367    –0.302   0.219  121.57  –1.380  .170

*p < .05; **p < .01; ***p < .001.

Three of the four conditions for which we had specific predictions (sadness, fear, and desire) were located in the expected quadrant, with the relative position of two of those conditions (sadness and desire) being significantly above/below average. The faces presented
in the sad contexts were perceived as the most negative and the least aroused (t(180.15) =
–3.150, SE = 0.303, p < .01; t(121.57) = –2.237, SE = 0.219, p < .05), whereas the faces
presented in the desire contexts were perceived as the most positive and the most aroused
(t(180.15) = 2.669, SE = 0.303, p < .01; t(121.57) = 3.728, SE = 0.219, p < .001), with
‘‘different sex’’ desire being rated slightly higher than ‘‘same sex’’ desire. The happiness
condition was slightly above average for valence (as expected) but at an average level for
arousal (not expected). For the hunger and null (no context) conditions, we had less specific
predictions. Interestingly, the null condition was towards the low end of both the valence and
arousal spectrums, suggesting that the default interpretation of the target person’s emotional
state was that the person was feeling somewhat negative and unaroused.
Figure 5. Question 3: relative frequency of emotion categories by condition.

Additional evidence for the influence of emotional contexts came from the answers to
Question 3 (category). A summary of these answers is presented in Figure 5. In this figure, all
answers outside the eight basic emotional categories have been collapsed into a single ‘‘other’’
category, yielding a total of nine response categories which are shown on the vertical axes of
the six panels. The dots inside each panel show how often each of these nine categories was
chosen within each condition, with the lines extending from the dots indicating a 95%
confidence interval. If the emotional contexts had no effect on the interpretation of the
target person’s emotional state (the null hypothesis), then each of the nine categories should
have been selected with an equal degree of probability; that is, a relative frequency
approaching 11.1%. For each of the five emotional conditions (happiness, sadness, hunger,
fear, and desire), the participants tended to choose the target emotion (indicated by a solid
dot) more frequently than the eight alternative options; the confidence intervals for these
target emotions are well above the 11.1% baseline. This effect was most pronounced for the
sadness condition, possibly bolstered by the fact that the baseline interpretation of the
‘‘neutral’’ faces in the null (no context) condition tended towards sadness.
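For readers who wish to check a given category frequency against the 11.1% baseline, a confidence interval of the kind plotted in Figure 5 can be computed as follows (the counts below are illustrative, not the study's):

    from statsmodels.stats.proportion import proportion_confint

    # e.g., the target emotion chosen on 70 of 144 trials in one condition
    chosen, trials = 70, 144  # illustrative numbers only
    low, high = proportion_confint(chosen, trials, alpha=0.05, method="wilson")
    print(f"observed {chosen / trials:.1%}, "
          f"95% CI [{low:.1%}, {high:.1%}], chance {1 / 9:.1%}")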
Across the six emotional conditions and the 36 participants, the ‘‘other’’ option was
selected with a relative frequency of 9.5% (82 out of a total of 864 trials). Out of the 82
‘‘other’’ responses, the top five most common interpretations of the target person’s emotional
state were interest/curiosity (11.0%), thoughtfulness (8.5%), indifference (7.3%), neutrality
(7.3%), and tiredness/exhaustion (6.1%) (for an argument that neutrality is often confused
with interest, see Tomkins, 1995, p. 253). On four occasions (4.9%), the participants
described mixed emotions; that is, combinations of two or more basic emotions.
The remaining responses made reference to more complex, cognitive states such as
disappointment and disbelief. On five occasions (6.1%), the participants cited emotion
terms which were either repetitions or synonyms of one of the eight pre-specified categories:
for example, the term ‘‘frighten[ed]’’ can be thought of as the adjective form of the noun
‘‘fear.’’ For the sake of simplicity, these responses were not incorporated into the above
analysis of the categorical data.

Eye Tracking Data


To recap, the rationale for recording the participants’ eye movements was that certain
response tendencies might be apparent in the (more implicit) eye tracking data which were
not apparent in the (more explicit) questionnaire data.
One advantage of using the KDEF picture set is that the differences between the 24 models
used had been minimized beforehand. The models were first photographed using a camera
grid and the resulting pictures were then adjusted using a digital grid: the vertical and
horizontal positions of the models’ eyes and mouths were aligned to specific positions on
both grids. For the analysis of the eye tracking data, the faces in the opening and closing
glance shots were divided into areas of interest (AOIs), for the eye, mouth, nose, and ear
regions (see Figure 6). The size of the AOIs was scaled up slightly in order to account for both
the physiognomic differences between the 24 models and the size differences between the
opening and closing shots created by the zoom-in. Although the opening and closing
glance shots were each presented for a duration of 3 s, the AOIs were created for only the
first second of presentation time: while a 3-s duration is a reasonable approximation of the
ASL in mainstream Hollywood films, it is too long for assessing a person’s initial perception
of a visual scene given that the gist of such a scene can be extracted very quickly (see ‘‘gist
perception’’; Castelhano & Hendersen, 2007). Given that relatively few fixations were made
on the ear region, this AOI was not included in the analysis.
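A minimal sketch of this style of AOI analysis is given below, with hypothetical geometry and data layout (the study's own AOI coordinates are not reproduced here):

    from dataclasses import dataclass

    @dataclass
    class AOI:
        """Rectangular area of interest in display pixels."""
        name: str
        x: float
        y: float
        w: float
        h: float

        def scaled(self, factor: float) -> "AOI":
            # Enlarge around the centre, absorbing physiognomic differences
            # between models and the size change produced by the zoom-in.
            cx, cy = self.x + self.w / 2, self.y + self.h / 2
            w, h = self.w * factor, self.h * factor
            return AOI(self.name, cx - w / 2, cy - h / 2, w, h)

        def contains(self, fx: float, fy: float) -> bool:
            return self.x <= fx <= self.x + self.w and self.y <= fy <= self.y + self.h

    def aois_hit(fixations, aois, window_ms=1000):
        """Names of AOIs fixated within the first second of a glance shot.

        fixations: iterable of (onset_ms, x, y) tuples."""
        hits = set()
        for onset_ms, fx, fy in fixations:
            if onset_ms < window_ms:
                hits.update(a.name for a in aois if a.contains(fx, fy))
        return hits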
We analyzed the eye tracking data using multilevel logistic regression with condition and
glance shot as fixed factors. Table 2 (rows 1 to 3) shows how frequently the participants
looked at each of the three main AOIs in terms of the proportion of trials that a given AOI
was fixated. (The two numbers in each cell represent the data for the first and second glance
shots respectively.) The participants tended to fixate the eye region most frequently, followed
by the nose and then the mouth regions. Regarding the difference between the first and
second glance shots, the following pattern was observed.
Figure 6. Eye tracking data: examples of areas of interest (AOIs).

Table 2. Proportion of trials fixated, mean total fixation times (ms), and mean entry times (ms) for each AOI, by condition and glance shot (cell entries: first glance shot–second glance shot).

                           Happiness   Desire      Sadness     Fear        Hunger      NULL
Proportion of    Eyes      0.93–0.86   0.91–0.85   0.93–0.92   0.90–0.86   0.90–0.90   0.90–0.83
trials fixated   Nose      0.55–0.45   0.51–0.37   0.58–0.50   0.50–0.45   0.50–0.37   0.50–0.41
                 Mouth     0.27–0.26   0.23–0.21   0.23–0.24   0.25–0.20   0.24–0.34*  0.22–0.19
Mean total       Eyes      537–550     570–575     543–571     562–609     550–580     583–579
fixation         Nose      342–337     357–315     339–334     339–311     355–341     331–297
time (ms)        Mouth     297–306     291–330     318–318     308–313     304–342     319–389
Mean entry       Eyes      338–439     314–480     322–454     340–427     365–444     337–514
time (ms)        Nose      486–535     425–526     454–523     414–574     410–565     437–469
                 Mouth     721–770     610–654     736–811     690–665     649–709     649–657

AOI = area of interest. *p = .076.

The nose region was fixated in significantly fewer trials in the second glance shot across all
conditions (B = –0.623, SE = 0.122, z = –5.096, p < .001). For the mouth region, there was a
comparable decrease except for the hunger condition, with the increase in the hunger
condition being marginally significant (B = 0.775, SE = 0.437, z = 1.773, p = .076). For the
eye region, there were no significant effects.
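The authors do not name their software for this analysis either. As one possible stand-in, statsmodels offers a Bayesian binomial mixed GLM fitted by variational Bayes; note that this is a Bayesian approximation to, not a re-implementation of, the multilevel logistic regression reported above (file and column names are hypothetical).

    import pandas as pd
    from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

    # One row per trial: fixated (0/1) for the nose AOI, condition,
    # glance ("first"/"second"), and participant.
    df = pd.read_csv("nose_fixations.csv")  # hypothetical file

    model = BinomialBayesMixedGLM.from_formula(
        "fixated ~ C(condition) * C(glance)",
        vc_formulas={"participant": "0 + C(participant)"},
        data=df,
    )
    result = model.fit_vb()  # variational Bayes fit
    print(result.summary())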
Table 2 (rows 4 to 6) shows how long the participants looked at each of the three main
AOIs in terms of mean total fixation times. In general, the participants tended to fixate the
eye region for the longest durations, followed by the nose and then the mouth regions. In
addition, the participants tended to look at the nose region for shorter durations in the
second glance shot compared to the first, but at the eye and mouth regions for slightly
longer durations. None of these differences, however, were significant (ps > .05).
Table 2 (rows 7 to 9) shows the mean entry times for each of the three main AOIs. In
general, participants were quicker to look at the eyes than the nose, and quicker to look at the
nose than the mouth. In addition, the entry time for a given AOI tended to be longer in the
second glance shot than in the first. For the eye region, the difference between the first and
second glance shots in the null (no context) condition was significant. In addition, there were
two significant interaction effects: the difference between the first and the second glance shots
was not as large in the fear condition (B = –84.672, SE = 31.925, t = –2.652, p < .05) and the
hunger condition (B = –92.172, SE = 31.815, t = –2.897, p < .05) as it was in the null (no
context) condition.
In order to test the two main hypotheses (‘‘happiness > mouth,’’ ‘‘fear > eyes’’), we
analyzed the total fixation times for the two best stimuli from each condition: namely, the
baby and child from the happiness condition, and the snake and spider from the fear
condition. (The selection of the two best stimuli was based on the results of the
questionnaire data [particularly the answers to Question 3] and is supported by the results
of the baseline study [see Appendix A].) The results went in the expected directions. For the
happiness condition, the mouth region received relatively more attention in the second
compared with the first glance shot (307 vs. 365 ms), while the attention given to the eye
region was roughly equal between the two glance shots (538 vs. 528 ms). For the fear
condition, on the other hand, there was a sizeable increase in the attention given to the
eye region (567 vs. 633 ms) but not the mouth region (292 vs. 313 ms) across the two
glance shots. None of these differences, however, were significant (ps > .05).
The main findings for the analysis of the eye tracking data can be summarized as follows.
(1) Comparison between the three main AOIs. The participants tended to focus on the eye
region to the largest extent, followed by the nose and then the mouth regions. This finding
was supported by the analysis of the proportion of trials that each AOI was fixated, as well as
by the analysis of the total fixation times and entry times for each AOI. (2) Comparison
between the first and second glance shots. For the comparison between the two glance shots,
the most reliable trend was that the nose received less attention in the second glance shot
compared with the first. This finding is consistent with the view that the nose served as a
‘‘spatial anchor’’ in the first glance shot, but that attention shifted to other parts of the face in
the second glance shot as the participant searched for evidence of certain emotions in light of
the intermediate context (see Võ, Smith, Mital, & Henderson, 2012). The general time lag for
the fixations in the second glance shot compared with the first can be explained by the fact
that the second glance shot was subject to carry-over effects from the preceding context,
whereas the first glance shot was preceded by a fixation cross. (3) Comparison between
conditions. For the two main hypotheses (‘‘happiness > mouth,’’ ‘‘fear > eyes’’), the results
went in the expected directions but were not statistically significant. The main significant
difference between the conditions involved the hunger condition, with the mouth region
receiving more attention in the second glance shot compared to the first. This finding was
not expected but is consistent with the fact that the mouth is the site of ingestion.
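For concreteness, the three measures in Table 2 can be derived from raw fixation data along the following lines (a sketch under assumed column names such as `duration_ms` and `onset_ms`, not the authors' actual pipeline):

```python
# Sketch: deriving the Table 2 measures from raw fixation data. Assumes a
# data frame with one row per fixation and columns: participant, trial,
# glance_shot, aoi, duration_ms, and onset_ms (fixation start relative to
# shot onset). All names are illustrative assumptions.
import pandas as pd

fix = pd.read_csv("raw_fixations.csv")   # hypothetical file
n_trials = 36 * 24                       # 36 participants x 24 sequences

# Collapse to one row per participant x trial x glance shot x AOI.
per_trial = (
    fix.groupby(["glance_shot", "aoi", "participant", "trial"])
       .agg(total_ms=("duration_ms", "sum"),   # total fixation time
            entry_ms=("onset_ms", "min"))      # entry time = first fixation
       .reset_index()
)

# Aggregate across trials for each glance shot and AOI.
table2 = (
    per_trial.groupby(["glance_shot", "aoi"])
             .agg(trials_fixated=("trial", "size"),
                  mean_total_ms=("total_ms", "mean"),
                  mean_entry_ms=("entry_ms", "mean"))
             .reset_index()
)
table2["prop_fixated"] = table2["trials_fixated"] / n_trials
print(table2)
```

Adding `condition` to the grouping keys would split the same aggregation into the full layout of Table 2.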

Discussion and Conclusion


The current study represents a renewed attempt at replicating Kuleshov's original experiment,
using an extended stimulus set in conjunction with an improved experimental
design. The results of the study suggest that some sort of Kuleshov effect does in fact exist.
The strongest evidence came from the answers to Question 3 (category). For each of the five
emotional conditions, the participants tended to choose the appropriate category more
frequently than the alternative options. This effect was most pronounced for the sadness
condition, possibly bolstered by the fact that the baseline interpretation of the ‘‘neutral’’
faces in the null (no context) condition tended towards sadness. The answers to Questions 1
and 2 (valence and arousal) also went in the expected directions with, for example, the faces
presented in the sad contexts being perceived as more negative and less aroused than the faces
presented in the desire contexts.
In light of these findings, the study makes a contribution to film theory, where the
Kuleshov effect has become the stuff of legend and a shorthand for demonstrating the
power of editing (e.g., Truffaut, 1984), but where the two existing attempts at replication
(Mobbs et al., 2006; Prince & Hensley, 1992) have produced either conflicting or unreliable
results. It also makes a contribution to the psychological research on emotion, where much
interest has been shown in the question of how situational context influences our
interpretation of facial expressions (e.g., Wieser & Brosch, 2012) and where the Kuleshov
effect has been cited as evidence that situational context really matters (e.g., Halberstadt
et al., 2009; Wallbott, 1988).
At the same time, it should be acknowledged that the results of such a study need to be
treated with a certain degree of caution. In any psychological experiment, a possible source of
error is demand characteristics—extraneous variables that bias the participant to behave in a
certain way. In this particular case, one prominent example of such a demand characteristic is
experimenter expectancies: that is, the participant responding in the way that they think the
experimenter would like them to, or simply coming up with their best guess about what is
going on given the information available. If, for example, you are presented with a relatively
inexpressive person looking at a coffin or a graveyard, then one could argue that the best
answer that you can come up with—in the absence of any other contextual information—is
that the depicted person is experiencing an emotion akin to sadness.
Such considerations raise the question of what the Kuleshov effect really is. Upon closer
examination, the Kuleshov effect potentially involves a number of different psychological
processes, operating either individually or in combination. For example, does the observer
actually perceive the emotion in the target person’s face: that is, do they have a
phenomenological experience of seeing, say, a sad facial expression (a possible example of
the ‘‘face adaptation effect’’; Strobach & Carbon, 2013)? Or does the observer primarily
cognise the target person’s emotional state through a process of inference (an instance
of theory of mind)? Or does the observer actually experience the target person’s emotional
state in some weakened form (an instance of empathy)? In addition to these three questions,
the Kuleshov effect may have a temporal dimension. For example, do the psychological
processes just described occur ‘‘online’’ during the actual presentation of the target
person’s face, or after that presentation, as a way of making sense of the corresponding
memory?
In the post-experiment questionnaire, many of the participants noted that the target faces
seemed to be similar in terms of emotional expressions, relatively inexpressive, somewhat sad,
or somewhat calm. Some of the participants also pointed to the potential role of the context
in influencing their interpretations. In other words, many of the participants were thinking in
the right direction with respect to the experimenters’ intentions. None of the participants,
however, realized that all of the target faces were supposed to be neutral. Thus, none of the
participants guessed the true purpose of the experiment.
Whatever the effects of demand characteristics, it remains the case that the participants in
the study selected the appropriate emotion category in, for example, the sadness condition
more frequently than the other emotion categories. This is a finding which needs to be
explained. Significantly, the finding is in line with the weaker version of the Kuleshov
thesis which proposes that the glance shot in the POV structure establishes an ‘‘emotional
range’’ while the object shot provides an ‘‘emotional focus’’ (see Carroll, 1996). The fact that
the baseline interpretation of the faces in the null (no context) condition tended towards
sadness suggests that the allegedly ‘‘neutral’’ faces were not entirely neutral after all. If this
was the case, then it is plausible that the opening shot of the character’s face might have
functioned to suggest that the character was experiencing some kind of negative emotion,
while a subsequent object shot of, for example, a coffin or a graveyard might have functioned
to specify that that negative emotion was one of sadness.
In the study, eye tracking was included as a more exploratory measure. The rationale for
recording the participants’ eye movements was that certain response tendencies might be
apparent in the (more implicit) eye tracking data which were not apparent in the (more
explicit) questionnaire data. The comparison between AOIs (in terms of fixations) gave us
insight into how the participants attended to and prioritized different parts of the target
person’s face, while the comparison between the first and second glance shots gave us
insight into how the participant re-distributed their attention in light of the intermediate
context. For the comparison between the conditions, there were some possible trends
regarding the two main hypotheses but no significant differences. There are three possible
reasons for the failure to find such differences, each of which illustrates the difficulty in
achieving a balance between stimulus integrity/complexity and experimental control. First,
the opening but not the closing glance shot was preceded by a fixation cross—thus, the two
shots were not directly comparable in terms of the participant’s starting eye position. Second,
although the context objects were roughly controlled for in terms of spatial location and size,
the context objects necessarily differed in a number of other dimensions. Finally, if the POV
structure is understood as an instance of deictic gaze, then it could be argued that we would
expect the viewer to focus relatively more on the target person’s eyes as it is the eye region
that provides the primary spatial connection between the shots.
A primary goal of future studies will be to reduce the types of demand characteristic cited
earlier. One way of doing this would be to include catch trials (or fillers) with target faces
expressing subtle emotions: for example, a subtle positive face could be created by morphing
25% of a happy face with 75% of a neutral face (using the same model and a constant head
position), while a subtle negative face could be created by morphing 25% of a fearful
(or angry) face with 75% of a neutral face (cf., the study by Mobbs et al., 2006). Including
such catch trials (or fillers) would serve to ensure that participants are attending properly to
the faces (rather than to the emotional contexts alone), while helping to disguise the true
purpose of the experiment. A further goal will be to include additional measures of the
participants’ responses to the film sequences. In the current study, for example, we asked
the participants to judge the emotional state of the depicted person but we did not measure
the emotional state of the participants themselves. One way of approaching this issue would
be to use such measures as pupil dilation and galvanic skin response as indices of
corresponding physiological arousal. Such information might help us to tease apart the
different components of the Kuleshov effect described earlier.
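To make the catch-trial proposal concrete, the simplest approximation to a "25% happy" face is a pixel-wise cross-dissolve between two aligned photographs of the same model. The sketch below uses this shortcut (dedicated morphing software would also warp facial landmarks); the file names are placeholders rather than actual stimulus files.

```python
# Sketch: creating a subtle positive catch-trial face by blending 25% of a
# happy face with 75% of a neutral face of the same model. A pixel-wise
# cross-dissolve only approximates true landmark-based morphing, and it
# assumes the two photographs are aligned (same model, same head position,
# same image size and mode).
from PIL import Image

neutral = Image.open("AF01_neutral.png").convert("RGB")  # placeholder names
happy = Image.open("AF01_happy.png").convert("RGB")

# Image.blend(im1, im2, alpha) returns im1 * (1 - alpha) + im2 * alpha,
# so alpha = 0.25 keeps 75% of the neutral face.
subtle_positive = Image.blend(neutral, happy, alpha=0.25)
subtle_positive.save("AF01_subtle_positive.png")
```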
Perhaps the ultimate challenge of future studies will be to construct film sequences
which satisfy as many of the remaining four conditions described by Persson (2003, Ch. 2)
as possible while retaining a reasonable level of experimental control. One obvious way
forward would be to improve the quality and realism of the face images by making them
truly ‘‘dynamic.’’ This could be achieved either by filming trained actors against a neutral
backdrop (cf., the study by Prince & Hensley, 1992) or by using software to create animated
morphs from still photographs. The advantage of using dynamic images would be their
capacity to capture subtle behavioral and oculomotor cues, thus increasing the illusion
that the target person is looking at, and reacting to, an object in an ‘‘offscreen’’ space (see
Conditions 5 and 6). Another way forward would be to add a soundtrack to the image track
and to present some kind of establishing shot before the POV structure itself. This would
allow for the creation of film sequences in which diegetic sound functions to bridge the gap
between the glance shot and the object shot, and in which the spatial relation between
the gazer and the object is established beforehand (see Conditions 7 and 8). When all
of these conditions are finally accounted for, we may be in a position to decisively
conclude that the Kuleshov effect—discussed all those years ago by Pudovkin, Hitchcock,
and Truffaut—really does exist.

Declaration of Conflicting Interests


The authors declared no potential conflicts of interest with respect to the research, authorship, and/or
publication of this article.

Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or
publication of this article: The study described in this paper was conducted as part of the research
programme for the Centre for Cognitive Semiotics, Lund University, Sweden – financed between 2009
and 2014 by the Bank of Sweden Tercentenary Foundation (grant number M2008-0408:1-PK).

Notes
1. Anna Cabak Rédei came up with the idea of replicating Kuleshov’s original experiment. Thanks to
Yuri Tsivian and Daria Khitrova for providing the translation of the Russian word Razmyshlenie
(‘‘thoughtfulness’’ or ‘‘pensiveness’’).
2. In order to obtain some pilot data about the Rear Window examples discussed by Hitchcock
(Truffaut, 1984), we added a 25th trial/sequence based on the film. For the glance shot, we took
a frame still of one of the few instances in the film in which James Stewart’s face appears to be
neutral and added the slow zoom-in effect described earlier. For the object shots, the participants
were presented with 4-s video clips featuring either Miss Torso (the attractive female dancer)
exercising, the newlywed couple embracing, the cute dog being lowered in a basket, the incident
in which the antagonist Thorwald (Raymond Burr) threatens the heroine Lisa (Grace Kelly), or a
null (no context) condition.

References
Adolphs, R., Gosselin, F., Buchanan, T. W., Tranel, D., Schyns, P., & Damasio, A. R. (2005).
A mechanism for impaired fear recognition after amygdala damage. Nature, 433, 68–72.
Adolphs, R., Tranel, D., Damasio, H., & Damasio, A. (1994). Impaired recognition of emotion in facial
expressions following bilateral damage to the human amygdala. Nature, 372, 669–672.
Anderson, J. D. (1996). The reality of illusion: An ecological approach to cognitive film theory.
Carbondale, IL: Southern Illinois University Press.
Baron-Cohen, S. (1995). Mindblindness: An essay on autism and theory of mind. Cambridge, MA: MIT
Press.
Branigan, E. (1984). Point of view in the cinema. New York, NY: Mouton.
Brewster, B. (1982). A scene at the ‘movies’. Screen, 23, 4–15.
Bruner, J. (1983). Child’s talk: Learning to use language. New York, NY: Norton.
Calder, A. J., Young, A. W., Keane, J., & Dean, M. (2000). Configural information in facial
expression perception. Journal of Experimental Psychology: Human Perception and Performance,
26, 527–551.
Calvo, M. G., & Lundqvist, D. (2008). Facial expressions of emotion (KDEF): Identification under
different display-duration conditions. Behavior Research Methods, 40, 109–115.
Calvo, M. G., & Nummenmaa, L. (2008). Detection of emotional faces: Salient physical features guide
effective visual search. Journal of Experimental Psychology: General, 137, 471–494.
Carrera-Levillain, P., & Fernandez-Dols, J.-M. (1994). Neutral faces in context: Their emotional
meaning and their function. Journal of Nonverbal Behavior, 18, 281–299.
Carroll, J. M., & Russell, J. A. (1996). Do facial expressions signal specific emotions? Judging emotion
from the face in context. Journal of Personality and Social Psychology, 70, 205–218.
Carroll, N. (1996). Toward a theory of point-of-view editing: Communication, emotion, and the
movies. In Theorizing the moving image (pp. 125–138). Cambridge, England: Cambridge University
Press.
Castelhano, M. S., & Henderson, J. M. (2007). Initial scene representations facilitate eye movement
guidance in visual search. Journal of Experimental Psychology: Human Perception and Performance,
33, 753–763.
Cutting, J. E., Brunick, K. L., DeLong, J. E., Iricinschi, C., & Candan, A. (2011). Quicker, faster,
darker: Changes in Hollywood film over 75 years. i-Perception, 2, 569–576.
Darwin, C. (1872/1998). The expression of the emotions in man and animals (3rd ed.). Commentary by P.
Ekman. London, England: Fontana Press.
Davis, M. H. (1996). Empathy: A social psychological approach. Oxford, England: Westview Press.
Dennett, D. C. (1987). The intentional stance. Cambridge, MA: MIT Press.
Eisenbarth, H., & Alpers, G. W. (2011). Happy mouth and sad eyes: Scanning emotional facial
expressions. Emotion, 11, 860–865.
Ekman, P. (1972). Universals and cultural differences in facial expressions of emotion. In J. Cole (Ed.),
Nebraska symposium on motivation (pp. 207–282). Lincoln, NE: University of Nebraska Press.
Ekman, P., & Friesen, W. (1971). Constants across cultures in the face and emotion. Journal of
Personality and Social Psychology, 17, 124–129.
Evans, D. (2001). Emotion: The science of sentiment. Oxford, England: Oxford University Press.
Fridlund, A. J. (1994). Human facial expression: An evolutionary view. San Diego, CA: Academic Press.
Goldberg, H. D. (1951). The role of ‘cutting’ in the perception of the motion picture. Journal of Applied
Psychology, 35, 70–71.
Goodenough, F. L., & Tinker, M. A. (1931). The relative potency of facial expression and verbal
description of stimulus in the judgement of emotion. Journal of Comparative Psychology, 12,
365–370.
Gould, S. J. (1980). A biological homage to Mickey Mouse. In The panda’s thumb: More reflections in
natural history (pp. 95–107). New York, NY: W.W. Norton & Company.
Halberstadt, J., Winkielman, P., Niedenthal, P. M., & Dalle, N. (2009). Emotional conception: How
embodied emotion concepts guide perception and facial action. Psychological Science, 20,
1254–1261.
Hill, S. P. (1967). Kuleshov – Prophet without honor? Film Culture, 44, 1–41.
Hoffman, M. L. (1984). Interaction of affect and cognition in empathy. In C. E. Izard, J. Kagan, &
R. B. Zajonc (Eds.), Emotions, cognition, and behaviour (pp. 103–131). Cambridge, England:
Cambridge University Press.
Innes-Ker, Å. (2003). Gestalt perception of emotional expressions (doctoral dissertation). Indiana
University, Bloomington, IN.
Klinnert, M. D. (1984). The regulation of infant behavior by maternal facial expression. Infant Behavior
and Development, 7, 447–465.
Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (2005). International affective picture system (IAPS):
Digitized photographs, instruction manual, and affective ratings (Technical Report A-6). University of
Florida, Gainesville, FL.
LeDoux, J. E. (1998). The emotional brain: The mysterious underpinnings of emotional life. London,
England: Weidenfeld & Nicolson.
Leslie, A. M. (1994). ToMM, ToBY, and agency: Core architecture and domain specificity. In L.
A. Hirschfeld, & S. A. Gelman (Eds.), Mapping the mind: Domain specificity in cognition and
culture (pp. 119–148). Cambridge, England: Cambridge University Press.
Lorenz, K. (1970). Studies in animal and human behaviour (Vol. 1). Translated by R. Martin. London,
England: Methuen.
Lundqvist, D., Flykt, A., & Öhman, A. (1998). The Karolinska directed emotional faces – KDEF
(CD-ROM). Department of Clinical Neuroscience, Psychology Section, Karolinska Institutet,
Stockholm, Sweden.
Mitry, J. (1967). Esthétique et psychologie du cinéma (Vol. 2, pp. 72–77). Paris, France: Editions
Universitaires.
Mobbs, D., Weiskopf, N., Lau, H. C., Featherstone, E., Dolan, R. J., & Frith, C. D. (2006). The
Kuleshov effect: The influence of contextual framing on emotional attributions. Scan, 1, 95–106.
Munn, N. L. (1940). The effect of knowledge of the situation upon judgement of emotion from facial
expression. Journal of Abnormal and Social Psychology, 35, 324–338.
Neill, A. (1996). Empathy and (film) fiction. In D. Bordwell, & N. Carroll (Eds.), Post-theory:
Reconstructing film studies (pp. 175–194). Madison, WI: University of Wisconsin Press.
Öhman, A., & Soares, J. J. F. (1994). ‘Unconscious anxiety’: Phobic responses to masked stimuli.
Journal of Abnormal Psychology, 103, 231–240.
Persson, P. (2003). Understanding cinema: A psychological theory of moving imagery. Cambridge,
England: Cambridge University Press.
Pfaff, D. W. (1999). Drive: Neurobiological and molecular mechanisms of sexual motivation. Cambridge,
MA: MIT Press.
Prince, S., & Hensley, W. E. (1992). The Kuleshov effect: Recreating the classic experiment. Cinema
Journal, 31, 59–75.
Pudovkin, V. I. (1970). Film technique and film acting. Edited and translated by I. Montagu. New York,
NY: Grove Press, Inc.
Russell, J. A. (1980). The circumplex model of affect. Journal of Personality and Social Psychology, 39,
1161–1178.
Salt, B. (1974). Statistical style analysis of motion pictures. Film Quarterly, 28, 13–22.
Schurgin, M. W., Nelson, J., Iida, S., Ohira, H., Chiao, J. Y., & Franconeri, S. L. (2014). Eye
movements during emotion recognition in faces. Journal of Vision, 14, 14, 1–16.
Seligman, M. E. P. (1971). Phobias and preparedness. Behavior Therapy, 2, 307–320.
Smith, T. J. (2012). The attentional theory of cinematic continuity. Projections: The Journal for Movies
and the Mind, 6, 1–27.
Smith, T. J., Levin, D. T., & Cutting, J. E. (2012). A window on reality: Perceiving edited moving
images. Current Directions in Psychological Science, 21, 107–113.
Strobach, T., & Carbon, C.-C. (2013). Face adaptation effects: Reviewing the impact of adapting
information, time, and transfer. Frontiers in Perception Science. DOI: 10.3389/fpsyg.2013.00318.
Tomkins, S. S. (1995). In E. V. Demos (Ed.), Exploring affect: The selected writings of Silvan S.
Tomkins. Cambridge, England: Cambridge University Press.
Trope, Y. (1986). Identification and inferential processes in dispositional attribution. Psychological
Review, 93, 239–257.
Truffaut, F. (1984). Hitchcock (rev. ed.). In collaboration with H. G. Scott. New York, NY: Simon &
Schuster, Inc.
Võ, M. L.-H., Smith, T. J., Mital, P. K., & Henderson, J. M. (2012). Do the eyes really have it?
Dynamic allocation of attention when viewing moving faces. Journal of Vision, 12, 3, 1–14.
Wallbott, H. G. (1988). In and out of context: Influences of facial expression and context information
on emotion attributions. British Journal of Social Psychology, 27, 357–369.
Wallbott, H. G. (1990). The relative importance of facial expression and context information in
emotion attributions – Biases, influence factors, and paradigms. Advances in Psychology, 68,
275–283.
Wieser, M. J. & Brosch, T. (2012). Faces in context: A review and systematization of contextual
influences on affective face processing. Frontiers in Psychology, 3, 471, 1–13.
Zillmann, D. (1991). Empathy: Affect from bearing witness to the emotions of others. In J. Bryant, &
D. Zillmann (Eds.), Responding to the screen: Reception and reaction processes (pp. 135–167).
Hillsdale, NJ: Lawrence Erlbaum Associates.

Appendix A: Baseline Study of Object Shots


In order to ascertain that the object shots were providing the emotional contexts intended, we
ran a baseline study of the object shots independently of the glance shots. There were 21
different object shots (film clips) in total, given that the shot for the null (no context)
condition was originally presented four times. Eighteen students and members of staff
from Copenhagen Business School, Denmark (12 female, 6 male; age range 19–45 years;
mean age 30.2 years) participated in this study. All of the participants were fluent in
English and had normal or corrected-to-normal vision.
For the baseline study, the three questions used in the main experiment had to be slightly
rephrased in order to account for the fact that this time the task was to evaluate the
emotional content of the film clip itself as opposed to judging the emotional state of a
target person. Thus, after each film clip was presented the participants were asked: first, to
rate the emotional valence of the film clip on a 9-point scale from –4 (‘‘negative’’) to +4
(‘‘positive’’); second, to rate the emotional intensity of the film clip on a 9-point scale from 1
(‘‘calm’’) to 9 (‘‘exciting’’); and third, to identify the type of emotion that they (the
participant) most associated with, or that they thought best ‘‘matched,’’ the film clip (by
selecting from a list of nine emotion categories). The participants’ eye movements were
recorded but only one calibration was used. All of the other details were the same as for
the main experiment.
Table A1 presents the mean valence and arousal ratings (Questions 1 and 2), and the
proportions of trials in which the target category was chosen (Question 3) for each of the 21 film clips.

Table A1. Results of questionnaire data for individual object shots (baseline study).

Object shot             Question 1 (valence)   Question 2 (arousal)   Question 3 (category)
Happiness 1 (baby)       2.873                  0.704                 1.000
Happiness 2 (child)      2.317                 –0.074                 0.889
Happiness 3 (puppy)      2.095                 –1.296                 0.778
Happiness 4 (kitten)     1.873                 –0.907                 0.611
Desire 1 (woman)         1.540                 –0.074                 0.778
Desire 2 (woman)         1.540                  0.426                 0.778
Desire 3 (man)           0.262                 –0.463                 0.444
Desire 4 (man)           1.151                 –0.352                 0.556
Sadness 1 (coffin)      –2.849                 –0.241                 0.722
Sadness 2 (graveyard)   –2.905                 –0.185                 0.556
Sadness 3 (wreath)      –1.683                 –0.852                 0.944
Sadness 4 (cemetery)    –2.405                 –0.130                 0.833
Fear 1 (snake)          –2.794                  2.981                 0.889
Fear 2 (spider)         –2.738                  2.870                 0.556
Fear 3 (lion)            0.429                  2.593                 0.389
Fear 4 (tiger)           0.317                  2.370                 0.389
Hunger 1 (soup)         –0.183                 –2.241                 0.611
Hunger 2 (bread)         0.151                 –1.907                 0.889
Hunger 3 (rice)          0.651                 –2.019                 0.778
Hunger 4 (potatoes)     –0.016                 –2.185                 0.556
NULL                     0.373                  0.981                 0.000

Figure A1. Questions 1 and 2: mean valence and arousal ratings by condition (baseline study). [Figure not reproduced: the six conditions plotted as single points in a two-dimensional space of valence (negative–positive) by arousal (calm–exciting).]

As in the main experiment, the valence and arousal ratings have been centered at
the overall means for the six conditions (valence = 0.183; arousal = 4.519). The strongest
evidence that the film clips were working in the intended way came from the answers to
Question 3. For all of the 20 film clips corresponding to the five emotional conditions, the
participants chose the target emotion more frequently than the alternative options (that is,
well above the 0.111 chance level), although certain film clips were more effective than others:
for example, the baby and the child were the strongest examples of happiness stimuli, and the
snake and the spider were the strongest examples of fear stimuli. The one problematic
condition was the null (no context) condition which was often categorized as surprise,
anticipation, or excitement. This discrepancy, however, may have been due to the fact that
the countdown sequence was very different from the other film clips and was only presented
once on this occasion.
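In concrete terms, the chance level and the centering referred to above amount to the following (a trivial sketch; the example values and variable names are ours, not data from the study):

```python
# Chance level for Question 3: one target category out of nine options.
n_categories = 9
chance_level = 1 / n_categories          # ~0.111

# Centering ratings at the overall means across the six conditions
# (valence mean = 0.183 on the -4..+4 scale; arousal mean = 4.519 on
# the 1..9 scale, as reported in the text).
overall_arousal_mean = 4.519
raw_arousal_ratings = [7.5, 2.3, 4.6]    # made-up example values
centered = [r - overall_arousal_mean for r in raw_arousal_ratings]
print(round(chance_level, 3), centered)
```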
In order to obtain a clearer picture of how the film clips were operating in terms of valence
and arousal, we looked at the mean ratings by condition. Figure A1 presents the mean ratings
for valence and arousal for each of the six conditions (as single points in a two-dimensional
space), while Table A2 presents the results of a regression analysis of the rating data. As
before, the valence and arousal ratings have been centered at the overall means (valence = 0.183;
arousal = 4.519). The valence estimates show that the happiness and desire clips were rated
significantly higher than average, while the sadness and fear clips were rated significantly lower
than average. The arousal estimates, on the other hand, show that the fear clips were rated
significantly higher than average, while the hunger clips were rated significantly lower than
average. In summary, three out of the four conditions (for which we had specific hypotheses) were
located in the expected quadrants. The one potentially problematic condition was the desire
condition, which was rated high in terms of valence (as expected) but low in terms of arousal (not
expected). In this case, the discrepancy may have been due to the fact that the perception of desire
is relatively context-dependent and more likely to be influenced by a variety of factors (such as
the gender of the observer/participant).

Table A2. Questions 1 and 2: regression estimates for the valence and arousal ratings by condition
(baseline study).

                 Question 1 (valence)                     Question 2 (arousal)
                 Estimate   SE      df      t        p          Estimate   SE      df      t        p
Happiness         2.290     0.468   15.66    4.893   .000***    –0.394     0.266   19.75   –1.478   .155
Desire            1.123     0.468   15.66    2.400   .029*      –0.116     0.266   19.75    0.435   .669
Sadness          –2.460     0.468   15.66   –5.257   .000***    –0.352     0.266   19.75   –1.322   .201
Fear             –1.196     0.468   15.66   –2.557   .021*       2.704     0.266   19.75   10.155   .000***
Hunger            0.151     0.468   15.66    0.322   .752       –2.088     0.266   19.75   –7.842   .000***
NULL              0.373     0.928   15.17    0.402   .693        0.982     0.496   16.54    1.980   .065

*p < .05; ***p < .001.

Appendix B: Additional Information on Experimental Design

Table B1. Six groups with different face-context combinations.

Face (KDEF code): Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Group 6
Female 1 (AF01): Desire 1 (woman) | NULL | Sadness 1 (coffin) | Hunger 3 (rice) | Happiness 3 (puppy) | Fear 3 (lion)
Female 2 (AF06): Desire 2 (woman) | NULL | Sadness 2 (graveyard) | Hunger 4 (potatoes) | Happiness 4 (kitten) | Fear 4 (tiger)
Female 3 (AF07): Fear 1 (snake) | Desire 1 (woman) | NULL | Sadness 3 (wreath) | Hunger 3 (rice) | Happiness 3 (puppy)
Female 4 (AF08): Fear 2 (spider) | Desire 2 (woman) | NULL | Sadness 4 (cemetery) | Hunger 4 (potatoes) | Happiness 4 (kitten)
Female 5 (AF11): Happiness 1 (baby) | Fear 1 (snake) | Desire 1 (woman) | NULL | Sadness 3 (wreath) | Hunger 3 (rice)
Female 6 (AF18): Happiness 2 (child) | Fear 2 (spider) | Desire 2 (woman) | NULL | Sadness 4 (cemetery) | Hunger 4 (potatoes)
Female 7 (AF20): Hunger 1 (soup) | Happiness 1 (baby) | Fear 1 (snake) | Desire 3 (man) | NULL | Sadness 3 (wreath)
Female 8 (AF22): Hunger 2 (bread) | Happiness 2 (child) | Fear 2 (spider) | Desire 4 (man) | NULL | Sadness 4 (cemetery)
Female 9 (AF24): Sadness 1 (coffin) | Hunger 1 (soup) | Happiness 1 (baby) | Fear 3 (lion) | Desire 3 (man) | NULL
Female 10 (AF26): Sadness 2 (graveyard) | Hunger 2 (bread) | Happiness 2 (child) | Fear 4 (tiger) | Desire 4 (man) | NULL
Female 11 (AF32): NULL | Sadness 1 (coffin) | Hunger 1 (soup) | Happiness 3 (puppy) | Fear 3 (lion) | Desire 3 (man)
Female 12 (AF34): NULL | Sadness 2 (graveyard) | Hunger 2 (bread) | Happiness 4 (kitten) | Fear 4 (tiger) | Desire 4 (man)

Male 1 (AM02): Desire 3 (man) | NULL | Sadness 3 (wreath) | Hunger 1 (soup) | Happiness 1 (baby) | Fear 1 (snake)
Male 2 (AM04): Desire 4 (man) | NULL | Sadness 4 (cemetery) | Hunger 2 (bread) | Happiness 2 (child) | Fear 2 (spider)
Male 3 (AM08): Fear 3 (lion) | Desire 3 (man) | NULL | Sadness 1 (coffin) | Hunger 1 (soup) | Happiness 1 (baby)
Male 4 (AM10): Fear 4 (tiger) | Desire 4 (man) | NULL | Sadness 2 (graveyard) | Hunger 2 (bread) | Happiness 2 (child)
Male 5 (AM12): Happiness 3 (puppy) | Fear 3 (lion) | Desire 3 (man) | NULL | Sadness 1 (coffin) | Hunger 1 (soup)
Male 6 (AM13): Happiness 4 (kitten) | Fear 4 (tiger) | Desire 4 (man) | NULL | Sadness 2 (graveyard) | Hunger 2 (bread)
Male 7 (AM21): Hunger 3 (rice) | Happiness 3 (puppy) | Fear 3 (lion) | Desire 1 (woman) | NULL | Sadness 1 (coffin)
Male 8 (AM23): Hunger 4 (potatoes) | Happiness 4 (kitten) | Fear 4 (tiger) | Desire 2 (woman) | NULL | Sadness 2 (graveyard)
Male 9 (AM26): Sadness 3 (wreath) | Hunger 3 (rice) | Happiness 3 (puppy) | Fear 1 (snake) | Desire 1 (woman) | NULL
Male 10 (AM28): Sadness 4 (cemetery) | Hunger 4 (potatoes) | Happiness 4 (kitten) | Fear 2 (spider) | Desire 2 (woman) | NULL
Male 11 (AM31): NULL | Sadness 3 (wreath) | Hunger 3 (rice) | Happiness 1 (baby) | Fear 1 (snake) | Desire 1 (woman)
Male 12 (AM34): NULL | Sadness 4 (cemetery) | Hunger 4 (potatoes) | Happiness 2 (child) | Fear 2 (spider) | Desire 2 (woman)

KDEF = Karolinska Directed Emotional Faces.
