
Discourse Processes, 2022, Vol. 59, No. 4, 275–297
https://doi.org/10.1080/0163853X.2022.2028087

Supplemental data for this article can be accessed on the publisher’s website.

Visuospatial Working Memory and Understanding Co-Speech Iconic Gestures: Do Gestures Help to Paint a Mental Picture?

Ying Choon Wu (Institute for Neural Computation, University of California, San Diego), Horst M. Müller (Faculty of Linguistics and Literary Studies, Bielefeld University), and Seana Coulson (Cognitive Science Department, University of California, San Diego)

ABSTRACT
Multi-modal discourse comprehension requires listeners to combine information from speech and gestures. To date, little research has addressed the cognitive resources that underlie these processes. Here we used a dual-task paradigm to test the relative importance of verbal and visuospatial working memory in speech-gesture comprehension. Healthy, college-aged participants encoded either a series of digits (verbal load) or a series of dot locations in a grid (visuospatial load) and rehearsed them (secondary memory task) as they performed a (primary) multi-modal discourse comprehension task. Regardless of the secondary task, performance on the discourse comprehension task was better when the speaker’s gestures and speech were congruent than when they were incongruent. However, the congruity advantage was smaller when the concurrent memory task involved a visuospatial load than when it involved a verbal load. Results suggest that taxing the visuospatial working memory system reduced participants’ ability to benefit from the information in congruent iconic gestures. A control experiment demonstrated that results were not an artifact of the difficulty of the visuospatial load task. Overall, these data suggest listeners recruit visuospatial working memory to interpret gestures about concrete visual scenes.

During multi-modal discourse comprehension, listeners are tasked with integrating visual information conveyed in speakers’ gestures with semantic information conveyed by their speech. Utilizing gestural information likely recruits working memory (WM) resources because it relates to linguistic information at varying levels of granularity, such as the word-, phrase-, and sentence-levels (Kendon, 2004). Here we investigate the relative import of verbal versus visuospatial WM resources for understanding discourse accompanied by iconic gestures, that is, body movements that signal visuospatial properties of objects and events in the accompanying speech.

Processes in multi-modal discourse comprehension


Several factors point to a role for working memory (WM) in the coordination of semantic information
across speech and various sorts of co-speech gestures. First, the relationship between the production of
deictic or pointing gestures and the availability of referents in discourse suggests that such gestures help
listeners activate semantic information in WM. Speakers are more likely to gesture when they produce
a noun to refer to a new object than for a noun that is an ongoing topic of discussion (Ateş & Küntay, 2018),
and they are more likely to gesture with a pronoun that reintroduces a discourse referent than with a pronoun that maintains reference to an already activated antecedent (Perniss & Özyürek, 2015). Even in Turkish, for which the production of pronouns is equally likely for referents that are reintroduced as maintained, speakers are more likely to gesture when the referent is less accessible in WM (Azar et al., 2019).

Figure 1. Informant’s description of a kitchen countertop.
Further, the delivery of information in the speech and gesture streams is somewhat asynchronous
and may require WM for its integration. For example, Figure 1 shows frames from a video in our
corpus in which an informant describes a photograph of a kitchen countertop. The speaker says, “the
countertop goes over a bit – kind of curved in the middle,” and accompanies his utterance with the
three gestures depicted in the figure. As the speaker says, “the countertop goes over a bit,” his speech is
accompanied by the gesture shown in the first three frames of the video. This gesture shows the
relationship between the countertop (overtly mentioned in his speech) and the cabinet face (inferred
from background knowledge about kitchens). Accompanied by “kind of curved in the middle,” the
next three frames show the curved overhang of the countertop mentioned in the initial part of the
utterance. The final frame of the video is unaccompanied by speech and involves the speaker extending
his hand to again depict the countertop and its spatial relationship to the cabinet face.
Although there is a clear relationship between the meaning of the gestures and the speech depicted in
Figure 1, the information in the two streams unfolds dynamically, but at different timescales. For
example, the onset of the first gesture precedes the mention of its referent in the speech (“countertop”),
while the gesture in the final frame follows it. Without the accompanying speech, these gestures may be
difficult to interpret. However, coupled with the speech, they afford the listener a more visually specific
representation of the discourse referent, namely information about the shape of the countertop and the
fact that it extends somewhat past the edge of the cabinets. These anecdotal observations are supported
by more systematic analyses of multimodal corpora that indicate the temporal dynamics of speech and
gestures are tightly coupled, with relationships at multiple time scales (Pouw & Dixon, 2019).
Coulson and Wu (2014) have suggested that speech and iconic gestures each activate conceptual
representations in semantic memory, and that language comprehenders can integrate the information
from both channels to form visually enhanced cognitive models of the discourse referents. However, the
slight asynchrony in the way that information is presented in speech and gestures likely requires listeners
to recruit WM resources to maintain gestures until they can be linked to relevant information presented
in the speech. Gestures typically precede related speech (Butterworth & Beattie, 1978; Morrel-Samuels &
Krauss, 1992; Pouw & Dixon, 2020), and, at least for beat gestures, comprehenders more readily interpret
the gestures that precede speech than those that follow it (Leonard & Cummins, 2011). Temporal
asynchrony between speech and iconic gestures has also been shown to affect how much influence
gestures have on discourse interpretation (Habets et al., 2011; Obermeier & Gunter, 2014; Obermeier
et al., 2011), suggesting WM limitations may attenuate gestures’ beneficial impact on comprehension.

Working memory and gesture production


Accounts of the functional role of gestures in speech production have motivated a number of studies of
the relationship between WM resources and the production of co-speech iconic gestures. Such
accounts posit a facilitative role of gesture in lexical retrieval (Krauss et al., 1996), and/or the
segmentation of message-level conceptual information into linguistic units (Kita, 2000). Accordingly,
most evidence to date suggests the speakers who gesture the most are those with the most heavily taxed
cognitive resources. For example, gestures are produced more frequently by individuals with lower
versus higher verbal WM capacity (Gillespie et al., 2014; Hostetter & Alibali, 2007; Smithson &
Nicoladis, 2013). Further, Chu et al. (2014) found that the production of iconic gestures was negatively
correlated with scores on tests of spatial skills, including memory for visual patterns, mental rotation,
and the Corsi block-tapping task (hereafter, Corsi block task).
Beyond correlational studies, investigators have explored the relationship between WM load and
gesture rate. Consistent with a facilitative role for gesture in language production, reducing available
WM resources has been found to increase gesture rate (Smithson & Nicoladis, 2014). Similarly,
limiting participants’ ability to gesture has a detrimental impact on their ability to retain verbal
(Goldin-Meadow et al., 2001; Marstaller & Burianová, 2013; Ping & Goldin-Meadow, 2010) or
visuospatial information (Wagner et al., 2004). Using a dual-task paradigm, Cook et al. (2012) showed
that improved performance on verbal and visuospatial secondary recall tasks occurred with the
production of meaningful gestures, but not with meaningless hand movements. Considered together,
this literature suggests that the production of co-speech iconic gestures facilitates language production
by reducing demands on both verbal and visuospatial WM resources.

Working memory and gesture comprehension


In contrast to its role in gesture production, the importance of WM in gesture comprehension has
received far less attention. In keeping with the suggestion that gestures help activate visual and spatial
information about discourse referents (Coulson & Wu, 2014), Wu and Coulson (2014) explored the
visuospatial resources hypothesis that multi-modal discourse comprehension recruits visuospatial
WM during the interpretation of co-speech gestures. Accordingly, they used a picture probe classification task to test the hypothesis that multi-modal discourse allows the listener to infer a more visually
specific discourse representation than does the speech alone. In this task, participants view a discourse
prime involving both speech and gestures followed by a picture probe that they judge as either related
or unrelated to the previous stretch of discourse (Wu & Coulson, 2014). Reaction times for related
picture probes are typically faster following discourse primes with congruent gestures that match the
concurrent speech than those with incongruent gestures, suggesting that congruent iconic gestures
help convey information about the discourse referents (Wu & Coulson, 2014).
Consistent with the suggestion that speech-gesture integration recruits the visuospatial WM
system, listeners with high visuospatial WM capacity benefit more from congruent gestures than
do listeners with lower WM capacity. Wu and Coulson (2014) found that participants with greater
visuospatial WM capacity had larger congruity effects – that is, differences in response times to
videos with congruent versus incongruent gestures. Moreover, imposing a concurrent verbal load
during this task yielded independent, additive effects of gesture congruity and WM load, whereas
a concurrent visuospatial load yielded interactive effects, as gesture congruity effects were greatly
attenuated under conditions of high visuospatial load (Wu & Coulson, 2014). The possibility of
deriving extra information from gestures thus seems to rely on the availability of visuospatial WM
resources.
One shortcoming of this earlier research, however, is that the impact of verbal load on gesture
comprehension was assessed in one group of study participants, while the impact of visuospatial load
was assessed in another. Observed differences in verbal versus visuospatial load may reflect incidental
differences in the underlying cognitive abilities of the two study groups or differences in the strategies
each group used. The former possibility is particularly salient in view of models of WM that
emphasize the importance of individual differences in domain-general abilities in executive function
over modality-specific WM systems (e.g., Engle, 2002). According to such models, WM capacity
differences arise from domain-general differences in the ability to maintain recently encoded
information in the face of intervening information. Such executive attention models may explain the
results reported by Wu and Coulson (2014) as reflecting group differences in executive attention and
fluid intelligence.

Present study
In the present study, we adopted a within-participants design to directly compare the impact of
a concurrent verbal versus visuospatial load on multi-modal discourse comprehension. The logic of
this dual-task paradigm is that if the two tasks recruit shared cognitive resources, performance of the
secondary task will impact performance of the primary task. In Experiment 1, the primary task
involved the discourse comprehension task used in Wu and Coulson (2014), accompanied in some
trials by a concurrent verbal recall task, and in others by a concurrent visuospatial recall task. In
particular, we examined how imposing a verbal versus visuospatial load impacts the size of gesture
congruity effects, that is, faster responses for discourse accompanied by congruent gestures than
incongruent ones. Finding smaller gesture congruity effects with visuospatial than verbal load, Experiment 1 suggests participants’ ability to benefit from gestures was affected more by the concurrent visuospatial than the verbal recall task.
Because the results of Experiment 1 may be explained by an imbalance in the domain-general
difficulty of our secondary memory tasks, Experiment 2 addressed the possibility that the visuospatial
recall task was more difficult than the verbal one. Accordingly, Experiment 2 paired the same recall
tasks with a sentence processing task that placed heavy demands on verbal WM. An asymmetry in the
demands of the recall tasks would predict results in Experiment 2 similar to those in Experiment 1 – namely
that the visuospatial recall task would have a greater impact on sentence processing than would the
verbal recall task. That is, just as gesture congruity effects were smaller with the imposition of the
visuospatial recall task, so too would parsing effects be smaller when paired with visuospatial recall.
In fact, Experiment 2 reveals parsing demands effects were smaller when participants performed the
concurrent verbal recall task than the visuospatial task. Results argue against an asymmetry in the
demands that the recall tasks place on central processing resources. Rather, their relative impact on the
primary task – viz. the size of either gesture congruity effects or parsing demands effects – is a function
of the overlap in the modality-specific demands of the tasks paired in the dual-task paradigm. We
return to the modality-specific demands of speech-gesture integration in the General Discussion.

Experiment 1
Our previous research suggests that speech-gesture integration recruits cognitive resources shared by
visuospatial WM load tasks (Wu & Coulson, 2014). Participants’ ability to integrate information in the
speech and gestural channels was indexed in this paradigm by faster responses following congruent
versus incongruent gestures. Consequently, if this primary task is paired with a secondary task that
diverts cognitive resources, we would expect a reduction or elimination of congruity effects. That is,
the presence of a large congruity effect, even under conditions of memory load, would suggest that the
resources used in the two tasks are largely independent of one another. Alternatively, a small congruity
effect would signal that the resources needed for speech-gesture integration were unavailable because
of the demands of the secondary memory task (see Baddeley, 1992, for articulation of this rationale).
Given that the secondary tasks used here have previously been shown to be roughly matched for
difficulty (Wu & Coulson, 2014), the critical question is whether congruity effects are larger when the
discourse comprehension task is paired with a concurrent verbal versus visuospatial load task.
Hypothesizing that visuospatial WM resources are more important for speech-gesture integration
than verbal WM, we predicted the congruity effects would be smaller under conditions of visuospatial
than verbal WM load. Executive attention models, which, by contrast, emphasize a relationship
between WM abilities and the allocation of domain-general attentional resources, would predict
similarly sized congruity effects under both types of secondary tasks.

Methods
Participants
Participants were 60 undergraduates (39 female). All gave informed consent and received academic
course credit for participation. All participants were fluent English speakers.

Materials
Materials for the primary (discourse comprehension) task were identical to those used in Wu and Coulson (2014). Discourse primes were derived from continuous video footage of spontaneous discourse centered on daily activities, events, and objects. The speaker in the video was
naïve to the experimenters’ purpose and received no explicit instructions to gesture. The speaker
responded to prompts from the experimenter to explain how to perform various actions and to
describe pictures. The experimenter offered encouraging back-channel responses (e.g., head
nods), and occasionally asked clarifying questions. Phenomena such as dysfluent repetitions
and filled pauses were present. Topics varied widely, ranging from the height of a child (“son
whose name is Forest – he’s about this tall,”), the angle of the pendulum on a grandfather clock
(“the pendulum is slightly – it’s not hanging straight down,”), the shape of a refrigerator (“a
refrigerator, one of those mini kind of dorm-room refrigerators,”), to swinging a golf club (“but
you y’know – you take the the golf clubs”), and other concrete, easily visualizable topics.
Short segments (2–8 s) were extracted in which the speaker produced both speech and iconic
gesture during his utterance. Clips were designed to capture the speech and gestures that related
to a particular message. Moreover, clips were chosen to feature gestures that revealed information not fully specified by the speech, such as the size, shape, or spatial configuration of the discourse referents. Each clip comprised 1–2 intonation units (Chafe, 1987) and their accompanying 2–3 iconic gestures. Approximately 60% of the clips involved a gesture that was repeated once in reduced form, approximately 30% involved two distinct gestures, and the remainder involved three gestures (mostly two repetitions of the first gesture). All clips began with the stroke of the first gesture.
For congruent primes, the original association between the speech and gesture was preserved.
To create incongruent counterparts, audio and video portions of congruent clips were swapped
using Final Cut Pro such that across items, all of the same speech and gesture files were
presented; however, they no longer matched in meaning. Because of the discontinuity between
oro-facial movements and verbal output in incongruent items, the speaker’s face was blurred in
all discourse primes (congruent and incongruent). In an independent norming study using
a 5-point Likert scale, the degree of semantic match between speech and gesture in the
congruent trials was rated on average as 1.6 points higher than in the incongruent trials (3.8
(SD = 0.8) vs. 2.2 (SD = 0.7)).
Related picture probes were derived from photographs depicting objects and scenes denoted
by both the spoken and gestured portions of a discourse prime (see Figure 2). Unrelated filler
trials were constructed by pairing discourse primes with picture probes with content that bore
no obvious semantic relationships with either the preceding speech or gestures, as deemed by the
experimenters. Related and unrelated trials were counterbalanced across four randomized lists,
each containing 168 trials, such that each picture occurred as a related probe following its
associated congruent (42 trials) and incongruent (42 trials) discourse primes, and as an unrelated
probe following a different pair of congruent (42 trials) and incongruent (42 trials) discourse
primes. No probes or primes were repeated within any list, but across lists each probe was
presented in all four of these conditions. Verbal and visuospatial secondary recall tasks were
evenly distributed across 50% of each trial type.
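As an illustration of this counterbalancing scheme, the four lists can be generated by rotating each picture through the four prime conditions in Latin-square fashion. The sketch below is ours, not the authors’ stimulus-preparation code, and all names in it are hypothetical:

# Hypothetical sketch of the four-list counterbalancing described above:
# each picture rotates through the four prime conditions across lists.
CONDITIONS = ["related-congruent", "related-incongruent",
              "unrelated-congruent", "unrelated-incongruent"]

def build_lists(picture_ids, n_lists=4):
    """Latin-square assignment: one condition per picture per list."""
    lists = []
    for list_idx in range(n_lists):
        trials = [{"picture": pic,
                   "condition": CONDITIONS[(i + list_idx) % n_lists]}
                  for i, pic in enumerate(picture_ids)]
        lists.append(trials)
    return lists

# build_lists(range(168)) yields four 168-trial lists; across lists, every
# picture appears once in each condition, and only once within any list.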

Figure 2. Discourse primes paired an audio file with either congruent (top panel) or incongruent (bottom panel) gestures. Related
picture probes were related to the spoken portion of both sorts of primes, and to the gestural portion of the congruent ones.

Secondary recall tasks


Each participant performed two types of secondary recall tasks. The verbal load task involved remembering sequences of spoken numbers. The visuospatial load task involved remembering sequences of dot locations in a two-dimensional grid. During the encoding phase of the verbal task, a series of four numbers (each between 1 and 9) was selected pseudo-randomly and presented via digital audio files while a central fixation cross remained on the computer screen. For the visuospatial task, four dots were shown sequentially in squares selected pseudo-randomly within a 4 × 4 matrix.
After the intervening primary task, participants were prompted to recall the secondary memory
sets. In the case of verbal loads, an array of randomly ordered digits from 1–9 appeared in a row in the
center of the screen, and participants clicked the mouse on the numbers that they remembered hearing
in the order that they were presented. Randomization of the numeric array was done to discourage
participants from adopting a visual (e.g., imagining movements along a number line) or motoric (e.g.,
imagining movements on a numeric keypad) encoding strategy for the digits. In the case of visuospatial loads, a blank 4 × 4 grid appeared, and participants clicked the mouse in the boxes where the
dots had appeared in the order that they remembered seeing them. Although a verbal encoding
strategy was possible in principle for this task (viz. labeling each square and then rehearsing the labels),
pilot testing suggested participants found it much easier to do the task by visualizing the movement of
the dots. For both types of recall, written feedback (either “Correct” or “Incorrect”) was shown on the
monitor for 0.5 s after the final mouse click.

Trial structure
As outlined in Figure 3, each trial began with a fixation cross (1 s), followed by the encoding phase of
the secondary task. In the case of visuospatial loads, each dot remained visible on the grid for 1 s. In the
case of verbal loads, sound files lasting approximately 500 ms each were presented successively with
500 ms pauses in between. A 0.5-s pause concluded the encoding phase.
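For concreteness, the encoding-phase timing just described can be collected into a set of constants (a sketch; the names are ours, not taken from the experimental software):

# Encoding-phase timing described above, in seconds (names are hypothetical).
ENCODING_TIMING = {
    "fixation": 1.0,      # fixation cross at trial onset
    "dot_duration": 1.0,  # each of the 4 dots (visuospatial load)
    "digit_audio": 0.5,   # each spoken digit, approximate (verbal load)
    "digit_pause": 0.5,   # silence between successive digits
    "final_pause": 0.5,   # pause concluding the encoding phase
}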

Primary task
The discourse comprehension portion of each trial began with a discourse video, presented at a rate of
30 ms per frame in the center of a computer monitor. Immediately following the video offset, a picture
probe appeared in the center of the screen and remained visible until a response was registered. Two
squares labeled “Yes” versus “No” accompanied each picture at the bottom of the screen. Squares were
arranged side by side, and the mouse cursor was initialized to a location equidistant between the two.
Participants were asked to respond by clicking the mouse in the square labeled “Yes” on related trials
and “No” on unrelated ones. No feedback was given.

Secondary recall task


After a brief pause (250 ms), participants were prompted to recall secondary memory items. Written
feedback on secondary recall accuracy (“Correct” or “Incorrect”) was presented for 500 ms. Between
trials, the screen was blank for 0.5 s and the mouse cursor was reset to a neutral, hidden position.

Figure 3. Schematic depiction of trial structure. Secondary encoding involved either the visual presentation of a sequence of four
dots in a 4 × 4 grid (visuospatial load, dots trials) or the auditory presentation of a sequence of four digits (verbal load, digits trials).
The discourse prime and picture relatedness task were similar in both sorts of trials. Secondary recall involved either clicking the
mouse on the remembered grid locations of the four dots in sequence on the grid (visuospatial load, dots trials), or clicking on the
remembered numbers in the sequence that they were presented (verbal load, digits trials).

Procedure
The entire experimental session lasted approximately 2 hours. First, participants engaged in the dual-task paradigm. Instructions began with an explanation of each kind of memory task – that is, the
verbal load task, referred to as the “digits” trials, and the visuospatial task, referred to as the “dots”
trials. Participants were then told that each trial would also involve a video of a man describing
everyday objects and actions followed by a photograph. Participants were asked to watch and listen to
each video, to respond “Yes” or “No” whether the photograph depicted what the speaker was
describing, and then to recall numbers or dot locations as prompted. Participants were encouraged
to respond both as quickly and accurately as possible on the primary and secondary tasks and given the
opportunity to test their skills on a short practice block that comprised two dots trials and two digits
trials. They were also encouraged to either visually or verbally rehearse items to be remembered.
After completion of the dual-task portion of the experiment, verbal and visuospatial WM capacity
was assessed through two short tests. As in Wu and Coulson (2014), adaptations of the sentence span
task (Daneman & Carpenter, 1980) and the Corsi block task (Milner, 1971) were administered, in
counterbalanced order across subjects. The sentence span task involved listening to sequences of
unrelated sentences and remembering the sentence final word in each. All trials contained between
two and five sentences, depending on the level; for example, a trial at level 2 comprised two sentences,
a trial at level 3 comprised three sentences. A trial was scored as correct if all sentence final words were
accurately recalled in any order. A block was defined as three trials at a given level (viz. three sets of
two sentences at level 2, three sets of three sentences at level 3), and participants completed one block
at each level. Participants began at level 2, followed sequentially by levels 3, 4, and 5. If a participant
correctly recalled both sentence final words in at least two of the three trials in level 2, then they were
allowed to proceed to level 3; if they correctly recalled the three words in at least two of the three trials
in level 3, then they were allowed to proceed to level 4; and, if they correctly recalled the four words in
at least two of the three trials in level 4, then they were allowed to proceed to level 5. An individual’s
span was the highest level at which they correctly recalled the words in at least two of the three trials in
the block. If they correctly recalled words in only one of the three trials, they received an additional
half point, but were not allowed to proceed to the next level. Sentence span scores in Experiment 1
ranged from 2 (just above the minimum possible score of 1) to the maximum possible score of 5. The
average score was 3.89 (SD = 0.80).
In the Corsi block task, an asymmetric array of nine squares was presented on a computer monitor.
On each trial, between three and nine of the squares flashed in sequence, with no square flashing more
than once. Participants reproduced patterns of flashes immediately afterward by clicking their mouse
in the correct sequence of squares. As in the sentence span task, trials were blocked by level, beginning
with level 3, which comprised sequences of three squares, and increased sequentially until level 9,
which comprised sequences of nine squares. There were five sequences at each level. An individual’s
Corsi block score was the highest level in which at least one of the five sequences was correctly
replicated (Conway et al., 2005). Corsi block scores in Experiment 1 ranged from a minimum score of
5 to the maximum possible score of 9. The average score among our participants was 7.74 (SD = 1.01).
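To make the two scoring rules concrete, here is a minimal sketch under the procedures just described (function and variable names are ours, not from the original test software):

def sentence_span_score(correct_per_level):
    """correct_per_level: dict mapping level (2-5) to the number of correct
    trials (0-3) in that level's block; levels are attempted in order."""
    score = 1.0                      # minimum possible score
    for level in sorted(correct_per_level):
        n = correct_per_level[level]
        if n >= 2:
            score = float(level)     # level passed; proceed to the next
        elif n == 1:
            return score + 0.5       # half point, no further levels
        else:
            return score             # level failed outright
    return score

def corsi_block_score(correct_per_level):
    """correct_per_level: dict mapping level (3-9) to the number of correctly
    reproduced sequences (0-5) at that level."""
    passed = [lvl for lvl, n in correct_per_level.items() if n >= 1]
    return max(passed) if passed else 0

# e.g., sentence_span_score({2: 3, 3: 2, 4: 1}) == 3.5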

Analysis
Primary discourse comprehension task performance
Data from four participants were excluded because of chance level accuracy on the primary task, and
three more for poor performance on the secondary task. Analysis of performance on the discourse
comprehension task thus included data from the remaining 53 participants. Analysis of accuracy on
the primary discourse comprehension task involved the use of a generalized linear mixed effects model
to predict trial level accuracy based on the experimental factors of congruity and load modality.
Response latencies were computed from the onset of the picture probe to the time of the mouse click.
Only correct responses to related probes were analyzed.

To explore how speech-gesture congruity impacted response times to picture probes during each of
the two memory tasks, we used linear mixed effects models (Baayen et al., 2008). Initial modeling
attempts involved a “maximal” model (Barr et al., 2013); however, various failures to converge
suggested overparameterization. Because the generalizability of our discourse primes was also of
interest in our analyses of accuracy, we decided to apply the same parsimonious approach to random
effects in all analyses (Gelman & Hill, 2006, p. 275; Yu, 2015). Consequently, both for our mixed effects
models of response times and for our logistic regression model of accuracy for the primary discourse
comprehension task, the random effect structure comprised one random intercept term for subjects
and another for items.
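In Python, for example, this random effect structure could be specified with statsmodels, which expresses crossed random intercepts as variance components (a sketch under assumed file and column names, not the authors’ actual analysis code):

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names: rt, congruity, subject, item.
df = pd.read_csv("experiment1_rts.csv")

# Place all rows in one group so that crossed random intercepts for
# subjects and items can be written as variance components.
df["all"] = 1
model = smf.mixedlm(
    "rt ~ congruity", df, groups="all",
    re_formula="0",                            # no ordinary random effects
    vc_formula={"subject": "0 + C(subject)",   # random intercept: subjects
                "item": "0 + C(item)"},        # random intercept: items
)
print(model.fit().summary())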
The role of individual differences in WM capacity for gesture congruity effects was examined
by calculating a difference score for each participant in each of the two load modality conditions
by subtracting response times following congruent videos from those following incongruent
ones. Difference scores were then regressed against sentence span scores (our independent
measure of verbal WM capacity) and Corsi block scores (our independent measure of visuospatial WM capacity).
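A minimal sketch of this difference-score analysis, assuming a trial-level data frame with hypothetical column names (per-participant mean RTs by congruity and load, regressed on the two span measures):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("experiment1_rts.csv")      # rt, congruity, load, subject
scores = pd.read_csv("wm_scores.csv")        # subject, corsi, sspan (assumed)

# Congruity effect = mean incongruent RT minus mean congruent RT.
cells = (df.groupby(["subject", "load", "congruity"])["rt"].mean()
           .unstack("congruity"))
cells["effect"] = cells["incongruent"] - cells["congruent"]

# One regression per load condition, e.g., the visuospatial (dots) trials:
dots = cells.xs("dots", level="load").reset_index().merge(scores, on="subject")
print(smf.ols("effect ~ corsi + sspan", data=dots).fit().summary())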

Secondary recall task performance


In principle, overlapping resources across the two tasks would manifest either as an impact of the recall task manipulation on participants’ speed and accuracy on the primary discourse comprehension task or as an impact of the gesture congruity manipulation on recall scores. Initial analysis of
recall performance thus used logistic regression to model correct recall on each trial as
a function of recall task and congruity. Random intercept terms for both subjects and items
were employed.
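Frequentist logistic mixed models with crossed random intercepts are not directly available in statsmodels, so a Python sketch of this analysis might substitute its Bayesian mixed GLM as a stand-in (again with hypothetical file and column names):

import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("experiment1_recall.csv")   # correct (0/1), task, congruity,
                                             # subject, item -- assumed names

# Fixed effects of recall task and congruity; random intercepts for
# subjects and items enter as variance components.
model = BinomialBayesMixedGLM.from_formula(
    "correct ~ task + congruity",
    {"subject": "0 + C(subject)", "item": "0 + C(item)"},
    df,
)
print(model.fit_vb().summary())              # variational Bayes fit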
Our previous research, however, suggested that in this paradigm recall performance is largely
a function of individual differences in WM capacity. Accordingly, we used linear regression to explore
the relationship between the proportion of correct trials on each recall task with individual differences
in visuospatial and verbal WM capacity.

Results
We begin by reporting analyses of performance on the primary discourse comprehension task
and follow with analyses of participants’ accuracy on the secondary recall task. Performance on
the discourse comprehension task was near ceiling (see Table 1); therefore, we focused on
response times in this task. Responses were faster following congruent than incongruent ges­
tures, and the gesture congruity effect was larger when paired with the verbal than the
visuospatial recall task (see Figure 4). Analysis of individual differences revealed a relationship
between Corsi block scores and gesture congruity effects during visuospatial load trials, but not
during verbal load trials. Analysis of secondary recall tasks revealed better performance on the
verbal digits task than the visuospatial dots task (see Table 2). The main goal of the secondary
recall analysis, however, was to establish that each of our recall tasks influenced the targeted WM
system. Analysis revealed a relationship between recall accuracy on the visuospatial dots task and participants’ Corsi block scores, but no reliable predictors were found for recall on the verbal digits task.

Table 1. Proportion correct (SD) on the discourse comprehension task in Experiment 1.

Congruity      Verbal Load    Visuospatial Load
Congruent      0.92 (0.08)    0.94 (0.06)
Incongruent    0.90 (0.10)    0.90 (0.09)

Figure 4. Response times for discourse comprehension task in Experiment 1 with a concurrent load on verbal versus visuospatial WM.

Table 2. Proportion correct (SD) on secondary recall tasks in Experiment 1.

Congruity      Digit Recall   Dots Recall
Congruent      0.91 (0.10)    0.77 (0.14)
Incongruent    0.89 (0.10)    0.74 (0.16)

Discourse comprehension task


Accuracy
Logistic regression was used to predict accurate performance on the picture probe task with fixed effects of congruity and load modality, and random intercept terms for subjects and items. This
analysis revealed no significant effects (see S1 in supplementary materials for the full model output
with odds ratios). Overall accuracy rates on the discourse comprehension task were high (see Table 1).

Response times
Analysis of response times on the picture probe task was limited to trials in which participants
responded correctly on both the primary (picture probe) and secondary (recall) task. These data
were filtered for response times greater than 2.5 standard deviations from the mean, resulting in
removal of slightly less than 2% of the data. Figure 4 shows mean response times in all conditions
along with 95% confidence intervals and suggests that the congruity effect was smaller when the
secondary task involved sequences of dot locations (visuospatial load) versus digits (verbal load).1
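The trimming step can be expressed in a few lines of pandas (a sketch that assumes trimming around the grand mean; the paper does not state whether cutoffs were computed per participant or condition, and all names are hypothetical):

import pandas as pd

df = pd.read_csv("experiment1_rts.csv")      # hypothetical file/column names

# Keep trials correct on both tasks, then drop RTs beyond 2.5 SD of the mean.
correct = df[(df.primary_correct == 1) & (df.recall_correct == 1)]
m, sd = correct["rt"].mean(), correct["rt"].std()
trimmed = correct[(correct["rt"] - m).abs() <= 2.5 * sd]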
Analysis involved the use of linear mixed effects models of responses during the digits and the dots
trials, respectively. Both models included fixed effects of congruity, and random intercept terms for
subjects and for items. Analysis of the digits trials revealed a significant effect of congruity, β = 219.8,
t = 3.56, p < .001, as did analysis of the dots trials, β = 127.4, t = 2.36, p < .05. The full models can be
found in S2 in supplementary materials. These models suggest that the congruity effect was approximately 1.7 times larger when the task involved verbal than visuospatial load.

Individual differences
Finally, we modeled the relationship between WM abilities and sensitivity to gesture through two
ordinary least squares regression models, one for each trial type. The dependent variable was the
magnitude of the gesture congruity effect on response times – that is, the average response times for trials with congruent gestures subtracted from those for trials with incongruent gestures – under either a verbal or visuospatial load, and span scores on both the Corsi block and sentence span tasks served as
predictor variables. The model of congruity effects under visuospatial load (dots) was significant, F (2,
50) = 4.01, p < .05, with a multiple R-squared value of 0.138. The Corsi block was a significant
predictor, β = 87.25, t = 2.82, p < .01, whereas the sentence span was not, β = −34.62, t = −0.88. That is,
when the secondary memory task engaged visuospatial WM, sensitivity to speech-gesture congruity
(i.e., the size of the difference in response times to congruent versus incongruent gestures), was related
to visuospatial WM capacity. The higher a given participant scored on the Corsi block, the more the
participant benefited from the congruent gestures relative to incongruent gestures. A similar analysis
of the congruity effect under verbal load failed to reveal any relationship between the Corsi block and
sentence span predictor variables and participants’ sensitivity to speech-gesture congruity. The full
output of these linear models can be found in S3 in supplementary materials.

Recall task performance


Recall task performance in each experimental condition is described in Table 2. A trial was considered
correct if the participant correctly recalled all four items in the correct order.
Logistic regression was used to model correct recall on each trial as a function of recall task and
congruity. Random intercept terms were used for both subjects and items. This model revealed
a significant effect of recall task, β = −1.16, C.I. [−1.40, −0.91], z = −9.23, p < .001. The negative
coefficient in the model indicates correct recall was less likely on the dots task (see S4 in supplementary
materials for the full model output reported in terms of odds ratios).
To investigate individual differences in performance on the secondary recall tasks we collapsed
across the congruity manipulation to compute for each participant the percentage of correct trials on
each of the two recall tasks. Using ordinary least squares regression, the percentage of correct trials on
the verbal recall task was regressed against participants’ Corsi block and sentence span scores, with
each score treated as an independent predictor. The model was not significant, F(2, 50) = 1.55, p = .22.
Analogously, the percentage of correct trials on the visuospatial recall task was regressed against
participants’ Corsi block and sentence span scores. The model of visuospatial recall was significant, F
(2, 50) = 5.88, p < .01, with Corsi block as the only significant predictor, β = 1.26, t = 2.85, p < .01.
Participants’ performance on the visuospatial (dots) recall task was positively associated with their
Corsi block scores. The full models are presented in S5.

Discussion
Speech-gesture congruity effects were less pronounced with the concurrent visuospatial load task
relative to the verbal one, consistent with our suggestion that understanding iconic gestures recruits
visuospatial memory resources. According to our visuospatial resources hypothesis, the meaning of
iconic gestures is often difficult to discern until they can be mapped onto concepts evoked by the
speech. The visuospatial WM system is used to store gestural information until it can be matched and
integrated with verbally evoked concepts.
Participants’ ability to benefit from the information conveyed by gestures was manifested in the
present study by reliable congruity effects on the discourse comprehension task because participants
responded faster following videos with congruent gestures. Because there was greater overlap between
the processing demands of the discourse comprehension task and the visuospatial relative to the verbal
load task, the benefits of congruent gestures were more profoundly diminished when participants were
tasked with remembering visuospatial versus verbal information. In other words, visuospatial
resources that may have been allocated to speech-gesture integration were instead absorbed by the
rehearsal of dot locations. Results of the present study are thus in keeping with Wu and Coulson
(2014), in which gesture congruity effects were also less evident in participants whose secondary task
involved visuospatial than verbal recall.
Because all participants in the present study performed both verbal and visuospatial secondary
memory tasks, observed results are less amenable to explanation by domain-general models of WM
that emphasize the role of executive attention in these phenomena. Domain general models suggest
performance on the primary task depends on participants’ ability to switch fluidly between the tasks,
and to suppress information that may interfere with a correct response. Such models incorrectly
predict similarly sized congruity effects with both verbal (digits) and visuospatial (dots) memory loads.
However, interpretation of Experiment 1 importantly depends on whether our secondary recall tasks
are comparable in the demands they place on executive functions, that is, that they differ primarily in
terms of their recruitment of domain-specific (verbal versus visuospatial) processing resources. We
turn to this issue in Experiment 2.

Experiment 2
In Experiment 1, we observed that participants were more sensitive to the information in co-speech
iconic gestures while concurrently performing the digits relative to the dots task. This result was
attributed to the fact that the digits task imposed a load on verbal WM, whereas the dots task
recruited visuospatial WM. Thus, the dots task reduced participants’ sensitivity to congruent
gestures because gesture comprehension itself requires visuospatial processing resources. By con­
trast, the verbal WM resources recruited by the digits task are less crucial for interpretation of iconic
gestures, and thus most participants were still sensitive to gesture congruity, even under conditions
of verbal load.
However, an alternative interpretation of the data from Experiment 1 is also possible. In this
skeptical account, the difference in primary task performance in Experiment 1 in dots versus digits
trials is not the result of an overlap in the domain-specific processing resources needed for each of our
three tasks (multi-modal discourse comprehension, memory for a series of digits, and memory for
a sequence of dot locations), but rather caused by an overlap in demands for domain-general (“central
processing”) resources. The crux of this claim is that the dots task interfered with gesture comprehen­
sion in Experiment 1 because it placed greater demands on the domain-general attentional system
than did the digits task. We shall call this account the inherent task asymmetry hypothesis, namely that,
relative to the digits task, the dots task would be expected to have a greater impact on the performance
of any concurrent task because of the demands it places on domain-general resources.
We find the suggestion that the visuospatial WM task was more difficult than the verbal WM task
rather unlikely in view of previous work in our laboratory. Wu and Coulson (2014) used these same
visuospatial and verbal load tasks in a dual-task paradigm in which the primary task involved
searching for a target letter in an array of distractors (viz., a visual search task). The visual search
task has previously been used in this way to compare the demands of concurrent load tasks by
evaluating how search time increases with increasing numbers of distractors, with the slope of this set-
size function serving as an index of the difficulty of the secondary task (Treisman & Gelade, 1980).
Critically, Wu and Coulson (2014) found similar slopes for the distractor set-size function in both
concurrent tasks, suggesting they place similar demands on executive function.
An even more compelling argument for the domain-specific resources hypothesis, however, would
be to demonstrate that for a primary task that places significant demands on verbal rather than
visuospatial WM, the dots task may be less disruptive than the digits task and thus associated with
the observation of larger response time effects on the primary task. Accordingly, in Experiment 2 we
paired our two recall tasks (dots and digits) with a primary task widely acknowledged to draw on verbal
WM resources: namely, the processing of English sentences with two different kinds of relative clauses.
In this task, we asked participants to answer comprehension questions about orally presented sentences
such as (1a), known as a “subject-subject” relative, and (1b), known as a “subject-object” relative.

(1a) The fireman who speedily rescued the cop sued the city over working conditions. (subject-subject relative)

(1b) The fireman who the cop speedily rescued sued the city over working conditions. (subject-object relative)

Although such sentences are similar in vocabulary and structure, subject-object relatives like (1b) are
more difficult to understand (see e.g., Ford, 1983; King & Just, 1991; Müller et al., 1997). Whereas the
precise explanation of the processing difficulty induced by subject-object relatives is a matter of debate
(cf. Roland et al., 2012; Staub, 2010; Wells et al., 2009), a great deal of evidence suggests the difficulty of
understanding these sentences is impacted by the availability of WM resources (see Farmer et al., 2012;
Lewis et al., 2006 for review). We thus predict that participants will perform our sentence processing
task in Experiment 2 more quickly and accurately when the sentence involves a subject-subject relative
than a subject-object relative, and that success on this task will be related to their sentence span scores.
However, because Experiment 2 was intended primarily as a control study to explore whether the
dots task used in Experiment 1 is inherently more difficult than the digits task from that study, our
primary interest is less in performance on the sentence-processing task per se, but rather how the two
secondary recall tasks modulate sentence processing performance. If the results of Experiment 1 were
attributable to an inherent asymmetry in the difficulty of the two recall tasks, we would expect the dots
task to be more disruptive to sentence processing than the digits task. The inherent task asymmetry
hypothesis thus predicts that just as gesture congruity effects were more evident when paired with the
putatively less demanding digits task in Experiment 1, parsing demands effects should likewise be
more pronounced when paired with the digits task in Experiment 2.
By contrast, if our recall tasks are reasonably well matched for their demands on central
processing resources, as we have suggested, asymmetrical interference effects are expected to
arise because of domain-specific demands of the primary task. Consequently, the dots task
should attenuate experimental effects in primary tasks such as mental imagery that recruit
visuospatial resources, whereas the digits task should attenuate experimental effects in primary
tasks that recruit verbal resources. Accordingly, we may expect primary task performance in
Experiment 2 to be more impacted by the digits task, which draws more heavily on verbal
resources than the dots task that recruits visuospatial WM. On the domain-specific resources
hypothesis, then, response time effects of parsing demands that depend on the availability of
verbal WM would be expected to be smaller in digits trials that tax those verbal resources than
in the dots trials that tax visuospatial WM.

Methods
Participants
The study recruited 64 healthy adults (14 male) who participated for extra credit in their cognitive
science, linguistics, or psychology courses at University of California San Diego. The average age was
20.65 years (SD = 2.38). All were fluent English speakers, and all participants provided informed consent.
Following participation in the main experiment (i.e., the dual-task paradigm), participants were given
both the Corsi block and the sentence span tests. As in Experiment 1, the order of administration of
these two assessments was counterbalanced across participants. Corsi block scores ranged from
a minimum score of 5 to the maximum possible score of 9. The average score among our participants
was 7.43 (SD = 1.18). Sentence span scores ranged from 1.5 (just above the minimum possible score of 1)
to the maximum possible score of 5. The average score among our participants was 3.55 (SD = 0.84).

Materials
Materials for the primary (sentence processing) task were adapted from a study by Müller et al. (1997)
on the comprehension of relative clauses in English. They included 32 subject-subject relatives and 32
subject-object relatives read by a native speaker of American English. Each sentence was followed by
the presentation of a question about the actions of one of the animate nouns. For example, the subject-
object relative “The tourist, who the merchant angrily insulted, tossed the money onto the counter,”
was followed either by the question “Who tossed the money onto the counter?” or the question “Who
insulted someone?” Concurrent with the presentation of the question, two vertically arrayed boxes were displayed, each containing a noun from the sentence. In this example, the alternatives were “tourist” and “merchant.” Each participant heard the same 64 experimental stimuli. The use of four stimulus lists ensured that each of the two comprehension questions was paired with each arrangement of the alternatives in the boxes (viz. 50% of the participants saw “tourist” in the top box, and 50% saw it in the bottom box).

Trial structure
Secondary encoding task
As in Experiment 1, each trial began with a fixation cross (1 s), followed by the encoding phase of the
secondary task. As before, each dot remained visible on the grid for one second during dots trials, and
during digits trials, each approximately 500 ms sound file was followed by a 500-ms pause.
A 0.5-s pause concluded the encoding phase.

Primary sentence processing task


Presentation of each audio file was accompanied by the visual presentation of a fixation cross in the
center of the monitor. Following each sentence, the fixation cross was replaced by the comprehension
question accompanied by two answer boxes – one centered above the question and one below.
Participants signaled their response via a mouse click in the appropriate box, which then triggered
the onset of the secondary recall task. As in Experiment 1, no feedback was given on primary task
performance.

Secondary recall
Secondary recall proceeded exactly as in Experiment 1. As in Experiment 1, written feedback on
secondary recall accuracy (“Correct” or “Incorrect”) was displayed for 500 ms. Between trials, the screen was blank for 0.5 s and the mouse cursor was reset to a neutral, hidden position.

Procedure
Participants were told they would be listening to sentences while rehearsing secondary memory
items. As in Experiment 1, instructions began with an explanation of each kind of memory task –
the verbal load task, referred to as the “digits” trials, and the visuospatial task, referred to as the
“dots” trials. Participants were then told that each trial also involved an audio recording of a man
describing an event. Participants were asked to listen to each sound file, then to read the question,
and use the mouse to click the box that contained the answer. Participants were encouraged to
respond as quickly and accurately as possible on both the primary and secondary tasks. They were
also encouraged to either visually (for dots) or verbally (for digits) rehearse items to be remembered.
The dual-task portion of the experiment began after a short practice block of two dots trials and two
digits trials.
After completion of the dual-task portion of the experiment, verbal and visuospatial WM capacity
was assessed through two short tests – the sentence span task (Daneman & Carpenter, 1980) and the
Corsi block task (Milner, 1971) – as previously described.

Analysis
Data from one participant whose accuracy was close to floor were removed from the dataset. The
following analyses represent the remaining 63 participants. As in Experiment 1, we used logistic
regression to analyze accuracy scores and linear mixed effects regression to analyze response times.
Individual differences were explored with ordinary least squares regression, as outlined in the following text.

Results
Analysis of the sentence processing task was intended first to demonstrate predicted parsing demands
effects, such that participants would respond more accurately and faster on the subject-subject than
the subject-object relatives. Moreover, because the sentence processing task is quite difficult, our
individual differences analysis explored whether sentence processing accuracy was predicted by our
independent assessment of visuospatial or (as predicted) verbal WM capacity. Critically, analysis of
response times on the sentence processing task was intended to compare parsing demands effects
under conditions of verbal versus visuospatial WM load. Although initial analysis of performance on
the secondary recall tasks also addressed whether parsing demands effects differed for each of the recall
tasks, the main objective of the analyses in the section on secondary recall was to establish whether
verbal recall performance was related to our independent measure of verbal WM capacity and whether
visuospatial recall was related to our independent measure of visuospatial WM capacity.

Sentence processing task


Accuracy
A correct trial on the sentence processing task was one for which the participant chose the correct
response to the comprehension question from a set of two alternatives. Chance performance was thus
50%. Mean proportion correct in each experimental condition is listed in Table 3. Table 3 shows that,
as expected, participants were more accurate on trials involving the subject-subject relative clauses
than the subject-object relative ones. Moreover, participants’ overall sentence processing task accuracy
rates were higher when the recall task involved visuospatial load in the dots task than the verbal load in
the digits task.
Accuracy on the sentence processing task was analyzed with logistic regression, using fixed effects
of parsing demands and load modality, and random intercept terms for subjects and for items. This
analysis revealed a reliable effect of load modality, β = 0.29, C.I. [0.08, 0.50], z = 2.7, p < .01, because
participants were more likely to correctly respond to the sentence processing task when their memory
task involved dot locations. We also observed a reliable effect of parsing demands, β = −0.36, C.I.
[−0.60, −0.12], z = −2.9, p < .01, as participants were less likely to respond correctly to the sentence
processing task following sentences with subject-object relative clauses. The complete model reported
in terms of odds ratios is presented in S6 in supplementary materials.
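Because the supplementary models are reported as odds ratios, it may help to note that an odds ratio is simply the exponentiated logistic coefficient; for instance, for the load modality effect reported above:

import numpy as np

# Logistic coefficient and 95% CI for load modality, from the text above.
beta, ci_low, ci_high = 0.29, 0.08, 0.50
print(np.exp([beta, ci_low, ci_high]))  # odds ratio ~1.34, 95% CI ~[1.08, 1.65]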

Individual differences
To explore the impact of individual differences in WM capacity on performance on the sentence
processing task, we collapsed across the parsing demands manipulation to calculate each participant’s
sentence processing task accuracy under conditions of verbal load and under conditions of

Table 3. Accuracy rates (SD) on the sentence processing task in


Experiment 2.
Parsing Demands Verbal Load Visuospatial Load
Subject-Subject 0.74 (.17) 0.78 (.14)
Subject-Object 0.67 (.15) 0.73 (.14)
290 WU, ET AL.

visuospatial load. Using ordinary least squares regression, sentence processing accuracy rates under
conditions of verbal load were regressed against participants’ Corsi block and sentence span scores.
The resulting model had an R-squared value of 0.20, F(2, 55) = 6.85, p < .01, although sentence span
was the only significant predictor, t = 3.43, p < .01 (see S7 for the full model). A similar regression of
sentence processing task accuracy rates under conditions of visuospatial load against an additive combination of participants’ Corsi block and sentence span scores had an R-squared value of 0.19,
F(2, 55) = 6.60, p < .01, with sentence span as the only significant predictor, t = 3.48, p < .01 (see S7 for
the full model). Thus, regardless of the secondary task, accurate responses on the sentence processing
task were positively associated with participants’ verbal WM capacity, with the best performance being
exhibited by participants with the greatest verbal WM capacity.

Response times
Analysis of response times for the sentence processing task included only trials in which participants
both responded correctly on the sentence processing task and correctly recalled all of the items
encoded at the outset of the trial. Response times were filtered for data more than 2.5 standard
deviations away from the mean, resulting in the removal of 2.43% of the data set. Figure 5 shows the
average response times in each condition along with 95% confidence intervals.2
Analysis involved the construction of separate linear mixed effects models of sentence processing
response times collected during each of the secondary recall tasks. These models included the fixed
effect of parsing demands (subject-subject, subject-object) and random intercept terms for subjects
and for items. The parsing demands effect was not significant in the model of sentence processing
response times under conditions of verbal load, β = 106, t = 1.011, p = .3, although the estimate
suggests a non-significant trend for slower responses on questions about sentences with subject-object
relative clauses (full model presented in S8 in supplementary materials). In the model of visuospatial
load trials (see S8 for details), the parsing demands effect was significant, β = 255.7, t = 2.65, p < .01.
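A sketch of these per-task mixed models follows. statsmodels has no direct interface for crossed random effects, so a common workaround places all observations in a single dummy group and lets subject and item intercepts enter as variance components; file and column names are again hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Trimmed trial-level data, as in the previous sketch (hypothetical columns:
# rt, parsing, load, subject, item).
rt = pd.read_csv("exp2_rt_trials_trimmed.csv")

vc = {"subject": "0 + C(subject)", "item": "0 + C(item)"}

# Separate linear mixed models for verbal-load and visuospatial-load trials.
for load, sub in rt.groupby("load"):
    sub = sub.assign(group=1)  # one dummy group: subject and item intercepts are crossed
    fit = smf.mixedlm("rt ~ parsing", data=sub, groups="group",
                      re_formula="0", vc_formula=vc).fit()
    print(load)
    print(fit.summary())
```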
In sum, response times on the sentence processing task were faster when the concurrent WM task
involved memory for dot locations versus memory for a series of digits. During dots trials, which
involved a concurrent visuospatial load, sentence processing response times were faster for probes
following the subject-subject relatives versus the subject-object relatives. During digits trials, which
incurred a verbal load, response times were not reliably affected by the parsing demands manipula­
tion. Participants thus exhibited greater sensitivity to the parsing demands manipulation during the
dots trials, which imposed a visuospatial load, than during the digits trials, which involved a verbal
load.

Recall task performance


Recall performance on the two types of secondary task in each experimental condition is presented in Table 4.
A trial was considered correct if the participant correctly recalled all four items (either dots or digits)
in the correct order. Analysis involved the use of logistic regression to model accuracy on each trial
with fixed effects of memory task and parsing demands, and random intercept terms for subjects
and items. The model revealed a significant effect of memory task due to the lower probability of
correct recall on the dots task (β = −0.46, C.I. [−0.66, −0.27], z = −4.686, p < .001). The differing
direction of the parsing demands effects in the two tasks (see Table 4) was reflected in a significant
interaction between memory task and parsing demands (β = 0.43, C.I. [0.15, 0.70], z = 3.057,
p < .01). The full model reported in terms of odds ratios can be found in S9 in supplementary
materials. This interaction was followed up by separate logistic regression models of the two recall
tasks, each with a single fixed effect of parsing demands and random intercept terms for subjects and
items (see S10 for the full model outputs reported in terms of odds ratios). In the model of verbal
recall performance, parsing demands was not a significant predictor (β = −0.14, C.I. [−0.39, 0.11],
z = −1.12, p = .262). In the model of visuospatial recall, parsing demands only approached
significance (β = 0.32, C.I. [0.00, 0.65], z = 1.93, p = .054).
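The omnibus interaction model and its follow-ups have the same shape as the accuracy model sketched earlier; a hypothetical Python analogue, with assumed column names, is shown below.

```python
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Hypothetical columns: recalled (0/1), task ("digits"/"dots"), parsing, subject, item.
recall = pd.read_csv("exp2_recall_trials.csv")
vc = {"subject": "0 + C(subject)", "item": "0 + C(item)"}

# Omnibus model with the memory task x parsing demands interaction.
omnibus = BinomialBayesMixedGLM.from_formula(
    "recalled ~ task * parsing", vc, recall).fit_vb()
print(omnibus.summary())

# Follow-up models within each recall task, as in the reported analysis.
for task in ("digits", "dots"):
    sub = recall[recall["task"] == task]
    fit = BinomialBayesMixedGLM.from_formula("recalled ~ parsing", vc, sub).fit_vb()
    print(task)
    print(fit.summary())
```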
Table 4. Proportion correct (SD) on secondary recall tasks in Experiment 2.
Parsing Demands Digit Recall Dots Recall
Subject-Subject 0.69 (0.17) 0.60 (0.24)
Subject-Object 0.66 (0.20) 0.65 (0.22)

To investigate individual differences in performance on the secondary recall tasks, we collapsed
across the parsing demands manipulation to compute for each participant the percentage of correct
trials on each of the two recall tasks. Using ordinary least squares regression, this value was then
regressed against each participant’s Corsi block and sentence span scores. The resultant model of
recall performance on the digits task had an R-squared value of 0.148, F(2, 56) = 4.864, p < .05;
sentence span was the only significant predictor, β = 1.9, t = 2.2, p < .05, because participants with
higher scores on the sentence span task had higher accuracy rates on the digits task (see S11 in
supplementary materials for the full model output).
Analogously, the percentage of correct trials on the dots recall task was regressed against each
participant’s Corsi block and sentence span scores. The resultant model of recall performance on
the dots task had an R-squared value of .243, F(2, 56) = 9.01, p < .001. Corsi block was the only
significant predictor, β = 2.9, t = 4.18, p < .001 (see S11 for the full model output).
Consistent with our assumption that the digits recall task imposed a load on verbal WM, perfor­
mance on digits recall was positively associated with participants’ sentence span scores. Likewise,
consistent with our assumption that the dots recall task imposed a load on visuospatial WM,
performance on dots recall was positively associated with participants’ Corsi block scores.

Discussion
Results of Experiment 2 argue against the inherent task asymmetry hypothesis, supporting instead the domain-
specificity hypothesis. In Experiment 2, parsing demands effects were less pronounced with the
concurrent verbal than visuospatial load task. This contrasts with the findings in Experiment 1 in
which speech-gesture congruity effects were less pronounced with the concurrent visuospatial versus
verbal load task. Moreover, whereas the size of speech-gesture congruity effects in Experiment 1 was
related to participants’ visuospatial WM capacity, parsing demands effects in Experiment 2 were
related to participants’ verbal WM capacity. In sum, participants’ performance on the primary
sentence processing task in Experiment 2 depended not on the demands that the dots versus digits
recall tasks placed on central processing resources, but rather on the demands each placed on the
cognitive resources required by the primary task. In Experiment 2, the effects of parsing demands
reflect the utilization of verbal resources and consequently were disrupted more by the digits task,
which recruits verbal WM.
Results of the present study replicate the finding in the sentence processing literature that English
sentences with the sorts of subject-object relative clauses tested here are more difficult to process
than comparable subject-subject relative clauses. Success on the sentence processing task was impacted
more by demands of a concurrent verbal than visuospatial load, consistent with memory-based
accounts of relative clause difficulty. Further, individual differences in success on the sentence
processing task were related to performance on separately administered tests of verbal, but not
visuospatial, WM capacity. This finding, too, supports language-related memory-based accounts of
relative clause difficulty. However, given that success on the sentence span task is no doubt related to
numerous factors, potentially including linguistic experience and statistical learning ability, we leave
open the possibility that WM constraints may not fully explain the demands posed by object-extracted
relative clauses (e.g., Frank et al., 2016). These data, nonetheless, suggest that sentence processing
demands were more closely related to verbal WM resources taxed by the digits task and assessed by
sentence span scores than by the visuospatial resources taxed by the dots task and assessed by Corsi
block.
Before moving on to the general discussion, we will review the assumptions that underlie our
interpretation of participants’ performance on the dual-task paradigm employed in the present study.
First, does the digits recall task recruit verbal WM resources? This idea was supported by the
systematic relationship between participants’ sentence span scores and their performance on digits
recall. Just as one might expect, the higher a participant scored on the sentence span task, the better the
participant performed on digits recall. Analogously, does the dots recall task recruit visuospatial WM
resources? This question was affirmed by the systematic relationship between participants’ Corsi block
scores and their performance on the dots recall task. The higher a participant scored on the Corsi block
task, the better the participant performed on the dots recall task.
Finally, was performance on the sentence processing task in Experiment 2 related to participants’
verbal WM resources? Indeed, it was. Performance on the sentence processing task was
better (faster and more accurate) when the secondary recall task diverted visuospatial resources
(i.e., the dots trials) than on digits trials, which diverted verbal WM. Moreover, accuracy on the
sentence processing task was better in participants with higher scores on the sentence span task,
both when the secondary recall task involved digits and when it involved dots. Thus, accurate
performance on the sentence processing task was related to participants’ verbal but not visuospatial
WM capacity.

General discussion
For this study, we used the dual-task paradigm to evaluate the relative impact of verbal versus
visuospatial WM load on the comprehension of multimodal discourse. Because the meaning of iconic
gestures is often difficult to discern until they can be mapped onto concepts evoked by the speech, our
visuospatial resources hypothesis proposes that visuospatial WM is used to store gestural information
until it can be matched and integrated with verbally evoked concepts. Using a within-subject design,
Experiment 1 revealed that speech-gesture congruity effects were less pronounced when paired with
the concurrent visuospatial relative to the verbal load task, supporting our hypothesis that under­
standing iconic gestures recruits visuospatial memory resources.

Pairing the same two secondary recall tasks with a sentence processing task, Experiment 2 revealed
that parsing demands effects were less pronounced when paired with the concurrent verbal than
visuospatial load task. Thus, the visuospatial recall task led to greater attenuation of gesture congruity
effects, while the verbal recall task led to greater disruption of parsing demands effects. The reversal of
the relative impact of the secondary recall tasks in our two experiments undermines the idea that there
are marked differences in the demands that each task places on domain-general resources (viz. the
inherent task asymmetry hypothesis).
Rather, results of the present study suggest that the magnitude of primary task effects in our dual-task
paradigm is driven by the overlap between the resources taxed by the primary and the secondary tasks. In
Experiment 2, the verbal recall task suppressed parsing demands effects more than visuospatial recall
because the sentence processing task required verbal resources. Likewise, in Experiment 1, the
visuospatial recall task suppressed gesture congruity effects because the multimodal discourse task
required visuospatial WM. Considered together, the present study suggests that visuospatial WM plays
an important role supporting the understanding of iconic co-speech gestures and their message-level
significance.3

Visuospatial WM and iconic gestures


Our finding that visuospatial WM helps mediate multi-modal discourse comprehension is consistent
with existing research on speech-gesture integration (Hostetter et al., 2018). For example, Wu and
Coulson (2011) describe evidence suggesting that gestures are interpreted through image-based
semantic analysis – analogous to the manner whereby objects in a picture are discerned through the
analysis of contours and shapes. Additionally, it has been shown that listeners use information in
gestures to formulate spatially specific conceptual models of speaker meaning (Wu & Coulson, 2007).
For example, if a speaker says, “green parrot, fairly large,” while indicating in gesture the bird’s size and
location (perched on the speaker’s forearm), listeners find it easier to comprehend a pictorial depiction
of a green parrot perched on a forearm relative to a green parrot in a different location, such as a cage
(Wu & Coulson, 2010).
Grounded theories of language have advanced the view that mental simulations of this type are part
of everyday language comprehension and reasoning. Unremarkable sentences such as, “The ranger
saw the eagle in the sky,” have been shown to prompt faster categorization and naming of a matching
picture probe depicting an eagle in flight than a mismatched probe depicting an eagle in a nest (Zwaan
et al., 2002), as would be expected if listeners were mentally simulating visualizable aspects of the
sentence’s meaning.
Likewise, in tasks such as feature generation and property verification, participants’ responses have
been shown to be modulated by the implied perspective from which the cue is presented (see Barsalou,
2008 for a review). For example, participants generate internal features such as seeds much more
frequently when lexical cues denote objects whose internal structure is visible (e.g., half watermelon)
than occluded (e.g., watermelon; Wu & Barsalou, 2009). When prompted to conceptualize objects from
either an internal perspective (driving a car) or an external one (washing a car), adults have also been
shown to categorize parts of the object more rapidly when they agree with the cued perspective, such as
steering wheel and door handle for internal and external perspectives, respectively (Borghi et al., 2004).
In light of findings such as these, gestures may be viewed as material prompts or scaffolding that can
enhance mental simulation processes regularly performed by listeners. Indeed, Hostetter and Alibali
(2008) suggest that the production of gestures is the bodily manifestation of sensorimotor simulation
processes that underlie the speakers’ conceptualization of their messages. Here we suggest the
comprehension of gestures also activates sensorimotor aspects of conceptual structure relevant for
understanding the speaker’s message.
The present study is all the more relevant for replicating the beneficial impact of congruent
gestures on the comprehension of multimodal discourse, because many important
studies in this literature have used materials derived from simulated discourse (e.g., Kelly et al., 2010).

Whereas simulated gestures typically occur as isolated, punctate events precisely synchronized with
the onset of a single related target word, spontaneous gestures are often repeated over the course of
a clause to reiterate salient features multiple times. For example, in our corpus the speaker describes
the shape of a ball microphone by moving his cupped hands together and apart several times, as
though along the surface of a sphere. As it is possible that gesture repetition supports semantic
integration in multimodal discourse, the prevalence of these repeated gestures in our corpus may
have led to the relatively large congruity effects observed here. How repeated gestures influence
speech-gesture integration is thus an interesting avenue for future research on this topic.
At other times, spontaneous gestures are produced in “multiplexes,” a rapid succession of gestures
that build upon each other to convey complex meanings (Kendon, 2004; see also McNeill, 2005 on
“catchments”; Pouw & Dixon, 2020 on “ensembles,” together with a method for operationalizing this
construct). In our corpus, for example, the speaker says, “A small mug with coffee in it and a silver
spoon next to that,” while indicating the round opening of a cup with both of his hands. Next, while
holding his left hand in place to indicate the cup, he proceeds to point to the “coffee” inside, and then
to depict the shape and location of the silver spoon. Although the clips in the present study were short,
comprising 1–2 intonation units, and rarely included a complete sentence, observed congruity effects
suggest that even complex gestures with a somewhat indirect relationship to the accompanying words
can modulate comprehension. Of course, the present study leaves open whether the observed findings
extend to co-speech gestures that accompany more protracted stretches of discourse between
interlocutors.
In sum, data presented here suggest visuospatial WM resources are needed to fully benefit from
the information in iconic gestures. This discovery is consistent with the idea that co-speech iconic
gestures promote image-based simulations of the meaning of an utterance, at least for the descrip­
tions of concrete objects and actions employed in the present study (Alibali, 2005; Wesp et al.,
2001). Given that iconic gestures depict visual and spatial properties such as shape, size, and relative
position, it is perhaps unsurprising that listeners recruit visuospatial resources to relate information
conveyed in speech to visual information conveyed in the accompanying gestures. One critical issue
for future research is whether such findings extend to the gestures accompanying more abstract
topics.

Notes
1. Repeated measures analysis of variance (ANOVA) with factors congruity and load modality suggested that
pictures were classified more rapidly when primed by congruent than incongruent speech and gestures, congruity
F(1, 52) = 26.4, p < .01, ηG² = 0.02, and when the secondary task involved a visuospatial (dots) rather than a verbal
(digits) load, load modality F(1, 52) = 32.1, p < .01, ηG² = 0.04. These main effects were qualified by an
interaction, congruity × load modality F(1, 52) = 4.04, p < .05, ηG² = 0.004. This outcome replicates an analogous
analysis in Wu and Coulson (2014).
2. Initial analysis of response times on the sentence processing task involved repeated measures ANOVA with
factors parsing demands and memory load. Analysis revealed an effect of parsing demands, F(1, 62) = 6.8, p < .05,
ηG² = 0.01, as responses were faster following subject-subject relative clauses, as well as an effect of memory load,
F(1, 62) = 5.24, p < .05, ηG² = 0.01, because responses were faster when recall involved visuospatial load, in addition
to an interaction of the two factors, F(1, 62) = 7.40, p < .05, ηG² = 0.007.
3. In view of evidence that participants who gesture perform better in dual-task paradigms (e.g., Goldin-Meadow
et al., 2001), one may question whether participants’ strategy was to use their own body motion to help encode
the information on the secondary recall tasks. Although instructions involved encouraging participants to
rehearse digits verbally and dots visually, gesturing was not mentioned in any way. Experimenters observed
participants during practice trials and reported no evidence of a gestural strategy. Of course, participants were left
alone during the study itself, so it is entirely possible they used co-thought gestures (Chu & Kita, 2008, 2011,
2016). The extent to which participants engaged in spontaneous gestures (if ever), their relative frequency during
each of the memory tasks and across the two experiments, and their impact (if any) on task performance are all
interesting questions with relevance for the issues addressed in the present study. Unfortunately, absent data on
what participants were doing, these points will have to remain unanswered.

Acknowledgments
Thanks also to Bonnie Chinh for her contributions to data collection. A preliminary report on this project was presented
at the Annual Meeting of the Cognitive Science Society in 2015.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by an NSF grant to SC (#BCS0843946).

ORCID
Ying Choon Wu http://orcid.org/0000-0002-9382-8805
Horst M. Müller http://orcid.org/0000-0001-7790-3556
Seana Coulson http://orcid.org/0000-0003-1246-9394

References
Alibali, M. W. (2005). Gesture in spatial cognition: Expressing, communicating, and thinking about spatial information.
Spatial Cognition and Computation, 5(4), 307–331. https://doi.org/10.1207/s15427633scc0504_2
Ateş, B. Ş., & Küntay, A. C. (2018). Referential interactions of Turkish-learning children with their caregivers about
non-absent objects: Integration of non-verbal devices and prior discourse. Journal of Child Language, 45(1), 148–173.
https://doi.org/10.1017/S0305000917000150
Azar, Z., Backus, A., & Özyürek, A. (2019). General- and language-specific factors influence reference tracking in speech
and gesture in discourse. Discourse Processes, 56(7), 553–574. https://doi.org/10.1080/0163853X.2018.1519368
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects
and items. Journal of Memory and Language, 59(4), 390–412. https://doi.org/10.1016/j.jml.2007.12.005
Baddeley, A. (1992). Working memory. Science, 255(5044), 556–559. https://doi.org/10.1126/science.1736359
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing:
Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59(1), 617–645. https://doi.org/10.1146/
annurev.psych.59.103006.093639
Borghi, A. M., Glenberg, A. M., & Kaschak, M. P. (2004). Putting words in perspective. Memory and Cognition, 32(6),
863–873. https://doi.org/10.3758/BF03196865
Butterworth, B. L., & Beattie, G. W. (1978). Gesture and silence as indicators of planning in speech. In R. N. Campbell &
P. T. Smith (Eds.), Recent advances in the psychology of language 4: Formal and experimental approaches (pp.
347–360). Plenum.
Chafe, W. (1987). Cognitive constraints on information flow. In R. S. Tomlin (Ed.), Coherence and grounding in
discourse (pp. 21–51). John Benjamins.
Chu, M., & Kita, S. (2008). Spontaneous gestures during mental rotation tasks: Insights into the microdevelopment of
the motor strategy. Journal of Experimental Psychology: General, 137(4), 706–723. https://doi.org/10.1037/
a0013157
Chu, M., & Kita, S. (2011). The nature of gestures’ beneficial role in spatial problem solving. Journal of Experimental
Psychology: General, 140(1), 102–116. https://doi.org/10.1037/a0021790
Chu, M., & Kita, S. (2016). Co-thought and co-speech gestures are generated by the same action generation process.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(2), 257–270. https://doi.org/10.1037/
xlm0000168
Chu, M., Meyer, A., Foulkes, L., & Kita, S. (2014). Individual differences in frequency and saliency of
speech-accompanying gestures: The role of cognitive abilities and empathy. Journal of Experimental Psychology:
General, 143(2), 694–709. https://doi.org/10.1037/a0033861
Conway, A. R., Kane, M. J., Bunting, M. F., Hambrick, D. Z., Wilhelm, O., & Engle, R. W. (2005). Working memory span
tasks: A methodological review and user’s guide. Psychonomic Bulletin & Review, 12(5), 769–786. https://doi.org/10.
3758/BF03196772
Cook, S. W., Yip, T. K., & Goldin-Meadow, S. (2012). Gestures, but not meaningless movements, lighten working
memory load when explaining math. Language and Cognitive Processes, 27(4), 594–610. https://doi.org/10.1080/
01690965.2011.567074
Coulson, S., & Wu, Y. C. (2014). 148. Multimodal discourse comprehension. In C. Müller, A. Cienki, E. Fricke,
S. H. Ladewig, D. McNeill, & J. Bressem (Eds.), Body – language – communication. An international handbook on
multimodality in human interaction (Vol. 2, pp. 1922–1929). Mouton de Gruyter.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal
Learning and Verbal Behavior, 19(4), 450–466. https://doi.org/10.1016/S0022-5371(80)90312-6
Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11(1),
19–23. https://doi.org/10.1111/1467-8721.00160
Farmer, T. A., Misyak, J. B., & Christiansen, M. H. (2012). Individual differences in sentence processing. In M. H. Spivey,
M. Joannisse, & K. McRae (Eds.), Cambridge handbook of psycholinguistics (pp. 353–364). Cambridge University
Press.
Ford, M. (1983). A method of obtaining measures of local parsing complexity throughout sentences. Journal of Verbal
Learning and Verbal Behavior, 22(2), 203–218. https://doi.org/10.1016/S0022-5371(83)90156-1
Frank, S. L., Trompenaars, T., & Vasishth, S. (2016). Cross-linguistic differences in processing double-embedded relative
clauses: Working-memory constraints or language statistics? Cognitive Science, 40(3), 554–578. https://doi.org/10.
1111/cogs.12247
Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge University
Press.
Gillespie, M., James, A. N., Federmeier, K. D., & Watson, D. G. (2014). Verbal working memory predicts co-speech
gesture: Evidence from individual differences. Cognition, 132(2), 174–180. https://doi.org/10.1016/j.cognition.2014.
03.012
Goldin-Meadow, S., Nusbaum, H., Kelly, S. D., & Wagner, S. (2001). Explaining math: Gesturing lightens the load.
Psychological Science, 12(6), 516–522. https://doi.org/10.1111/1467-9280.00395
Habets, B., Kita, S., Shao, Z., Özyurek, A., & Hagoort, P. (2011). The role of synchrony and ambiguity in speech–gesture
integration during comprehension. Journal of Cognitive Neuroscience, 23(8), 1845–1854. https://doi.org/10.1162/jocn.
2010.21462
Hostetter, A. B., & Alibali, M. W. (2007). Raise your hand if you’re spatial: Relations between verbal and spatial skills and
gesture production. Gesture, 7(1), 73–95. https://doi.org/10.1075/gest.7.1.05hos
Hostetter, A. B., & Alibali, M. W. (2008). Visible embodiment: Gestures as simulated action. Psychonomic Bulletin &
Review, 15(3), 495–514. https://doi.org/10.3758/PBR.15.3.495
Hostetter, A. B., Murch, S. H., Rothschild, L., & Gillard, C. S. (2018). Does seeing gesture lighten or increase the load?
Effects of processing gesture on verbal and visuospatial cognitive load. Gesture, 17(2), 268–290. https://doi.org/10.
1075/gest.17017.hos
Kelly, S. D., Özyürek, A., & Maris, E. (2010). Two sides of the same coin: Speech and gesture mutually interact to
enhance comprehension. Psychological Science, 21(2), 260–267. https://doi.org/10.1177/0956797609357327
Kendon, A. (2004). Gesture: Visible action as utterance. Cambridge University Press.
King, J., & Just, M. A. (1991). Individual differences in syntactic processing: The role of working memory. Journal of
Memory and Language, 30(5), 580–602. https://doi.org/10.1016/0749-596X(91)90027-H
Kita, S. (2000). How representational gestures help speaking. In D. McNeill (Ed.), Language and gesture (pp. 162–185).
Cambridge University Press.
Krauss, R. M., Chen, Y., & Chawla, P. (1996). Nonverbal behavior and nonverbal communication: What do conversa­
tional hand gestures tell us? In M. P. Zana (Ed.), Advances in experimental social psychology (Vol. 28, pp. 389–450).
Academic Press.
Leonard, T., & Cummins, F. (2011). The temporal relation between beat gestures and speech. Language and Cognitive
Processes, 26(10), 1457–1471. https://doi.org/10.1080/01690965.2010.500218
Lewis, R. L., Vasishth, S., & Van Dyke, J. A. (2006). Computational principles of working memory in sentence
comprehension. Trends in Cognitive Sciences, 10(10), 447–454. https://doi.org/10.1016/j.tics.2006.08.007
Marstaller, L., & Burianová, H. (2013). Individual differences in the gesture effect on working memory. Psychonomic
Bulletin & Review, 20(3), 496–500. https://doi.org/10.3758/s13423-012-0365-0
McNeill, D. (2005). Gesture and thought. University of Chicago Press.
Milner, B. (1971). Interhemispheric differences in the localization of psychological processes in man. British Medical
Bulletin, 27(3), 272–277. https://doi.org/10.1093/oxfordjournals.bmb.a070866
Morrel-Samuels, P., & Krauss, R. M. (1992). Word familiarity predicts temporal asynchrony of hand gestures and
speech. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(3), 615–622. https://doi.org/10.
1037/0278-7393.18.3.615
Müller, H. M., King, J. W., & Kutas, M. (1997). Event-related potentials elicited by spoken relative clauses. Cognitive
Brain Research, 5(3), 193–203. https://doi.org/10.1016/S0926-6410(96)00070-5
Obermeier, C., & Gunter, T. C. (2014). Multisensory integration: The case of a time window of gesture–speech
integration. Journal of Cognitive Neuroscience, 27(2), 292–307. https://doi.org/10.1162/jocn_a_00688
Obermeier, C., Holle, H., & Gunter, T. C. (2011). What iconic gesture fragments reveal about gesture–speech integra­
tion: When synchrony is lost, memory can help. Journal of Cognitive Neuroscience, 23(7), 1648–1663. https://doi.org/
10.1162/jocn.2010.21498
Perniss, P. M., & Özyürek, A. (2015). Visible cohesion: A comparison of reference tracking in sign, speech, and
co-speech gesture. Topics in Cognitive Science, 7(1), 36–60. https://doi.org/10.1111/tops.12122
Ping, R., & Goldin-Meadow, S. (2010). Gesturing saves cognitive resources when talking about nonpresent objects.
Cognitive Science, 34(4), 602–619. https://doi.org/10.1111/j.1551-6709.2010.01102.x
Pouw, W., & Dixon, J. A. (2019). Entrainment and modulation of gesture–speech synchrony under delayed auditory
feedback. Cognitive Science, 43(3), e12721. https://doi.org/10.1111/cogs.12721
Pouw, W., & Dixon, J. A. (2020). Gesture networks: Introducing dynamic time warping and network analysis for the
kinematic study of gesture ensembles. Discourse Processes, 57(4), 301–319. https://doi.org/10.1080/0163853X.2019.
1678967
Roland, D., Mauner, G., O’Meara, C., & Yun, H. (2012). Discourse expectations and relative clause processing. Journal of
Memory and Language, 66(3), 479–508. https://doi.org/10.1016/j.jml.2011.12.004
Smithson, L., & Nicoladis, E. (2013). Verbal memory resources predict iconic gesture use among monolinguals and
bilinguals. Bilingualism: Language and Cognition, 16(4), 934–944. https://doi.org/10.1017/S1366728913000175
Smithson, L., & Nicoladis, E. (2014). Lending a hand to imagery? The impact of visuospatial working memory
interference upon iconic gesture production in a narrative task. Journal of Nonverbal Behavior, 38(2), 247–258.
https://doi.org/10.1007/s10919-014-0176-2
Staub, A. (2010). Eye movements and processing difficulty in object relative clauses. Cognition, 116(1), 71–86. https://
doi.org/10.1016/j.cognition.2010.04.002
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 12(1), 97–136.
https://doi.org/10.1016/0010-0285(80)90005-5
Wagner, S. M., Nusbaum, H., & Goldin-Meadow, S. (2004). Probing the mental representation of gesture: Is handwaving
spatial? Journal of Memory and Language, 50(4), 395–407. https://doi.org/10.1016/j.jml.2004.01.002
Wells, J. B., Christiansen, M. H., Race, D. S., Acheson, D. J., & MacDonald, M. C. (2009). Experience and sentence
processing: Statistical learning and relative clause comprehension. Cognitive Psychology, 58(2), 250–271. https://doi.
org/10.1016/j.cogpsych.2008.08.002
Wesp, R., Hesse, J., Keutmann, D., & Wheaton, K. (2001). Gestures maintain spatial imagery. The American Journal of
Psychology, 114(4), 591–600. https://doi.org/10.2307/1423612
Wu, L. L., & Barsalou, L. W. (2009). Perceptual simulation in conceptual combination: Evidence from property
generation. Acta Psychologica, 132(2), 173–189. https://doi.org/10.1016/j.actpsy.2009.02.002
Wu, Y. C., & Coulson, S. (2007). How iconic gestures enhance communication: An ERP study. Brain and Language, 101
(3), 234–245. https://doi.org/10.1016/j.bandl.2006.12.003
Wu, Y. C., & Coulson, S. (2010). Gestures modulate speech processing early in utterances. NeuroReport, 21(7), 522–526.
https://doi.org/10.1097/WNR.0b013e32833904bb
Wu, Y. C., & Coulson, S. (2011). Are depictive gestures like pictures? Commonalities and differences in semantic
processing. Brain and Language, 119(3), 184–195. https://doi.org/10.1016/j.bandl.2011.07.002
Wu, Y. C., & Coulson, S. (2014). Co-speech iconic gestures and visuospatial working memory. Acta Psychologica, 153(9),
39–50. https://doi.org/10.1016/j.actpsy.2014.09.002
Yu, H. T. (2015). Applying linear mixed effects models with crossed random effects to psycholinguistic data: Multilevel
specification and model selection. The Quantitative Methods for Psychology, 11(2), 78–88. https://doi.org/10.20982/
TQMP.11.2.P078
Zwaan, R. A., Stanfield, R. A., & Yaxley, R. H. (2002). Language comprehenders mentally represent the shapes of objects.
Psychological Science, 13(2), 168–171. https://doi.org/10.1111/1467-9280.00430
