Cognitive Science 47 (2023) e13261

© 2023 Cognitive Science Society LLC.


ISSN: 1551-6709 online
DOI: 10.1111/cogs.13261

When Gestures Do or Do Not Follow Language-Specific Patterns of Motion
Expression in Speech: Evidence from Chinese, English and Turkish
Irmak Su Tütüncü,a Jing Paul,b Samantha N. Emerson,c Murat Şengül,a,d
Melanie Knezevic,e Şeyda Özçalışkana
a Department of Psychology, Georgia State University
b Asian Studies, Agnes Scott College
c Training, Learning, & Readiness Department, Aptima, Inc.
d Department of Turkish Language Education, Nevşehir Hacı Bektaş Veli University
e Department of Linguistics, University of Ottawa
Received 16 June 2021; received in revised form 23 January 2023; accepted 7 February 2023

Correspondence should be sent to Irmak Su Tütüncü, Psychology Department, Georgia State University, 140
Decatur St SE, Atlanta, GA 30302, USA. E-mail: irmaksu.tutuncu@gmail.com

Abstract
Speakers of different languages (e.g., English vs. Turkish) show a binary split in how they package
and order components of a motion event in speech and co-speech gesture but not in silent gesture. In
this study, we focused on Mandarin Chinese, a language that does not follow the binary split in its
expression of motion in speech, and asked whether adult Chinese speakers would follow the language-
specific speech patterns in co-speech but not silent gesture, thus showing a pattern akin to Turkish and
English adult speakers in their description of animated motion events. Our results provided evidence
for this pattern, with Chinese—as well as English and Turkish—speakers following language-specific
patterns in speech and co-speech gesture but not in silent gesture. These findings provide support for the
“thinking-for-speaking” account, namely that language influences thought only during online, but not
offline, production of speech.

Keywords: Motion event; Co-speech gesture; Silent gesture; Cross-linguistic differences; Spatial cog-
nition; Chinese

1. Introduction

Languages differ widely in their expression of different experiential domains, but to what
extent do these differences influence the way speakers think about such domains? As proposed
by theorists of linguistic relativity (Sapir, 1961; Whorf, 1956), there are two distinct ways
language might influence cognition. One possibility is that language has a stronger, more per-
manent, and invariable effect on cognition that is observable both when speaking and when
not speaking (i.e., strong Sapir–Whorf hypothesis). Another possibility, however, is that lan-
guage might have more limited and variable effects on cognition (i.e., weak Sapir–Whorf
hypothesis). This latter possibility is reflected in the “thinking-for-speaking” account—
originally proposed by Slobin (1996, 2004)—that suggests that language might influence cog-
nition during online production of speech, but this effect may not be evident when not speak-
ing. With the goal of teasing apart these two possibilities, in this study, we focused on motion
events—an experiential domain that shows systematic variability in its expression across dif-
ferent languages of the world (Talmy, 1985, 2000). We examined three distinct types of lan-
guages (Mandarin Chinese, English, and Turkish), which vary in both packaging of motion
components (conflating manner and path vs. separating manner and path of motion) and
ordering of motion components (Figure-Ground-MOTION vs. Figure-MOTION-Ground).
We asked whether the patterns of cross-linguistic variability in speech would become evident
in gesture when the gestures were produced with speech (i.e., co-speech gesture) versus when
gestures were produced without speech (i.e., silent gesture). If language influences thought
only during online production of speech—a possibility in line with the thinking-for-speaking
account, a weaker version of linguistic relativity—we would expect to observe similar pat-
terns of cross-linguistic variability in speech and co-speech gesture but not in silent gesture.
If, on the other hand, language influences thought both online and beyond online production
of speech (i.e., offline)—a possibility in line with stronger linguistic relativity—we would
expect to find evidence for patterns of cross-linguistic variability in both co-speech and silent
gesture.

1.1. Packaging and ordering of semantic elements in speech


Physical motion in space constitutes a core human experience (Talmy, 1985, 2000), but the
expression of motion varies considerably across different languages (Slobin, 2004). Talmy
(2000) proposed that the world’s languages can be categorized into two types based on how
speakers semantically package the two major components of motion: manner (i.e., style of
movement) and path (i.e., direction of movement). Speakers of satellite-framed languages (S-
languages; e.g., English, German, and Polish) use a conflated strategy in speech where they
express manner in the main verb (e.g., run) and path outside the verb, typically in a path par-
ticle or a preposition (e.g., into), within the bounds of a single clause (e.g., “Adam RUNS
INTO the house”). In contrast, speakers of verb-framed languages (V-languages; e.g., Turk-
ish, Spanish, and Japanese) use a separated strategy in speech where they opt to express
path in the main verb (e.g., enter) and manner in an additional subordinate clause (e.g.,
“Adam eve KOŞARAK GİRER” = Adam house-to by RUNNING ENTER; Allen et al.,
2007; Özçalışkan et al., 2016a, 2016b; Özçalışkan & Slobin, 1999). At the same time, the
preference to encode path inside or outside the verb influences the extent to which speakers
of each language type express manner or path components of motion. S-language speakers
have the option to express manner in the verb and path outside the verb, which allows them
to express both manner and path in a single clause frequently in their descriptions of motion
(Ibarretxe-Antuñano, 2004; Özçalışkan, 2004, 2009; Özçalışkan & Slobin, 1999, 2000). Con-
trary to S-languages, V-language speakers use the main verb to express path and use subordi-
nate clauses to express manner of motion. Given the extra processing load such constructions
impose on production, V-language speakers are typically more likely to exclude manner from
their descriptions, focusing only on path of motion (Cardini, 2010; Choi & Bowerman, 1991;
Özçalışkan, 2015; Özçalışkan & Slobin, 1999, 2003; Özçalışkan et al., 2016a, 2016b). Such
patterns of expression are pervasive and even influence speakers’ expectations for how motion
will be processed at the neural level (Emerson, Conway, & Özçalışkan, 2020).
However, the binary classification into S- versus V-languages leaves out a third group of
languages, serial-verb languages, such as Mandarin Chinese (hereafter Chinese), which have
two main verb slots (one for manner and one for path) both expressed within a single clause,
with no explicit marking as to which one is the main verb (Chen & Guo, 2009). For instance,
in Chinese, in order to say “Adam runs into the house” typically two verbs are used, both
equal in standing: “pǎo” = run (manner verb) and “jìn” = enter (path verb; “Yàdāng PǍOJÌN
fángzi” = Adam RUN-ENTER house), resulting in difficulties in the classification of Chinese
as belonging to either of the two language types (Chen & Guo, 2009). In fact, the classification
of Chinese within Talmy's (2000) framework has been at the center of a debate: some researchers
(e.g., Hsueh, 1989; Tai, 2003) argue that the main verb in a serial-verb construction is the path
verb, classifying Chinese as a V-language; while others (Chang, 2001; Chao, 1968) argue
that the main verb in a serial-verb construction is the manner verb and the path verb serves
as a path satellite, placing Chinese within the group of S-languages (see Paul, Emerson, &
Özçalışkan, 2022 for further discussion). A more recent and conceptually more convincing
approach, however, proposes a third possibility, suggesting that both verbs in a serial verb
construction are of equal status and each can be used independently without the other in a
clause (Chen & Guo, 2009; Slobin, 2004). According to this theory—originally proposed by
Slobin (2004)—Chinese might be neither a V-language nor an S-language, but instead belong
to a third category of languages, called “equipollently-framed languages (E-languages).”
Unlike S-languages, Chinese allocates comparable weight to both the manner and path
components of motion with the use of serial-verb constructions, in which manner and path
are frequently expressed in a compound verb (“pǎojìn” = run-enter), thus designating Chi-
nese as a good fit for the category of E-languages (Brown & Chen, 2013; Chen & Guo,
2009; Xu, 2013). This strategy differs from both English and Turkish, in which speakers
can express only one type of motion in the verb (typically manner for English and path for
Turkish). As such, the core component of a motion event (i.e., the verb) provides different
affordances to the speakers of each language type—from a manner-focus (English) to both
manner- and path-focus (Chinese) and to a path-focus (Turkish)—thus raising the possibility
of an in-between status for the classification of Chinese in its packaging of motion elements.
It is important to note here that the patterns of speech expression highlighted for each of the
three language types constitute preferred patterns (i.e., frequently used by native speakers).
However, speakers might also use motion descriptions that differ from the preferred patterns
(e.g., a conflated description by a Chinese or Turkish speaker using a manner verb followed
by a path satellite, a separated description by an English speaker using only a path verb), but
at rates much lower than the preferred patterns (Özçalışkan et al., 2016a; Paul et al., 2022).
Accordingly, the classification of the languages into types is based on the preferred patterns
of speech expression that are habitually used by native speakers of each language (Slobin,
2004).
Another feature that shows systematic cross-linguistic variation as well as patterned regu-
larities in the expression of motion events is the ordering of motion elements. In speech, two
distinct word orders predominate in 90% of the languages spoken around the world when describing
Actor-Action-Object relations (Dryer, 2013; see also Greenberg, 1963), namely, the Subject-
Object-Verb order (SOV; e.g., Turkish, Japanese, and Korean) and the Subject-Verb-Object
order (SVO; e.g., English, Russian, and Italian). The two distinct ordering patterns are closely
mirrored in the expression of motion events: with SVO language speakers (e.g., English) fol-
lowing the Figure-MOTION-Ground order (FMG; e.g., "Adam RUNS to the house") and SOV
language speakers (e.g., Turkish) following the Figure-Ground-MOTION order (FGM; e.g.,
“Adam eve KOŞAR” = Adam house-to RUN; Özçalışkan et al., 2016a, 2018). This pattern
was evident even in agglutinative languages (e.g., Turkish) that allow for flexibility in word
ordering (Özçalışkan et al., 2016a).
Unlike English and Turkish, the classification of Chinese in terms of its ordering patterns
remains somewhat controversial (Li & Thompson, 1974, 1975; Sun & Givon, 1985): earlier
research, largely based on descriptions of Actor-Action-Object relations, classified Chinese
as either following an SOV or an SVO order. Importantly, the categorization of “Object”
varied across different studies, with some studies focusing only on object marking in transitive
constructions (Sun & Givon, 1985) and others focusing on object marking in both transitive
and intransitive constructions (Li & Thompson, 1974, 1975). The latter set of studies further
argued that Chinese speakers follow SVO order when describing indefinite objects (e.g., "Tā
rēng le yí gè qiú guò qù" = He throw aspect-marker one ball pass go) and SOV order when
describing definite objects (e.g., "Tā bǎ qiú rēng le guò qù" = He object-marker ball
throw aspect-marker pass go). This proposal was challenged in later work (Sun & Givon,
1985), which classified Chinese as a predominantly SVO language regardless of whether the
object was definite or indefinite, with 90% of the expressions encoding actor-action-object
relations with an SVO order. At the same time, this overall preference for the SVO order
is accompanied by small but consistent use of SOV ordering in some specific cases (Qu,
1994; Shyu, 2001; see also Levshina, 2019 for further discussion on word order variation in
Chinese). Thus, the word order classification of the Chinese language is still being debated
and there is no existing work that has yet systematically examined patterns of ordering in
motion event descriptions in Chinese. The relatively sparse work, particularly focusing on
object marking in motion events, however, suggests that Chinese speakers may also show
some variability in their ordering of the three motion elements (Jin, 2008). Speakers rely
on both the Figure-MOTION-Ground order, “Yí gè rén (F) pá-zhe jìn le (M) yí gè fángzi
(G)” = A person by crawling entered a house, and the Figure-Ground-MOTION order—
although less frequently—“Yàdāng (F) zài dìtǎn shàng (G) páxíng (M)” = Adam at carpet
on(top) crawl, rendering the classification of Chinese as belonging to either order type largely
inconclusive.
Overall, linguistic expression of motion events in different languages shows strong but
systematic cross-linguistic variation, both in packaging and ordering of semantic elements
in speech (Goldin-Meadow, Özçalışkan & Slobin, 1999; Özçalışkan et al., 2016a, 2018;
Özyürek et al., 2008; Talmy, 2000). While Turkish (i.e., V-language) speakers mostly rely on a
separated packaging strategy and Figure-Ground-MOTION ordering, English (S-language)
speakers prefer a conflated packaging strategy and Figure-MOTION-Ground ordering. Chinese
falls somewhere between these two languages—with researchers classifying it as either an S-
language (Talmy, 2000), V-language (Hsueh, 1989; Tai, 2003), or an E-language (Slobin,
2004) in its packaging of semantic elements and as following either a Figure-MOTION-
Ground (Jin, 2008; Sun & Givon, 1985) or a mix of Figure-MOTION-Ground and Figure-
Ground-MOTION pattern (Shyu, 2001) in its ordering of semantic elements. The aforemen-
tioned packaging—but not ordering—patterns stem from the typological classification into
V-, S-, and E-languages. In this study, we included both dimensions of analysis to provide a
more comprehensive assessment of similarities and differences across both semantic content
(i.e., packaging) and semantic structure (i.e., ordering) in the expression of motion events in
Chinese, as compared to English and Turkish.

1.2. Packaging and ordering of semantic elements in co-speech gesture


Speakers use their hands frequently when verbally describing motion events, thus produc-
ing co-speech gestures (McNeill, 1992). Accordingly, and as put forth by several theories of
co-speech gesture production, gesture and speech form tightly integrated systems (e.g., de
Ruiter, 2000; Kita & Özyürek, 2003; McNeill, 1992). Some earlier theories of gesture-speech
integration (e.g., The Growth Point Theory, McNeill, 1992) suggest that gesture and speech
arise from a single underlying conceptual system that includes both a linguistic and a visu-
ospatial structure and that these two systems co-depend on each other in conveying intended
meanings. Some of the later theories of gesture-speech integration (e.g., Interface theory;
Kita & Özyürek, 2003) argue for two separate systems instead—namely, an action generator
for gesture and a message generator for speech; but they also put forth that these systems
work closely together from conceptualization to articulation in conveying intended meanings.
Based on both theories, gesture may follow one of two possible paths: it may convey seman-
tic content not available through speech by providing semantically related but supplementary
information; or it may mirror the information expressed in speech, augmenting the semantic
distinctions that speech draws upon. Thus, regardless of being part of a single or dual system,
as these theories suggest, co-speech gesture is largely shaped by the way the information is
organized in speech.
Earlier research on co-speech gestures accompanying motion event descriptions has shown
that co-speech gestures largely mirror the language-specific patterns observed in speech (see
Özçalışkan & Emerson, 2016 for a review). The one exception to this large body of work is
an earlier study by McNeill (2000), which showed that Spanish (V-language) speakers used
gestures to supplement motion components that they do not convey in speech (i.e., manner
gestures accompanying speech expressing path of motion), thus not mirroring the patterns
found in speech. However, apart from this work, most of the research comparing co-speech
gestures in motion event descriptions in S- and V-languages suggests an augmentative role
for gestures in which gesture conveys information similar to speech (Kita & Özyürek, 2003;
Özçalışkan et al., 2016a, 2016b). Kita and Özyürek (2003) compared the speech and co-
speech gestures produced by English (S-language), Turkish, and Japanese (both V-languages)
adult speakers and found language-specific patterns in both speech and co-speech gestures in
each language group. Japanese and Turkish speakers were more likely to use separated co-
speech gestures to express either manner or path of motion (e.g., circling finger in place as
if rolling, moving index finger left to right to convey rightward trajectory), while English
speakers were more likely to use the conflated strategy, expressing both path and manner
simultaneously in their co-speech gestures (e.g., circling finger while moving it left to right
to convey rolling in rightward trajectory).
Özçalışkan and colleagues (Özçalışkan, Lucero, & Goldin-Meadow, 2016b, 2018)
extended this earlier work to blind speakers, asking whether the online effect of language-
specific speech patterns in co-speech gesture also becomes evident even in the absence
of visual access to language-specific gestures. Blind and sighted adult English and Turk-
ish speakers produced co-speech gestures (along with speech) while describing three-
dimensional motion scenes. Results showed that, while Turkish speakers primarily relied
on the separated packaging strategy in their co-speech gestures, encoding primarily path of
motion, English speakers relied largely on conflated gestures, combining manner and path
components into a single gesture—a pattern that remained identical for both sighted and
blind speakers. Other studies that expanded the applicability of packaging strategies to var-
ious other S- and V-languages showed similar patterns as well, with V-language speakers
producing more separated gestures (e.g., Gullberg, Hendriks, & Hickmann, 2008; Spanish:
Wieselman Schulman, 2004) and S-language speakers using more conflated co-speech ges-
tures (e.g., German: Lewandowski & Özçalışkan, 2018).
In contrast to considerable work on the packaging of semantic elements in co-speech ges-
ture for S- and V-languages, there is very little work that examines gestures produced by
speakers of E-languages, who show speech patterns characteristic of both S- and V-languages.
The few existing studies on the co-speech gestures produced by speakers of E-languages pro-
vide mixed results (Brown & Chen, 2013; Chui, 2012; Duncan, 2001). Duncan (2001), com-
paring gestures produced by Spanish (V-language) speakers to those produced by Chinese
(E-language) and English (S-language) speakers in an animated motion event description
task, found no differences: all speakers used similar numbers of gestures encoding manner
information regardless of language type and the extent of manner information in their speech.
In a later study, comparing the gestures produced by Chinese speakers to those of English and
Japanese (V-language) speakers in their description of animated motion scenes, Brown and
Chen (2013) found that Chinese and English speakers, even though they expressed manner
information in their speech at similar rates (and more than Japanese speakers), did not include
manner at similar rates in their gestures. More specifically, Chinese speakers rarely expressed
manner information in gesture if it was not expressed in their speech, whereas English speak-
ers conveyed manner in their gestures both when they were and when they were not expressing
manner in speech, resulting in more manner gestures in English compared to Chinese. The
inconclusive findings from the relatively sparse research on the co-speech gestures produced
by Chinese speakers make it difficult to draw broader conclusions, highlighting the need for
future work.
Research on the ordering of semantic elements in co-speech gesture remains relatively
limited as well. The few existing studies provide evidence for the “thinking-for-speaking”
account by showing that the ordering of semantic elements in co-speech gesture largely
follows the ordering of semantic elements in speech in a given language. In an earlier study,
Goldin-Meadow, So, Özyürek, and Mylander (2008) asked English (S-language), Turkish,
and Japanese (V-languages) adult native speakers to describe scenes involving simple tran-
sitive actions with three key components: an actor, an action, and an object (e.g., woman
puts on hat, man swings pail) and found cross-linguistic differences in the placement of the
predicate in relation to actor and object: English speakers placed the predicate (i.e., action) in
the middle position, following the Actor-ACTION-Object order, while Turkish and Japanese
speakers placed the predicate in the final position, following the Actor-Object-ACTION order
in their co-speech gestures, thus mirroring the two distinct word order patterns seen in their
native speech—SVO versus SOV—in describing the same scenes. More recently, Özçalışkan
et al. (2016a) extended this work to intransitive events (i.e., motion events; e.g., woman
runs to house), also showing similar alignment between speech and co-speech gesture in the
placement of the predicate (i.e., motion) in the description of event scenes with three key
components: a figure, a motion, and a ground: English speakers were more likely to use the
Figure-MOTION-Ground order (e.g., place index finger in front of body and run it forward
to imaginary landmark) which mirrors the Figure-MOTION-Ground (i.e., SVO) order in
their speech. In contrast, Turkish speakers were more likely to use Figure-Ground-MOTION
order (e.g., place left cupped palm in front of body as if a landmark and move right index
finger right to left toward left hand), which corresponds to the Figure-Ground-MOTION
(i.e., SOV) order in their speech about motion. Thus, both sets of studies, one focusing
on transitive object placement events (Goldin-Meadow et al., 2008) and the other focusing
on intransitive motion events (Özçalışkan et al., 2016a) showed similar language-specific
patterns in the placement of the predicates in relation to the two other event components,
namely, actor/figure and object/ground, in both speech and co-speech gestures, thus showing
an effect of language in the ordering of co-speech gestures across different event types.
However, there is no research examining the ordering of semantic elements in co-speech
gesture among E-language speakers, such as Chinese.
In summary, packaging and ordering of semantic elements in co-speech gestures produced
by S- and V-language speakers show strong cross-linguistic differences, and these differences
mirror the differences found in speech, thus providing support for the “thinking-for-speaking”
account and the Interface theory of co-speech gesture production. Patterns of co-speech ges-
ture production remain relatively understudied for E-languages with the few existing studies
showing mixed findings, thus marking a significant gap in the literature.

1.3. Packaging and ordering of semantic elements in silent gesture


Gesture mirrors the cross-linguistic differences observed in speech when it is accompanied
by speech. Recent research took this one step further asking whether the effect of language on
cognition (as assessed through gesture) can extend beyond online production of speech and
become evident “offline” when describing events without speech, using only gestures (i.e.,
silent gesture). Özçalışkan and colleagues (Özçalışkan, 2016; Özçalışkan et al., 2016a, 2018)
compared the co-speech and silent gestures produced by adult Turkish- and English-speaking
participants and showed evidence to the contrary. More specifically, they showed that even
though English and Turkish speakers differed in their co-speech gestures about motion, they
did not differ in their silent gestures. In fact, speakers of both languages almost exclusively
used the conflated strategy in their silent gestures, synthesizing manner and path components
into a single gesture. This effect has been shown across different groups, including sighted
and blind speakers (Özçalışkan, Lucero, & Goldin-Meadow, 2018), as well as monolingual
and bilingual speakers of English or Turkish (Özçalışkan, 2016). These results, thus, sug-
gest that the effect of cross-linguistic variation on the packaging of the semantic elements in
gesture appears only during online production of language (i.e., co-speech gestures), but not
during “offline” production (i.e., silent gesture). The packaging of semantic elements in silent
gesture has not been examined for E-language speakers. However, the strong cross-linguistic
similarities observed in the preferred packaging strategy in silent gesture in both S- and V-
languages raise the possibility for a natural semantic organization that humans might impose
on motion events when conveying them nonverbally, in silent gesture across all language
types, including E-languages.
The crosslinguistic similarities observed in the packaging of semantic elements for motion
events in silent gesture are also evident in the ordering of semantic elements across different
event types (i.e., action on objects and motion across space). In one study that involved the
description of simple transitive action scenes (e.g., captain swings pail), Goldin-Meadow et al.
(2008) examined how English, Turkish, Spanish, and Chinese speakers ordered transitive
event components when speaking and subsequently when not speaking. Using both gesture
and an additional picture sorting task (in which participants were asked to organize pictures
of actor, patient, and action), they found that speakers of all four languages used the same Actor-Patient-
ACTION order on both nonverbal tasks without speech, suggesting that the effect of language
does not extend beyond online production of speech in the case of simple transitive events.
Özçalışkan and colleagues (Özçalışkan et al., 2016b, 2018) extended this work by examining
the ordering of semantic elements specifically for motion events in silent gestures produced by
blind and sighted English and Turkish speakers and showed that all the participants gestured
in the same order when they gestured without speech. More specifically, across the different
samples of participants in each study, the English speakers abandoned the Figure-MOTION-
Ground (i.e., SVO) order that they used in speech and in co-speech gesture and replaced it
with Figure-Ground-MOTION (i.e., SOV) order in silent gesture, thus mirroring the pattern
found in Turkish speakers’ silent gestures.
In summary, as evidenced across several studies—one focusing on simple transitive events
(Goldin-Meadow et al., 2008) and others focusing on motion events in S- and V-languages
(Özçalışkan et al., 2016b, 2018)—the cross-linguistic differences observed in co-speech
gesture dissipate when speakers are asked to describe scenes without speech, using only silent
gesture. These results thus support the "thinking-for-speaking" account and further suggest that
nonverbal representation of events in gesture without speech might elicit a natural order of
semantic elements that humans prefer regardless of the language they speak (Goldin-Meadow
et al., 2008; Özçalışkan et al., 2016b).

1.4. Current study


Previous research shows that in languages that sit on opposite ends of the typological
continuum, such as English (S-language) and Turkish (V-language), patterns of gesture pro-
duction are influenced by language in co-speech gesture but not in silent gesture, and this
language effect (or its lack in silent gesture) becomes evident in both the packaging (con-
flated vs. separated) and ordering (Figure-MOTION-Ground vs. Figure-Ground-MOTION)
of semantic elements of a motion event (Özçalışkan et al., 2016a, 2016b, 2018). How-
ever, we do not yet know how language impacts patterns of gesture production when the
language sits in the middle of the typological continuum, as in the case of E-languages
(e.g., Chinese), where both S- and V-language structures are prominent, largely with the
use of serial verb constructions. In this study, we aim to answer this question by focusing
on the patterns of motion event descriptions in speech, co-speech gesture, and silent ges-
ture produced by adult Chinese speakers (E-language), comparing their production to the
speech and gestures produced by adult English (S-language) and adult Turkish (V-language)
speakers.
We asked three questions:
(1) We first asked whether Chinese, English, and Turkish speakers would differ in the
way they packaged and ordered motion elements in their speech about motion.
We expected that our results would replicate earlier work on speech about motion,
showing strong cross-linguistic differences between the three languages (Chen &
Guo, 2009; Kita & Özyürek, 2003; Özçalışkan et al., 2016a, 2016b, 2018; Slobin,
2004). More specifically, for speech production, we expected greater reliance on con-
flated packaging strategy and Figure-MOTION-Ground order in English and greater
reliance on separated packaging strategy and Figure-Ground-MOTION order in Turk-
ish. We also predicted that Chinese speakers would differ from both English and
Turkish speakers in packaging and ordering of semantic elements in their speech
about motion. We expected that Chinese speakers would use the conflated packaging
strategy and Figure-MOTION-Ground order less than English but more than Turk-
ish speakers; in contrast, we predicted that they would use the separated packaging
strategy and Figure-Ground-MOTION order more than English but less than Turkish
speakers.
(2) We next asked whether Chinese, English, and Turkish speakers would differ in
the way they packaged and ordered motion elements in their co-speech gestures
about motion. We expected co-speech gestures to follow the patterns observed in
speech—with greater use of conflated packaging and Figure-MOTION-Ground order
in English and separated packaging and Figure-Ground-MOTION order in Turkish,
based on earlier work (Özçalışkan et al., 2016a, 2018). We also predicted that Chinese
speakers would differ from English and Turkish speakers in packaging and ordering
of semantic elements in their co-speech gestures about motion and mostly follow the
patterns observed in their speech (Brown & Chen, 2013; Chui, 2012): Chinese speak-
ers would use the conflated packaging strategy and Figure-MOTION-Ground order
less than English but more than Turkish speakers; as a corollary to this, we predicted
that they would use the separated packaging strategy and Figure-Ground-MOTION
order more than English but less than Turkish speakers.
(3) We last asked whether Chinese, English, and Turkish speakers would differ in the
way they packaged and ordered motion elements in their silent gestures about motion.
Based on earlier work showing no cross-linguistic difference in silent gestures of S-
and V-language speakers (e.g., Özçalışkan et al., 2016a, 2018), we predicted that
speakers of all three languages would rely on the conflated packaging strategy and
Figure-Ground-MOTION order in their silent gestures about motion, showing no
cross-linguistic differences.
Overall, if the predictions were supported, our findings would provide further evidence
for the “thinking-for-speaking” account (Slobin, 1996), which states that language influences
thought only during online production of speech (i.e., co-speech gesture) but not when
thinking “offline” (i.e., silent gesture), and the Interface Theory (Kita & Özyürek, 2003),
which states that speech and co-speech gesture form a tightly integrated system in production
across different languages of the world.

2. Methods

2.1. Participants
Participants included 60 adult speakers, either with Chinese (n = 20, Mage = 19.55 [SD
= 1.36], range = 18–23, 10 females, 10 males), English (n = 20, Mage = 18.95 [SD =
1.10], range = 18–22, 13 females, 7 males), or Turkish (n = 20, Mage = 20.8 [SD = 1.76],
range = 18–24, 10 females, 10 males) as their native language. Data from the Chinese,
English, and Turkish participants were collected in Jingzhou City, Hubei Province (China),
Atlanta (USA), and Istanbul (Turkey), respectively, by native speakers in each language. The
sample size was based on a similar earlier study that showed that n = 20 per group was
adequate to detect reliable effects at p < .05 (Özçalışkan et al., 2016b). The majority of the
participants in each language had some knowledge of another language, but none of the par-
ticipants were fluent in a second language. The speakers of each language were comparable
in education: the majority were either college students or recent college graduates. The par-
ticipants received either course credit or a small monetary compensation for their
participation in the study.

2.2. Procedure for data collection


Each participant was interviewed individually by a native speaker in each language, using
an animated motion description task, originally developed by Özçalışkan (2016). At the
beginning of the interview, participants were introduced to an animated cartoon character
named Adam, who performed the motion events in the animations. Each participant was then
presented with 16 animated motion events with various manners and paths in relation to
different landmarks, one at a time (see Table 1 for a list of the 16 animated motion events).

Table 1
List of the animated motion event scenes

Event description           Type of PATH   Type of MANNER
Set 1
Crawl into a house          INTO           CRAWL
Flip over a beam            OVER           FLIP
Run out of a house          OUT OF         RUN
Crawl toward a carpet       TOWARD         CRAWL
Set 2
Climb into a treehouse      INTO           CLIMB
Jump over a hurdle          OVER           JUMP
Fly out of a trashcan       OUT OF         FLY
Climb toward a treehouse    TOWARD         CLIMB
Set 3
Tumble into a trashcan      INTO           TUMBLE
Crawl over a carpet         OVER           CRAWL
Crawl out of a house        OUT OF         CRAWL
Flip toward a beam          TOWARD         FLIP
Set 4
Run into a house            INTO           RUN
Jump over a cat             OVER           JUMP
Tumble out of a treehouse   OUT OF         TUMBLE
Crawl toward a house        TOWARD         CRAWL
The participants were asked to describe the events in two different ways: (1) with speech in
co-speech gesture condition (“You’re going to watch short video clips. Describe each video
after each clip. Make sure to use your hands as naturally as possible and also include the
landmark in your description.”), and (2) without speech in silent gesture condition (“Now
you’re going to see the same short video clips one more time. This time, I want you to only
use your hands while describing the videos without using any speech.”). To avoid influencing
the naturalness of co-speech gestures, participants first described the 16 animated clips in the
co-speech gesture condition and then in the silent gesture condition. However, the presenta-
tion order of the 16 animated motion clips was counterbalanced across participants within
both the co-speech gesture and silent gesture conditions, with each participant completing
the descriptions in one of four different set orders in both conditions—following procedures
in earlier work (Özçalışkan, 2016; Özçalışkan et al., 2016a). Each participant completed two
practice trials prior to describing the scenes in the co-speech gesture and the silent gesture
condition to familiarize them with the demands of the task; these trials were not included in
the analysis. Our decision to explicitly request the inclusion of landmark (i.e., Ground) in the
responses was to elicit descriptions of motion events that encoded various types of path (out
of, over, into, and toward) accompanied by different types of manner (e.g., running, crawling,
and flipping) to provide a more representative sample of motion event descriptions across the
three languages. The data included here were also part of a larger project that included sev-
eral other tasks examining cross-linguistic patterns in speakers’ language abilities (i.e., novel
word learning task and narratives). Data collection for these other tasks was sandwiched
between the co-speech gesture condition and the silent gesture condition. As such, the silent
gesture condition did not immediately follow the co-speech gesture condition, thus preventing
an immediate effect of co-speech gesture on silent gesture patterns.
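To make the counterbalancing concrete, the short sketch below rotates the four stimulus sets from Table 1 into four presentation orders and assigns each participant to one of them. The paper states only that four different set orders were used in both conditions; the Latin-square rotation, the round-robin assignment, and the names used here are illustrative assumptions rather than the authors' actual procedure.

```python
# Illustrative sketch of counterbalancing the four stimulus sets in Table 1 across
# participants. The rotation scheme, round-robin assignment, and names below are
# assumptions; the paper states only that four set orders were used.

SETS = {
    1: ["crawl into a house", "flip over a beam", "run out of a house", "crawl toward a carpet"],
    2: ["climb into a treehouse", "jump over a hurdle", "fly out of a trashcan", "climb toward a treehouse"],
    3: ["tumble into a trashcan", "crawl over a carpet", "crawl out of a house", "flip toward a beam"],
    4: ["run into a house", "jump over a cat", "tumble out of a treehouse", "crawl toward a house"],
}

def latin_square_orders(n_sets: int = 4) -> list[list[int]]:
    """Return n_sets rotations of the set labels: [1, 2, 3, 4], [2, 3, 4, 1], and so on."""
    labels = list(range(1, n_sets + 1))
    return [labels[i:] + labels[:i] for i in range(n_sets)]

def clip_sequence(participant_id: int) -> list[str]:
    """Assign a participant to one of the four set orders (round-robin) and
    return the resulting sequence of 16 clips."""
    order = latin_square_orders()[participant_id % 4]
    return [clip for set_label in order for clip in SETS[set_label]]

if __name__ == "__main__":
    print(clip_sequence(0)[:4])  # participant 0 starts with Set 1
    print(clip_sequence(1)[:4])  # participant 1 starts with Set 2
```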

2.3. Transcription, coding, and reliability


All speech produced in the co-speech gesture condition was transcribed and segmented into
sentence units by native speakers in each language. A sentence unit was defined as a segment
of speech that contained at least one verb (or a serial verb) and its associated arguments
and subordinate clauses, following earlier work (Özçalışkan et al., 2016a; e.g., “Adam RUNS
INTO a house"; "Adam eve KOŞARAK GİRER" = Adam house-to by RUNNING ENTERS;
"Yàdāng PǍOJÌN fángzi" = Adam RUN-ENTER house). The decision to treat the two verbs
in a serial verb construction in Chinese as part of a single sentence unit was based on earlier
work that considered a serial verb construction as a syntactic unit that forms a single clause
(e.g., Givon, 2009; Li & Thompson, 1989; Paul et al., 2022). All gestures that accompanied
each sentence unit in the co-speech gesture condition and that were produced on their own in
the silent gesture condition were also coded. Gesture was defined as a communicative hand or
body movement that conveyed meaning; only gestures that indicated or characterized motion,
figure, or the ground associated with the animated stimulus scenes were coded, following
earlier work (Özçalışkan, 2016).
Each sentence unit was further coded for the packaging of semantic elements and for the
ordering of semantic elements, separately for speech, co-speech gesture, and silent gesture,
following earlier work (Özçalışkan et al., 2016a, 2018). For the packaging of semantic ele-
ments, each sentence unit was coded as either conflated (i.e., manner and path are both
described within a single clause or within a single gesture) or separated (i.e., manner and
path are described in separate clauses or in separate gestures). A sentence unit was classified
as separated if it contained manner-only (e.g., “he RUNS,” “Adam KOŞAR” = he RUNS,
“Tā PǍO” = he RUN), path-only (e.g., “he ENTERS the house,” “Adam evE GİRER” = he
house-TO ENTER, “Tā JÌN fángzi” = he ENTER house), or manner and path that were con-
veyed in two separate clauses (e.g., “he enters the house by running,” “Adam eve KOŞARAK
GİRER" = Adam house-to by RUNNING ENTERS, "Tā PǍO-zhe JÌN fángzi" = "he RUN-
durative marker (by running) ENTER house”). The less frequently used separated strategies
in speech included neutral verbs (e.g., move) with either a manner adjunct or a path satellite in
English, Turkish, and Chinese (e.g., “Adam MOVES rapidly”; “Adam MOVES TOWARD the
house"; "Adam eve doğru HAREKET EDER" = Adam house-to toward MOVE; "Tā ZUÒ
le yí gè kōngfān” = he DO aspect-marker a flip), serial verbs with two manner verbs or two
path verbs in Chinese (e.g., "zhè gè rén BĒN-PǍO" = this person RUN-RUN; "Rén DIÀO-
XIÀ lái le" = person FALL-DESCEND come aspect-marker). The conflated strategy in speech
was largely similar across the three languages, with a manner verb followed by a path satel-
lite (e.g., “Adam RUNS TOWARD the house”; “Adam evin İÇİNE DOĞRU KOŞUYOR” =
Adam house-possessive INSIDE TOWARD RUNNING; "Yàdāng WǍNG fángzi-lǐ PǍO" =
Adam TOWARD house-inside RUN); Chinese speakers also used serial manner + path verbs
with or without path satellites in their conflated descriptions.
A gesture was classified as separated if it contained manner only (e.g., wiggling fingers in
place rapidly as if running), path only (e.g., moving index finger left to right as if conveying
rightward trajectory), or manner and path conveyed in two separate gestures (e.g., wiggling
fingers in place as if running then moving index finger left to right as if conveying rightward
trajectory). A sentence unit or gesture was classified as conflated if it synthesized manner
and path into a single clause (e.g., “Adam RUNS INTO the house”; “Adam ev-E KOŞAR” =
Adam house-TO RUN; "Yàdāng PǍOJÌN fángzi" = Adam RUN-ENTER house) or a single
gesture (e.g., wiggling fingers left to right as if running left to right; see Fig. 1).
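The packaging rules described above amount to a simple decision rule over the coded content of a sentence unit: the unit counts as conflated if any single clause or single gesture encodes both manner and path, and as separated if motion information is instead spread over manner-only and/or path-only elements. The sketch below is a hypothetical restatement of that rule, not the authors' coding software; the data structure, its field names, and the handling of mixed units are assumptions.

```python
# Hedged sketch of the packaging decision rule described above: a sentence unit is
# "conflated" if any single clause/gesture encodes both manner and path, and
# "separated" if motion information is instead spread over manner-only and/or
# path-only elements. The Element class, its fields, and the treatment of mixed
# units are illustrative assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Element:
    """One clause (in speech) or one gesture within a sentence unit."""
    has_manner: bool
    has_path: bool

def classify_packaging(unit: list[Element]) -> Optional[str]:
    """Return 'conflated', 'separated', or None if the unit carries no motion information."""
    motion_elements = [e for e in unit if e.has_manner or e.has_path]
    if not motion_elements:
        return None  # nothing codable for packaging in this unit
    if any(e.has_manner and e.has_path for e in motion_elements):
        return "conflated"  # e.g., "runs into" / wiggling fingers while moving them sideways
    return "separated"      # e.g., path verb plus a manner subordinate clause, or manner-only

# "Adam eve KOŞARAK GİRER": path main verb plus manner subordinate clause -> separated
print(classify_packaging([Element(has_manner=False, has_path=True),
                          Element(has_manner=True, has_path=False)]))
# "Adam RUNS INTO the house": manner verb plus path satellite in one clause -> conflated
print(classify_packaging([Element(has_manner=True, has_path=True)]))
```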
For the ordering of semantic elements, each sentence unit was coded as either Figure-
Ground-MOTION or Figure-MOTION-Ground, according to the placement of the primary
motion element (i.e., predicate) in relation to the ground. The primary motion element was
either the single main verb or the serial verb string (for Chinese) of the sentence unit in speech
and the gestural element conveying motion in gesture (manner only, path only, manner-path
sequential, manner + path conflated gesture) within a single sentence unit (Özçalışkan et al.,
2018). For example, when describing the figure’s running motion into a house, if a participant
placed his right middle and index finger on the right (i.e., Figure) and left palm on the left (i.e.,
Ground), then moved his right finger toward the left palm (i.e., MOTION) all within a sen-
tence unit, the gesture sequence in the sentence unit was coded as Figure-Ground-MOTION.
If, however, the participant placed her finger on the right (i.e., Figure), wiggled it right to
left (i.e., MOTION), and then placed left cupped hand on the left (i.e., Ground), the gesture
sequence in the sentence unit was coded as following the Figure-MOTION-Ground order
(see Fig. 2). It is important to note here that speakers of all three languages frequently omit-
ted the Figure in their co-speech and silent gestures. For these gesture strings, we relied on the
ordering of the gestures for Ground and Motion for coding: gesture strings that followed the
MOTION-Ground order were classified as Figure-MOTION-Ground and gesture strings that
followed the Ground-MOTION order were classified as Figure-Ground-MOTION, following
earlier work (Özçalışkan et al., 2016a, 2018). To capture all observed instances of ordering—
with or without explicit marking of Figure—hereafter we place Figure in parentheses and
refer to ordering types as (Figure)-MOTION-Ground and (Figure)-Ground-MOTION.
The coding of ordering in gesture required the production of two or more gestures (i.e.,
a gesture string) that conveyed either one of the two orders: (Figure)-Ground-MOTION
or (Figure)-MOTION-Ground in a sentence unit. Most of the sentence units in the co-
speech gesture (75%) and some of the sentence units in the silent gesture (23%) condi-
tion included only one gesture across the three languages, thus were not included in the
analyses.

[Figure 1: example stimulus scene and six gesture panels. Co-speech gesture: (a1) English, (b1) Chinese, (c1) Turkish. Silent gesture: (a2) English, (b2) Chinese, (c2) Turkish.]
Fig. 1. Example stimulus scene of a figure climbing into a treehouse (top) and its description in co-speech gesture
(a1, b1, c1) and silent gesture (a2, b2, c2) by speakers of English (A pictures on left), Chinese (B pictures in the
middle), and Turkish (C pictures on right). In co-speech gesture, native speakers of English and Chinese preferred
to express manner (circling fists) and path (upward trajectory) simultaneously in a single gesture (a1-b1), and
native speakers of Turkish preferred to express path (leftward trajectory) by itself, omitting manner (c1). In silent
gesture, native speakers of English, Chinese, or Turkish preferred to express manner and path simultaneously
within a single gesture by circling fists upward (a2, b2, c2).

[Figure 2: the same stimulus scene with six gesture panels labeled by gesture order. Co-speech gesture: (a1) English, MOTION then GROUND; (b1) Chinese and (c1) Turkish, GROUND then MOTION. Silent gesture: (a2) English, (b2) Chinese, (c2) Turkish, all GROUND then MOTION.]
Fig. 2. Example stimulus scene of a figure climbing into a treehouse (top) and its description in co-speech gesture
(a1, b1, c1) and silent gesture (a2, b2, c2) by speakers of English (A pictures on left), Chinese (B pictures in the
middle), and Turkish (C pictures on right). In co-speech gesture, some English speakers preferred to express motion
(climb.into) first, followed by ground (treehouse); Chinese and Turkish speakers preferred to express ground (tree-
house) first, followed by motion (climb/move into). In silent gesture, English, Chinese, and Turkish speakers
preferred to express ground (treehouse) first, followed by motion (climb.into).

Some sentence units included multiple gestures but with unclear orders; for exam-
ple, (Figure)-Ground-Motion-Ground; Motion-Ground-(Figure)-Motion-Ground—a pattern
that was less frequently observed in silent gesture (34% of sentence units) than in co-
speech gesture (47% of sentence units) across the three languages. All such instances were
excluded from the ordering analysis. In addition, we observed a few instances of the Ground-
Figure-MOTION ordering in silent gesture (∼9% of sentence units across languages); we
excluded these few instances from the analysis as well, given their limited usage in all three
languages. The packaging analysis, on the other hand, did not have this constraint; even
if a response included a single gesture conveying motion information (be it path-only,
manner-only, manner-path sequential, or manner + path conflated), that sentence unit was included in the
analysis.
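Taken together, the ordering rules and exclusions described in this subsection can be summarized as a small decision procedure over the sequence of gesture types in a sentence unit: ignore Figure gestures, require at least two gestures including Ground and Motion, classify clean Motion-then-Ground strings as (Figure)-MOTION-Ground and clean Ground-then-Motion strings as (Figure)-Ground-MOTION, and exclude everything else as unclear. The sketch below is an illustrative reconstruction under those assumptions (it does not separately handle the rare Ground-Figure-MOTION strings mentioned above), not the authors' actual coding script.

```python
# Hedged reconstruction of the gesture-ordering rules described above.
# Labels: "F" = Figure, "G" = Ground, "M" = Motion. Function and label names
# are assumptions made for illustration.

from typing import Optional

def classify_order(gestures: list[str]) -> Optional[str]:
    """Classify a gesture string within one sentence unit.

    Returns "(Figure)-MOTION-Ground", "(Figure)-Ground-MOTION", or None when the
    string must be excluded (single gesture, missing element, or unclear order).
    """
    core = [g for g in gestures if g in ("G", "M")]   # Figure gestures are ignored for ordering
    if len(gestures) < 2 or "G" not in core or "M" not in core:
        return None                                   # single-gesture or incomplete strings excluded
    # A "clear" order has all Motion gestures on one side of all Ground gestures.
    if all(g == "M" for g in core[:core.count("M")]):
        return "(Figure)-MOTION-Ground"               # e.g., M G  or  F M G
    if all(g == "G" for g in core[:core.count("G")]):
        return "(Figure)-Ground-MOTION"               # e.g., G M  or  F G M
    return None                                       # unclear orders (e.g., G M G) excluded

print(classify_order(["F", "M", "G"]))   # -> (Figure)-MOTION-Ground
print(classify_order(["G", "M"]))        # -> (Figure)-Ground-MOTION
print(classify_order(["G", "M", "G"]))   # -> None (unclear, excluded)
print(classify_order(["M"]))             # -> None (single gesture, excluded)
```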

2.4. Scoring and analyses


We computed the total number of sentence units with separated versus conflated motion
packaging strategy and the total number of sentence units with (Figure)-MOTION-Ground
versus (Figure)-Ground-MOTION order that each speaker produced in speech, co-speech
gesture, and silent gesture, separately in each language. The independent variables included
the language (English, Turkish, and Chinese) and either type of packaging (Conflated vs.
Separated) or type of ordering—(Figure)-Ground-MOTION versus (Figure)-MOTION-Ground—
all categorical variables. The dependent variables were the frequency of sentence units with
conflated versus separated descriptions and the frequency of sentence units with (Figure)-
Ground-MOTION versus (Figure)-MOTION-Ground orders produced by speakers of each of
the three languages in speech or in gesture—all continuous variables. We analyzed differ-
ences using two-way mixed ANOVAs, with language as a between-subjects factor (English,
Turkish, and Chinese) and either packaging type (separated versus conflated) or ordering type—
(Figure)-MOTION-Ground versus (Figure)-Ground-MOTION—as within-subject factors, sep-
arately for speech, co-speech gesture, and silent gesture. In two of the analyses—frequency
of sentence units with each of the two orders in co-speech gesture and in silent gesture—the
normality assumption of the ANOVA was violated. For these two
analyses, we first tested differences with ANOVAs that would allow us to observe interac-
tion effects, along with main effects. However, we also followed up these two analyses with
nonparametric tests (i.e., Kruskal–Wallis) that do not require normal distribution of data to
confirm that the patterns of similarities and differences that we detected with the ANOVAs
remained the same.
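As a concrete illustration of this analysis pipeline, the sketch below tabulates per-participant counts and runs a 3 (Language) x 2 (Packaging) mixed ANOVA with a Kruskal-Wallis follow-up, using the open-source pingouin and SciPy libraries on toy data. The toy counts, the column names, and the choice of these libraries are assumptions made for illustration; the paper does not report which software was used.

```python
# Hedged sketch of the reported analysis: per-participant counts of sentence units
# entered into a two-way mixed ANOVA (Language between subjects, Packaging within
# subjects), followed by a nonparametric Kruskal-Wallis check. The toy counts,
# column names, and choice of libraries (pingouin, SciPy) are assumptions.

import pandas as pd
import pingouin as pg
from scipy import stats

# One row per participant x packaging type (toy data for two participants per language).
df = pd.DataFrame({
    "participant": ["c01", "c01", "c02", "c02", "e01", "e01",
                    "e02", "e02", "t01", "t01", "t02", "t02"],
    "language": ["Chinese"] * 4 + ["English"] * 4 + ["Turkish"] * 4,
    "packaging": ["conflated", "separated"] * 6,
    "count": [12, 5, 11, 6, 11, 4, 10, 5, 4, 13, 5, 12],
})

# 3 (Language) x 2 (Packaging) mixed ANOVA on the counts.
aov = pg.mixed_anova(data=df, dv="count", within="packaging",
                     subject="participant", between="language")
print(aov[["Source", "F", "p-unc", "np2"]])

# Nonparametric follow-up when normality is in doubt: compare the three languages
# on, for example, their conflated counts with a Kruskal-Wallis test.
conflated = df[df["packaging"] == "conflated"]
groups = [g["count"].to_numpy() for _, g in conflated.groupby("language")]
h_stat, p_val = stats.kruskal(*groups)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_val:.3f}")
```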
All gestures and speech produced by speakers in each language were coded by native
speakers of the language, who were trained for speech and gesture coding. Intercoder reli-
ability was established by additional coders in each language—also native speakers—who
coded a randomly selected subsample of 20% of the speech, co-speech gestures, and silent
gestures in each language. Agreement between coders was 97% for identifying gestures, 99%
for describing gesture form (i.e., hand shape of gesture), and 97% and 98% for coding motion
elements in speech and gesture, respectively.
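Percent agreement of the kind reported here is simply the share of items in the 20% reliability subsample on which the two coders assigned the same code; the tiny sketch below illustrates the computation with hypothetical labels rather than the authors' data.

```python
# Minimal sketch of a percent-agreement computation over a reliability subsample.
# The labels below are hypothetical toy codes, not the authors' data.

def percent_agreement(coder1: list[str], coder2: list[str]) -> float:
    """Percentage of items on which two coders assigned the same code."""
    assert len(coder1) == len(coder2), "coders must rate the same items"
    matches = sum(a == b for a, b in zip(coder1, coder2))
    return 100 * matches / len(coder1)

coder1 = ["conflated", "separated", "conflated", "conflated", "separated"]
coder2 = ["conflated", "separated", "conflated", "separated", "separated"]
print(percent_agreement(coder1, coder2))  # -> 80.0 (4 of 5 items agree)
```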
3. Results

3.1. Packaging of semantic elements


3.1.1. Speech
Motion descriptions in speech showed a main effect of Packaging (F(1, 57) = 14.54,
p < .001, ηp² = 0.20; Mconflated = 11.86, SE = 0.43 vs. Mseparated = 9.43, SE = 0.38), a
main effect of Language (F(2, 57) = 16.82, p < .001, ηp² = 0.37)—with a greater number of
motion descriptions in Chinese (M = 12.72, SE = 0.45) than in English (M = 9.32, SE = 0.45,
Bonferroni, p < .001) and Turkish (M = 9.88, SE = 0.45, Bonferroni, p < .001)—but more
importantly, a significant interaction between Language and Packaging (F(2, 57) = 115.13,
p < .001, ηp² = 0.80). As predicted, both English and Chinese speakers produced more con-
flated (70% for Chinese and 72% for English) than separated responses (Bonferroni, ps <
.001). In their conflated expressions, English speakers exclusively used manner verbs with
path satellites (100%; e.g., He RUNS INTO house) and Chinese speakers relied primarily on
manner verb + path verb serial verb constructions (86%)—with or without additional path
satellites (e.g., "Yí gè rén CÓNG lóutī-shàng TIÀO-XIÀ lái" = A person FROM stairs-top
JUMP-DESCEND come; "Tā PǍOJÌN fángzi" = He RUN-ENTER house)—and manner
verb + path satellite constructions (13%; e.g., "Tā WǍNG fángzi-lǐ PǍO" = He TOWARD
house INSIDE RUN).
Turkish speakers, on the other hand, showed the opposite pattern, using more separated
than conflated packaging strategies (78% vs. 22%; Bonferroni, p < .001). In their separated
descriptions, Turkish speakers described the same scenes by expressing only path (27%; e.g.,
“Adam eve GİRİYOR” = Adam is house-TO ENTERING), only manner (8%; e.g., “Adam
KOŞUYOR” = Adam is RUNNING), or path in one clause and manner in a subordinate
clause (65%; e.g., “Adam eve KOŞARAK GİRİYOR” = Adam house-TO by RUNNING
ENTERING; see Fig. 3a).
Importantly, Chinese speakers showed an in-between pattern for separated expressions,
producing more than English but fewer than Turkish speakers (Bonferroni, ps < .001)—a pat-
tern that did not extend to conflated expressions. In fact, contrary to our prediction, Chinese
speakers produced more conflated responses in speech than English speakers (Bonferroni,
p < .001), but did so by relying primarily on serial verb constructions. These findings thus
show that Chinese speakers follow patterns different from both English and Turkish speak-
ers in their relative use of different packaging strategies in speech, but the predicted in-between pattern becomes evident only in their relative production of the separated packaging strategy.
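The packaging categories can be made concrete with a small, hypothetical coding helper (our own illustration of the distinction, not the coding scheme the authors used): a sentence unit counts as conflated when manner and path co-occur within a single clause (or a single gesture), and as separated otherwise.

```python
# Hypothetical sketch of the conflated vs. separated distinction.
# Each clause (or gesture) is represented as the set of semantic elements
# it expresses; a sentence unit is a list of such clauses/gestures.
def packaging_type(clauses):
    if any({"manner", "path"} <= clause for clause in clauses):
        return "conflated"   # manner and path within the same clause/gesture
    return "separated"       # manner-only, path-only, or manner and path apart

# English-style manner verb + path satellite ("He RUNS INTO the house"), or a
# Chinese-style serial verb ("pao-jin" = run-enter): one clause, both elements.
print(packaging_type([{"manner", "path"}]))    # conflated
# Turkish-style path verb plus manner subordinate clause: two clauses.
print(packaging_type([{"path"}, {"manner"}]))  # separated
print(packaging_type([{"manner"}]))            # separated (manner only)
```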

3.1.2. Co-speech gesture


Motion descriptions in co-speech gesture largely mirrored the patterns observed in speech,
with a main effect of Packaging (F(1, 57) = 12.06, p = .001, ηp² = 0.18; Mconflated = 8.53, SE
= 0.50 vs. Mseparated = 5.87, SE = 0.41), no effect of Language (F(2, 57) = 2.68, p = .08),
but more importantly, a significant interaction between Language and Packaging (F(2, 57) =
23.14, p < .001, ηp² = 0.45). English and Chinese speakers both produced more conflated
than separated responses (Bonferroni, p = .001, p < .001, respectively) by synthesizing man-
ner and path into the same gesture (e.g., wiggling fingers while moving them left to right as
if running into; English: 68%, Chinese: 78%). Turkish speakers showed the opposite pattern,
producing more separated than conflated responses in gesture (65% vs. 34%; Bonferroni, p =
.001). Their separated strategy included gestures that expressed only path (12%; e.g., moving
both hands left to right as if entering), only manner (74%; e.g., circling fists rapidly next to
body as if running), or sequential gestures for manner and path but within the bounds of a
sentence unit (14%; e.g., first moving both arms up and down with fisted hands next to body
as if running then moving both hands left to right as if entering; see Fig. 3b).
Importantly, Chinese speakers did not differ from English speakers in their production
of either separated (Bonferroni, p = .82) or conflated (Bonferroni, p = .32) gestures, but
they did differ from Turkish speakers, producing more conflated but fewer separated gestures
(Bonferroni, ps < .001). These findings show that Chinese speakers follow patterns akin to
English speakers in their relative use of different packaging strategies in co-speech gesture,
frequently synthesizing manner and path into a single gesture.

3.1.3. Silent gesture


As expected, the three languages showed similarities in the packaging of motion elements
in silent gesture, with a main effect of Packaging (F(1, 57) = 240.71, p < .001, ηp² = 0.81), a main effect of Language (F(2, 57) = 7.79, p = .001, ηp² = 0.22), but no interaction between
Language and Packaging (F(2, 57) = 2.21, p = .119). Speakers across the three languages
produced a greater number of conflated than separated responses (M = 12.38, SE = 0.38 vs.
M = 2.30, SE = 0.31; see Fig. 3c)—a pattern that was more pronounced in Chinese, particularly compared to Turkish (Bonferroni, p = .03).

Fig. 3. Mean number of sentence units with separated (manner-only, path-only, manner-path) or conflated
(manner+path) motion elements in speech (a), in gesture with speech (co-speech gesture; b), and in gesture with-
out speech (silent gesture; c). Error bars represent the standard error. The maximum possible number of sentence
units was 16 for the silent gesture condition.

3.2. Ordering of semantic elements


3.2.1. Speech
Motion descriptions in speech showed no effect of Ordering (F(1,57) = 0.01, p = .92) or
Language (F(2,57) = 2.58, p = .08), but a significant interaction between the two (F(2,57)
= 324.48, p < .001, ηp² = 0.92). As expected, English speakers produced the (Figure)-
MOTION-Ground order more often than (Figure)-Ground-MOTION order (Bonferroni,
p < .001). The pattern was reversed for Turkish speakers: they produced (Figure)-Ground-
MOTION order more often than (Figure)-MOTION-Ground order (Bonferroni, p < .001).
Chinese speakers, on the other hand, did not differ in their production of the two order types,
producing each at comparable rates (Bonferroni, p = .151; see Fig. 4a). Importantly, Chi-
nese speakers differed from both English and Turkish speakers: they used (Figure)-Ground-
MOTION order more than English but less than Turkish speakers (Bonferroni, ps < .001)
and (Figure)-MOTION-Ground order more than Turkish but less than English speakers (Bon-
ferroni, ps < .001) in their spoken descriptions. These findings thus suggest that Chinese
speakers show a pattern different from both English and Turkish speakers, by using the two
order types at comparable rates in their speech about motion.
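The two order categories can likewise be illustrated with a hypothetical helper that reads off the sequence of coded elements in a sentence unit (again our own sketch, not the authors' coding procedure); sequences that fit neither template would be left uncategorized, as with the unclear order patterns noted in the next section.

```python
# Hypothetical sketch of the two ordering categories used in this section.
def ordering_type(elements):
    """elements: ordered labels in a sentence unit, e.g., ["figure", "motion", "ground"]."""
    core = [e for e in elements if e in ("motion", "ground")]
    if core == ["motion", "ground"]:
        return "(Figure)-MOTION-Ground"
    if core == ["ground", "motion"]:
        return "(Figure)-Ground-MOTION"
    return "unclear"  # e.g., Ground-Motion-Ground

# English-like order ("He runs into the house"):
print(ordering_type(["figure", "motion", "ground"]))  # (Figure)-MOTION-Ground
# Turkish-like order ("Adam eve kosarak giriyor"):
print(ordering_type(["figure", "ground", "motion"]))  # (Figure)-Ground-MOTION
```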

3.2.2. Co-speech gesture


Motion descriptions in co-speech gesture followed a pattern similar to speech, with a main
effect of Ordering (M(Figure)-Ground-MOTION = 2.00, SE = 0.36 vs. M(Figure)-MOTION-Ground = 0.85, SE = 0.15; F(1,57) = 8.91, p = .004, ηp² = 0.14) and no effect of Language (F(2,57) = 1.64, p = .20), but an interaction between Language and Order (F(2,57) = 324.48, p = .006, ηp² = 0.16). Chinese speakers produced significantly more (Figure)-Ground-MOTION
order than (Figure)-MOTION-Ground order (Bonferroni, p < .001)—a pattern that was also
observed but not reliable in Turkish (Bonferroni, p = .140). English speakers tended to pro-
duce the opposite pattern, with slightly more (Figure)-MOTION-Ground order than (Figure)-
Ground-MOTION order responses in their co-speech gestures; however, this tendency was
not significant either (Bonferroni, p = .602; see Fig. 4b). Chinese speakers also produced
more (Figure)-Ground-MOTION order in co-speech gesture compared to English speakers
(Bonferroni, p = .02).
It is important to note here that in all three languages, including Chinese, speakers produced
relatively small numbers of gesture strings in co-speech gesture: (Figure)-MOTION-Ground
order: English: n = 26, Chinese n = 10, Turkish n = 16 instances; (Figure)-Ground-MOTION
order: English n = 18, Chinese n = 66, Turkish n = 37 instances, resulting in non-normal
distribution of the data. In fact, across the three languages, only 6% of the sentence units
included a gesture string that showed one of the two orders; the rest of the sentence units
either included a single gesture (e.g., gesture for motion only), unclear order patterns (e.g.,
Ground-Motion-Ground), or no co-speech gesture. We, therefore, followed up the ANOVAs with Kruskal–Wallis tests and found similar patterns, which showed reliable differences only for the (Figure)-Ground-MOTION order (H(2) = 13.77, p = .001)—with Chinese and Turkish speakers producing significantly more responses with (Figure)-Ground-MOTION order than English speakers (p < .05).

3.2.3. Silent gesture


Turning next to silent gesture, we found an effect of Ordering, with speakers across the
three languages producing (Figure)-Ground-MOTION order significantly more than (Figure)-
MOTION-Ground order in their silent gestures (M = 7.51, SE = 0.56 vs. M = 0.72, SE = 0.20, F(1,57) = 115.12, p < .001, ηp² = 0.68; see Fig. 4c). The description of motion events in silent gesture also showed a main effect of Language (F(2,57) = 10.98, p < .001, ηp² = 0.28), and a significant interaction between Language and Order (F(2,57) = 14.98, p < .001, ηp² = 0.34). Overall, Chinese speakers (M = 5.85, SE = 0.48) produced more silent
gestures than both English (M = 3.78, SE = 0.45, p < .001) and Turkish (M = 2.72, SE
= 0.48, p < .001) speakers, and also showed a more pronounced preference for (Figure)-
Ground-MOTION order in their silent gestures (99%) particularly compared to English (80%;
Bonferroni, ps < .001).
Speakers showed individual variability in their likelihood of combining gestures into
strings in silent gesture, resulting in non-normal distribution of the data. We, therefore, fol-
lowed up the ANOVAs with Kruskal–Wallis tests. We found differences both for the (Figure)-
Ground-MOTION order (H(2) = 19.70, p < .001) and the (Figure)-MOTION-Ground order (H(2) = 11.68, p = .003)—with Chinese speakers producing significantly more responses with (Figure)-Ground-MOTION order and fewer responses with (Figure)-MOTION-Ground order compared to English- and Turkish-speaking participants (ps < .05)—a pattern consistent with
the results of the ANOVA. Thus overall, speakers in all three languages showed a greater
preference for (Figure)-Ground-MOTION order in their silent gestures, with this preference
being stronger in Chinese.

Fig. 4. Mean number of sentence units that follow (Figure)-Ground-MOTION or (Figure)-MOTION-Ground orders in speech (a), in gesture with speech (co-speech gesture; b), and in gesture without speech (silent gesture; c). Error bars represent the standard error. The maximum possible number of sentence units was 16 for the silent gesture condition. Overall, speakers produced more sentence units that followed either of the two orders in the
silent gesture condition than in the co-speech gesture condition (M = 8.23, SD = 5.0 vs. M = 2.85, SD = 3.01)
across the three languages.

4. Discussion

Previous research focusing on S- and V-languages demonstrated that languages differ in the packaging (conflated vs. separated) and ordering (Figure-MOTION-Ground vs. Figure-
Ground-MOTION) of semantic elements of a motion event in speech and co-speech gesture
but not in silent gesture (e.g., Özçalışkan et al., 2016a, 2016b, 2018). In the current study, we
took these findings one step further by asking whether patterns of packaging and ordering of
motion events found in speech, co-speech gesture, and silent gesture extend to Chinese—an
E-language that does not follow the binary split in its expression of motion. We found that
English, Turkish, and Chinese speakers showed cross-linguistic differences in the way they
packaged and ordered semantic elements of a motion event in their speech and co-speech
gestures. However, these cross-linguistic differences were not observable in silent gesture: all
languages including Chinese demonstrated the same preference for the conflated pattern in
packaging and the (Figure)-Ground-MOTION pattern in ordering. Our results thus provide
further support for the “thinking-for-speaking” account proposed by Slobin (1996, 2004),
which argues that language influences thought only during the process of online speech pro-
duction (i.e., speech and co-speech gesture) but not in offline production (i.e., silent gesture).

4.1. Packaging of semantic elements


Our study follows earlier work in patterns of speech production, with English speakers
showing a preference for conflated, Turkish speakers showing a preference for separated pack-
aging strategies (Emerson, Limia, & Özçalışkan, 2021; Kita & Özyürek, 2003; Özçalışkan et al., 2016a, 2016b), and Chinese speakers following a pattern akin to English (Chen & Guo, 2010; Paul et al., 2022) by relying on the conflated packaging strategy in their speech about motion. In
addition to replicating earlier work on patterns of speech production, our study also extended
these findings to gesture, showing that co-speech gestures—including those produced by Chi-
nese speakers—mirrored the patterns found in speech. These results thus provide further evi-
dence that speech and co-speech gesture form a tightly integrated system (Kita & Özyürek,
2003; McNeill, 1992).
One of our predictions was that Chinese speakers would show patterns characteristic of
both English and Turkish speakers in their packaging of motion in speech. Our results, how-
ever, showed that Chinese speakers were more closely aligned with English speakers, also
showing a greater preference for the conflated strategy in speech. Then, does this mean that
Chinese should be classified as an S-language as originally proposed by Talmy (1985, 2000)
or as an E-language as later suggested by Slobin (2004)? If we base our typological classifi-
cation on the frequency of conflated (manner + path) expressions in speech, Chinese should
be classified as an S-language. At the same time, however, Chinese differed from English (an
S-language) in an important way as the speakers predominantly used serial verb construc-
tions for their conflated expressions (86%), in which they encoded both manner and path in
the verb (“pǎo-jìn” = run-enter), the core component of a motion event description. This
places equal emphasis on both manner and path information (Brown & Chen, 2013) and dif-
fers from English speakers, who placed the emphasis on the manner component by expressing
it exclusively in the verb followed by a path satellite (100%). This pattern has also been shown in recent work, in which Chinese speakers relied heavily on the conflated strategy, using primarily serial verbs (70%; Paul et al., 2022), while English speakers used almost exclusively manner verb + path satellite constructions (Özçalışkan et al., 2016a, 2016b) in describing motion event scenes. Accordingly, if we base our classification in terms
of the specific linguistic strategies (manner verb + path satellite vs. manner verb + path verb)
speakers employ and the types of motion information the verb conveys as well as the relative
frequency of separated expressions in speech, then we can classify Chinese as belonging to
a different language type (i.e., E-language). This, in turn, raises the question of how well a binary split (S- vs. V-languages) or a three-way split (S- vs. V- vs. E-languages) explains patterns of habitual expression by native speakers of the world’s languages (see Croft, Barðdal, Hollmann, Sotirova, & Taoka, 2010; Lewandowski & Mateu, 2020 for related discussions). As suggested by Beavers, Levin, and Tham (2010), speakers of a language of one type (e.g., an S-language) can also use expressions that are more typical of another language type (e.g., a V-language). In fact, we saw some evidence of this in our data, where Chinese speakers also used (although less frequently, at 12%) manner verb + path satellite constructions in their
motion expressions—an expression type that is more commonly associated with S-languages,
such as English. However, our finding that Chinese speakers use conflated constructions in
speech at rates comparable to or even higher than English speakers but do so by primarily
using a different construction type that places equal weight on manner and path in the verb
(i.e., the core component of a motion event) might be sufficient to classify it as a different
language type (i.e., E-language).
We also predicted that the co-speech gestures—including those produced by Chinese
speakers—would mirror the patterns found in speech. Our results confirmed this prediction
by showing that all three different types of languages showed the same cross-linguistic dif-
ferences in co-speech gesture as in speech. Both English (68%) and Chinese (78%) speakers
primarily relied on the conflated packaging strategy (describing both the manner and path
information in a single gesture; e.g., wiggling fingers left to right to convey running in left to
right trajectory). Turkish speakers, on the other hand, primarily relied on the separated pack-
aging strategy (65%, describing only manner or only path information in a single gesture;
e.g., circling arms with fisted hands near body to convey running; tracing a line from left to
right with index finger to convey left to right trajectory).
The speakers in the three languages also showed some variability in their separated strate-
gies in both speech and co-speech gestures. The majority (60%) of the separated descrip-
tions in English were manner only in both speech and co-speech gestures. This pattern was
reversed in Chinese speakers, where the majority of the separated descriptions (55–60%)
were path only in both speech and co-speech gestures. Turkish speakers, on the other hand,
opted to express path and manner sequentially in a main clause + subordinate clause con-
struction in speech (65%), accompanied mostly with either manner-only gestures (74%) or
manner-path sequential gestures (14%). Even though the three languages showed some vari-
ability in the types of separated strategies they relied on, they showed a clear contrast in their
relative preference for conflated versus separated packaging of motion events: Chinese and English speakers relied mostly on conflated descriptions, whereas Turkish speakers relied mostly on
separated descriptions both in their speech and co-speech gestures. In addition, regardless of
the variability in the choice of specific separated packaging strategies, the co-speech gestures
produced by speakers of the three languages predominantly conveyed the same information
as the accompanying speech (English: 63%, Chinese: 62%, Turkish: 63%). These findings
thus provide further support for theories of gesture-speech integration (e.g., Interface theory;
Kita & Özyürek, 2003), namely that gesture and speech form a closely knit system, where
co-speech gesture is largely shaped by how semantic information is packaged for speech
production, along with the spatio-motoric properties of the motion event. Our findings thus
further extend earlier work that showed similar patterns of packaging in speech and co-speech
gesture production in the description of motion events (Duncan, 2001; Kita & Özyürek, 2003;
Özçalışkan, 2009; Özçalışkan et al., 2016a, 2018; Wessel-Tolvig & Paggio, 2016).
Importantly, the differences that we observed in co-speech gesture were not evident when speakers described each scene in silent gesture, without speech—providing the first evidence that Chinese follows the patterns previously observed for both English and Turkish (Özçalışkan et al., 2016a, 2018). This was true even for Turkish, whose speakers had to abandon the separated packaging strategy that they relied on almost exclusively in their speech and co-speech gestures. Furthermore, the use of conflated gestures by Chinese and English speakers
was more pronounced in the silent gesture condition than in the co-speech gesture condi-
tion. What might underlie this preference for conflated packaging that expresses both manner
and path together in a single clause or gesture? One possible explanation—as suggested by
Özçalışkan et al. (2016a)—could be that the preference is driven by the need to convey maxi-
mal information with minimal effort. Turkish speakers need an adjunct or a subordinate clause
to express manner in descriptions that use path verbs, thus resulting in greater syntactic com-
plexity and greater omission of manner information in speech about motion (Özçalışkan &
Slobin, 1999). Gesture, on the other hand, provides an opportunity to express both the manner
and the path information simultaneously, thus creating a relatively easy tool to express both
components of motion. This, however, raises the question as to why Turkish speakers use the
separated pattern in co-speech gesture but revert to the conflated pattern in silent gesture and
why Mandarin speakers increase their use of conflated gestures in the silent gesture condi-
tion. One likely explanation might be the modality that carries the burden of communication
in conveying information (Goldin-Meadow, 2005). In the co-speech gesture condition, it is
the speech system that carries the primary burden and gesture is secondary to this system.
Accordingly, speakers’ descriptions in co-speech gesture—as the secondary system—simply
follow what is in their speech. In the silent gesture condition, on the other hand, the main bur-
den rests on the gesture itself, which when not accompanied by speech, might become a more
structured system itself akin to a “language of pantomime.” This language of pantomime
might be driven by the need to be communicatively more informative (Grice, 1975) without
the constraints of an accompanying spoken language system, thus resulting in gestures that
convey greater semantic information (both manner and path instead of only manner or only
path).
Our results also provide further support for the thinking-for-speaking account (Slobin, 1996), which argues that language influences thought, but only during online production of speech.
In line with the weaker version of the linguistic relativity hypothesis, speakers of all three
languages showed the same cross-linguistic differences in their gestures as in their speech, but only when gesturing with speech (co-speech gesture), not when gesturing without speech (silent gesture). These findings are also in line with earlier work that tested the effects
of language in perceiving and remembering motion events in structurally different languages.
Findings from several studies (e.g., Gennari, Sloman, Malt, & Fitch, 2002; Hohenstein, 2005;
Naigles & Terrazas, 1998) also showed an effect of language on cognition when the cognitive tasks were accompanied by verbalization of the event, but no effect of
language was observed when speakers were not allowed to verbalize the event. As such, our
study joins others in suggesting a strong effect of language on the nonverbal representation of
motion events in gesture across structurally different languages of the world—but only during
online production of speech.

4.2. Ordering of semantic elements


The ordering of motion elements showed a pattern similar to packaging, with cross-linguistic differences observed in speech and co-speech gesture but not in silent gesture. Our findings replicated earlier research (Özçalışkan et al., 2016a, 2018): Turk-
ish and English speakers differed strongly in their speech with a preference for a (Figure)-
MOTION-Ground order in English and (Figure)-Ground-MOTION order in Turkish. Chinese
speakers, on the other hand, showed the predicted in-between pattern, using both (Figure)-
MOTION-Ground and (Figure)-Ground-MOTION orders equally in their speech, extending
earlier findings (Li & Thompson, 1974) to the domain of motion events. In fact, our findings
might provide some support to Li and Thompson’s argument that Chinese is shifting from an
SVO (i.e., Figure-MOTION-Ground) to an SOV (i.e., Figure-Ground-MOTION) language,
with evidence of both orders in Chinese speakers’ language production.
We also expected to see a cross-linguistic difference in co-speech gestures mirroring the
order patterns found in speech for each language. However, similar to previous research on
ordering of semantic elements in co-speech gesture (Özçalışkan et al., 2016a), we found
that speakers in all three languages, including Chinese, produced relatively few gesture strings
when speaking. Even though English speakers produced slightly more gesture strings with
(Figure)-MOTION-Ground order and Turkish speakers produced slightly more strings with
(Figure)-Ground-MOTION order—mimicking the patterns of their speech in ordering—none
of these patterns were statistically reliable largely due to the limited number of gesture strings
available in the co-speech gesture condition. Chinese speakers, who, like speakers of the other two languages, produced relatively few gesture strings, showed a preference for the (Figure)-
Ground-MOTION order—a pattern that differed from their speech, in which both orders were used at comparable rates. Importantly, however, the majority of the co-speech gesture strings across the
three languages matched the order of the accompanying speech (61% for Chinese, 68% for
Turkish, and 64% for English), highlighting the close coupling between speech and co-speech
gesture in the ordering of motion elements. At the same time, the relatively small numbers
of gesture sequences with clear orders in the co-speech gesture condition (6% of all sentence
units) render it difficult to draw broader conclusions from the observed patterns for any of the
three languages.

Importantly, however, speakers showed no cross-linguistic differences when describing the same scenes only with their hands, in silent gesture, with an overwhelming prefer-
ence for (Figure)-Ground-MOTION order, thus replicating previous work on motion events in
English and Turkish (Özçalışkan et al., 2016a, 2016b, 2018) and transitive events in Chinese
(Goldin-Meadow et al., 2008). However, in contrast to the results for the packaging prefer-
ences, it was English and Chinese speakers who abandoned the (Figure)-MOTION-Ground
order that they used in their speech and predominantly used (Figure)-Ground-MOTION order
in their silent gestures.
What might explain this preference for Figure-Ground-MOTION ordering? One possi-
ble explanation could be that Figure-Ground-MOTION order is cognitively easier (Gentner,
1982; Goldin-Meadow et al., 2008; Özçalışkan et al., 2016b). More specifically, describing
the figure and the ground before the motion might ease the processing load, and it might be
particularly effective in silent gestures as gesture does not allow for the grammatical marking
of who did what to whom (Goldin-Meadow et al., 2008; Özçalışkan et al., 2016b). Evidence
for this possibility comes from an earlier study (Langus & Nespor, 2010) that tested speakers
of two languages with different word orders (Italian vs. Turkish) in a silent gesture com-
prehension test using transitive events, involving an actor, an action, and a patient (e.g., girl
catches fish): speakers of both languages showed faster reaction times in their comprehension
of silent gestures that followed the Actor-Patient-ACTION order (akin to the Figure-Ground-
MOTION order in our study).
The relative cognitive ease of the Figure-Ground-MOTION order might also be reflected
in the ordering patterns used by home signers as well as signers of newly emerging sign
languages (Napoli & Sutton-Spence, 2014), both of which are communication systems at
a relatively less complex stage of development. For instance, a study by Goldin-Meadow
and Mylander (1998) showed that American and Chinese deaf children of hearing parents
who had not received early exposure to a conventional sign language displayed consistent
OV (Ground-MOTION) order in their gestures. Similarly, emergent sign languages (e.g.,
Nicaraguan Sign Language; Senghas, Newport, & Supalla, 1997; Bedouin Sign Language;
Sandler, Meir, Padden, & Aronoff, 2005), which emerged as new languages against the background of spoken languages with different word orders, still relied on predicate-final orders (e.g., SOV:
Figure-Ground-MOTION) in their early sign systems. The preferred predicate-final order-
ing of semantic elements in silent gesture also extends to other types of events encoded in
silent gesture across speakers of different languages (e.g., transitive events encoding Actor-
Object-ACTION; Spanish, English, Chinese, Turkish; Goldin-Meadow et al., 2008; English,
Japanese, Korean; Gibson et al., 2013).
Research also suggests that this pattern even extends to nonverbal representation of
events other than gesture. Earlier work (Goldin-Meadow et al., 2008; see also Gershkoff-
Stowe & Goldin-Meadow, 2002) has shown that adult native speakers of different languages order pictorial depictions of event components in ways similar to silent gesture, placing the picture depicting the action or motion at the end of a sequence, sug-
gesting that this type of ordering might be broadly applicable to other nonverbal repre-
sentations beyond gesture as well. These findings thus suggest that the Figure-Ground-
MOTION order might be the default way we conceptualize the order of events in the world—
a hypothesis that needs to be tested in future studies with young infants who have yet
to acquire linguistic representations of events in their native language. Gibson et al.
(2013), in fact, argue that speakers show a predicate-final ordering preference, but there
are contexts in which this default pattern might be altered. For example, when speakers
are asked to describe the relation between two semantically reversible entities (e.g., girl
kicks boy) in silent gesture, they prefer a predicate-middle order (Actor-ACTION-Patient)
due to the potential ambiguity that arises with the Actor-Patient-ACTION construction as
there are two plausible agents on the same side of the verb (Gibson et al., 2013; Meir,
Lifshitz, Ilkbasaran, & Padden, 2010). As such, and as suggested by Langus and Nespor
(2010; see also Schouwstra, 2012), predicate-final order is preferred only when the events involve simple, but not complex, scenarios. Taken together, these findings suggest that the
default order that is preferred in the nonverbal representation of events and actions can vary
by the type or complexity of the event. The scenes in our study all consisted of an animate
entity moving in relation to an inanimate entity; as such, the Figure-Ground-MOTION order
might reflect the default pattern in conveying simple motion events involving irreversible
entities.
In our study, we focused on adults speaking each of the three languages. Earlier work
on cross-linguistic differences in children showed evidence of early-emerging differences in
their speech about motion events (e.g., Gullberg et al., 2008; Hickmann, Taranne, & Bonnet, 2009; Özçalışkan, 2009; Özçalışkan & Slobin, 1999). Relatively less is known about pat-
terns in co-speech gesture in children, and the existing research remains inconclusive:
some studies show early attunement (ages 3–4; Özçalışkan, Gentner, & Goldin-Meadow,
2014; Özçalışkan & Goldin-Meadow, 2011), while others show later attune-
ment (Özyürek et al., 2008) to language-specific patterns in co-speech gesture. Research on
the developmental trajectory of silent gestures is even sparser, with only one study suggest-
ing early emerging similarities in silent gesture in both packaging and ordering of motion
elements in both S- and V-languages (Özçalışkan, Lucero, & Goldin-Meadow, 2023). As
such, future work that examines cross-linguistic similarities and differences in co-speech
and silent gestures across a broader set of languages over developmental time can provide
further insight into the origins of cross-linguistic variability (or the lack thereof) in ges-
ture when speaking and when not speaking. Moreover, in this study, we focus only on one
language as representing a language type (e.g., English for S-language), but there are stud-
ies that suggest that expression of motion in speech shows considerable variability within
a typological group in terms of their packaging strategies (i.e., intratypological variation;
e.g., Lewandowski & Mateu, 2020; Lewandowski & Özçalışkan, 2021; also see Goschler & Stefanowitsch, 2013 for a recent review) and that languages can be grouped along a continuum with different levels of path or manner salience (Ibarretxe-Antuñano, 2009; Slobin,
2004). Most of the earlier work also focused on either predicate-final or predicate-middle
languages, leaving patterns of expression in predicate-initial languages largely unexamined.
As such, future studies that include multiple exemplars of a given language type and order
can provide further insight into the effect of language structure on gestures with or without
speech.

In conclusion, our study showed that speakers of all three types of languages (S-, V-, and
E-languages) display cross-linguistic differences in their speech and co-speech gestures about
motion; but they do not rely on language-specific patterns in silent gesture and instead show
cross-linguistic similarities. These findings, thus, give further support to the “thinking-for-
speaking account” proposed by Slobin (1996), which argues that language influences thought
only during the process of online speech production (i.e., speech and co-speech gesture) but
not during offline production (i.e., silent gesture).

Acknowledgments

We thank Damla Çörekli and Vanessa Larrick for their help with data collection, and Kai
Thomas for the transcription and coding of the data.

Funding

This work was partially supported by The Scientific and Technological Research Council
of Turkey (TUBITAK) [53325897-115.02-170549; Şengül, PI]; Research in Acquiring Lan-
guage and Literacy Seed Grant (Özçalışkan, PI).

References

Allen, S., Özyürek, A., Kita, S., Brown, A., Furman, R., Ishizuka, T., & Fujii, M. (2007). Language-specific and
universal influences in children’s syntactic packaging of Manner and Path: A comparison of English, Japanese
and Turkish. Cognition, 102, 16–48.
Beavers, J., Levin, B., & Tham, S. W. (2010). The typology of motion expressions revisited. Journal of Linguistics,
46(2), 331–377.
Brown, A., & Chen, J. (2013). Construal of manner in speech and gesture in Mandarin, English, and Japanese.
Cognitive Linguistics, 24(4), 605–631.
Cardini, F. E. (2010). Evidence against Whorfian effects in motion conceptualisation. Journal of Pragmatics,
42(5), 1442–1459.
Chang, J. (2001). The syntax of event structure in Chinese [Ph.D. Dissertation]. University of Hawaii.
Chao, Y. (1968). A grammar of spoken Chinese. University of California Press.
Chen, L., & Guo, J. (2009). Motion events in Chinese novels: Evidence for an equipollently-framed language.
Journal of Pragmatics, 41(9), 1749–1766.
Chen, L., & Guo, J. (2010). From language structures to language use: A case from Mandarin motion expression
classification. Chinese Language and Discourse, 1(1), 31–65.
Choi, S., & Bowerman, M. (1991). Learning to express motion events in English and Korean: The influence of
language-specific lexicalization patterns. Cognition, 41(1–3), 83–121.
Chui, K. (2012). Crosslinguistic comparison of representations of motion in language and gesture. Gesture, 12(1),
40–61.
Croft, W., Barðdal, J., Hollmann, W., Sotirova, V., & Taoka, C. (2010). Revising Talmy’s typological classification
of complex event constructions. Contrastive Studies in Construction Grammar, 10, 201–236.
de Ruiter, J. P. (2000). The production of gesture and speech. In D. McNeill (Ed.), Language and gesture (pp.
284–311). Cambridge: Cambridge University Press.
Dryer, M. S. (2013). On the six-way word order typology, again. Studies in Language, 37(2), 267–301.
Duncan, S. (2001). Co-expressivity of speech and gesture: Manner of motion in Spanish, English, and Chinese. In
Annual Meeting of the Berkeley Linguistics Society (pp. 353–370).
Emerson, S. N., Conway, C. M., & Özçalışkan, Ş. (2020). Semantic P600—but not N400—effects index crosslin-
guistic variability in speakers’ expectancies for expression of motion. Neuropsychologia, 149(107638), 1–21.
Emerson, S. N., Limia, V. D., & Özçalışkan, S. (2021). Turkish-English bilinguals’ description of motion events.
Lingua, 264, 1–19.
Gennari, S. P., Sloman, S. A., Malt, B. C., & Fitch, W. T. (2002). Motion events in language and cognition.
Cognition, 83(1), 49–79.
Gentner, D. (1982). Why nouns are learned before verbs: Linguistic relativity versus natural partitioning. In S.
A. Kuczaj (Ed.), Language development: Vol. 2. Language, thought and culture (pp. 301–334). Hillsdale, NJ:
Erlbaum.
Gershkoff-Stowe, L., & Goldin-Meadow, S. (2002). Is there a natural order for expressing semantic relations?
Cognitive Psychology, 45(3), 375–412.
Gibson, E., Piantadosi, S. T., Brink, K., Bergen, L., Lim, E., & Saxe, R. (2013). A noisy channel account of
crosslinguistic word order variation. Psychological Science, 24(7), 1079–1088.
Givón, T. (2009). The genesis of syntactic complexity. Amsterdam: Benjamins.
Goldin-Meadow, S. (2005). The two faces of gesture: Language and thought. Gesture, 5(1–2), 241–257.
Goldin-Meadow, S., & Mylander, C. (1998). Spontaneous sign systems created by deaf children in two cultures.
Nature, 391(6664), 279–281.
Goldin-Meadow, S., So, W. C., Özyürek, A., & Mylander, C. (2008). The natural order of events: How speakers of
different languages represent events nonverbally. Proceedings of the National Academy of Sciences, 105(27),
9163–9168.
Goschler, J., & Stefanowitsch, A. (2013). Variation and change in the encoding of motion events (Vol. 41). John
Benjamins Publishing.
Greenberg, J. H. (1963). Universals of language. MIT Press.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. L. Morgan (Eds.), Syntax and Semantics: Speech acts
(pp. 41–58). Brill.
Gullberg, M., Hendriks, H., & Hickmann, M. (2008). Learning to talk and gesture about motion in French. First
Language, 28(2), 200–236.
Hickmann, M., Taranne, P., & Bonnet, P. (2009). Motion in first language acquisition: Manner and Path in French
and English child language. Journal of Child Language, 36(4), 705–741.
Hohenstein, J. M. (2005). Language-related motion event similarities in English- and Spanish-speaking children.
Journal of Cognition and Development, 6(3), 403–425.
Hsueh, F.-S. (1989). The structure meaning of BA and BEI constructions in Mandarin Chinese. In J. Tai & F.-H.
Hsueh (Eds.), Functionalism and Chinese grammar (pp. 95–125). Chinese Language Teachers Association.
Ibarretxe-Antuñano, I. (2004). Language typologies in our language use: The case of Basque motion events in
adult oral narratives. Cognitive Linguistics, 15(3), 317–349.
Ibarretxe-Antuñano, I. (2009). Path salience in motion events. In J. Guo, E. Lieven, N. Budwig, S. Ervin-Tripp, K.
Nakamura & Ş. Özçalışkan (Eds.), Crosslinguistic approaches to the psychology of language: Research in the
tradition of Dan Isaac Slobin (pp. 403–414). New York: Psychology Press.
Jin, L. (2008). Markedness and second language acquisition of word order in Mandarin Chinese. In Proceedings
of the 20th North American Conference on Chinese Linguistics (pp. 297–308). East Asian Studies Center.
Kita, S., & Özyürek, A. (2003). What does cross-linguistic variation in semantic coordination of speech and
gesture reveal?: Evidence for an interface representation of spatial thinking and speaking. Journal of Memory
and Language, 48(1), 16–32.
Langus, A., & Nespor, M. (2010). Cognitive systems struggling for word order. Cognitive Psychology, 60(4),
291–318.
Levshina, N. (2019). Token-based typology and word order entropy: A study based on universal dependencies.
Linguistic Typology, 23(3), 533–572.
Lewandowski, W., & Özçalışkan, Ş. (2021). The specificity of event expression in first language influences expres-
sion of object placement events in second language. Studies in Second Language Acquisition, 43(4), 838–869.
Lewandowski, W., & Mateu, J. (2020). Motion events again: Delimiting constructional patterns. Lingua, 247,
1–25.
Lewandowski, W., & Özçalışkan, Ş. (2018). How event perspective influences speech and co-speech gestures
about motion. Journal of Pragmatics, 128, 22–29.
Li, C. N., & Thompson, S. (1975). The semantic function of word order: A case study in Mandarin. In C. N. Li
(Ed.), Word order and word order change (pp. 163–195). Austin, TX: University of Texas Press.
Li, C. N., & Thompson, S. A. (1974). An explanation of word order change SVO→ SOV. Foundations of Lan-
guage, 12(2), 201–214.
Li, C. N., & Thompson, S. A. (1989). Mandarin Chinese: A functional reference grammar (Vol. 3). University of
California Press.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago, IL: University of Chicago
Press.
McNeill, D. (2000). Analogic/analytic representations and crosslinguistic differences in thinking for speaking.
Cognitive Linguistics, 11(1–2), 43–60.
Meir, I., Lifshitz, A., Ilkbasaran, D., & Padden, C. (2010). The interaction of animacy and word order in human
languages: A study of strategies in a novel communication task. In A. D. M. Smith, M. Schouwstra, B. de Boer,
& K. Smith (Eds.), Proceedings of the 8th Evolution of Language Conference (pp. 455–456). Singapore: World
Scientific Publishing Co.
Naigles, L. R., & Terrazas, P. (1998). Motion-verb generalizations in English and Spanish: Influences of language
and syntax. Psychological Science, 9(5), 363–369.
Napoli, D. J., & Sutton-Spence, R. (2014). Order of the major constituents in sign languages: Implications for all
language. Frontiers in Psychology, 5, 376.
Özçalışkan, Ş. (2004). Typological variation in encoding the manner, path, and ground components of a metaphor-
ical motion event. Annual Review of Cognitive Linguistics, 2(1), 73–102.
Özçalışkan, Ş. (2009). Learning to talk about spatial motion in language-specific ways. In J. Guo, E. Lieven, S.
Ervin-Tripp, N. Budwig, K. Nakamura, & Ş. Özçalışkan (Eds.), Cross-linguistic approaches to the psychology
of language: Research in the tradition of Dan Isaac Slobin (pp. 263–276). Psychology Press.
Özçalışkan, Ş. (2015). Ways of crossing a spatial boundary in typologically distinct languages. Applied Psycholin-
guistics, 36(2), 485–508.
Özçalışkan, Ş. (2016). Do gestures follow speech in bilinguals’ description of motion? Bilingualism: Language
and Cognition, 19(3), 644–653.
Özçalışkan, Ş., & Emerson, S. (2016). Learning to talk, think and gesture about motion in language-specific
ways: Insights from Turkish. In N. Ketrez & B. Haznedar (Eds.), Trends in language acquisition research: The
acquisition of Turkish in childhood (pp. 177–191). John Benjamins.
Özçalışkan, Ş., & Slobin, D. I. (1999). Learning ‘how to search for the frog’: Expression of manner of motion in
English, Spanish and Turkish. In A. Greenhill, H. Littlefield, & C. Tano (Eds.), Proceedings of the 23rd Boston
University Conference on Language Development (pp. 541–552). Cascadilla Press.
Özçalışkan, Ş., & Slobin, D. I. (2000). Climb up vs. ascend climbing: Lexicalization choices in expressing motion
events with manner and path components. In Proceedings of the 24th Annual Boston University Conference on
Language Development (Vol. 2, pp. 558–570). Somerville, MA: Cascadilla Press.
Özçalışkan, Ş., & Slobin, D. I. (2003). Codability effects on the expression of manner of motion in Turkish and
English. In Studies in Turkish linguistics. Istanbul: Boğaziçi.
Özçalışkan, Ş., Gentner, D., & Goldin-Meadow, S. (2014). Do iconic gestures pave the way for children’s early
verbs? Applied Psycholinguistics, 35, 1143–1162.
Özçalışkan, Ş., Lucero, C., & Goldin-Meadow, S. (2016a). Does language shape silent gesture? Cognition, 148,
10–18.
Özçalışkan, Ş., Lucero, C., & Goldin-Meadow, S. (2016b). Is seeing gesture necessary to gesture like a native
speaker? Psychological Science, 27(5), 737–747.
Özçalışkan, Ş., Lucero, C., & Goldin-Meadow, S. (2018). Blind speakers show language-specific patterns in co-
speech gesture but not silent gesture. Cognitive Science, 42(3), 1001–1014.
Özçalışkan, Ş., & Goldin-Meadow, S. (2011). Is there an iconic gesture spurt at 26 months? In G. Stam & M. Ishino (Eds.), Integrating gestures: The interdisciplinary nature of gesture (pp. 163–174). Amsterdam, NL: John Benjamins.
Özçalışkan, Ş., Lucero, C., & Goldin-Meadow, S. (2023). What the development of gesture with and without speech can tell us about the effect of language on thought. Language & Cognition. [Accepted]
Özyürek, A., Kita, S., Allen, S., Brown, A., Furman, R., & Ishizuka, T. (2008). Development of cross-linguistic
variation in speech and gesture: Motion events in English and Turkish. Developmental Psychology, 44(4),
1040–1054.
Paul, J., Emerson, S. N., & Özçalışkan, Ş. (2022). Does dialect matter in the expression of motion? Ways of
encoding manner and path in Babao and standard Mandarin. Lingua, 270, 1–17.
Qu, Y. (1994). Object noun phrase dislocation in Mandarin Chinese [Doctoral dissertation] University of British
Columbia.
Sandler, W., Meir, I., Padden, C., & Aronoff, M. (2005). The emergence of grammar: Systematic structure in a
new language. Proceedings of the National Academy of Sciences, 102(7), 2661–2665.
Sapir, E. (1961). Culture, language and personality. Selected essays. Berkeley and Los Angeles: University of
California Press.
Schouwstra, M. (2012). Semantic structures, communicative principles and the emergence of language. Nether-
lands Graduate School of Linguistics.
Senghas, A., Newport, E. L., & Supalla, T. (1997). Argument structure in Nicaraguan Sign Language: The emer-
gence of grammatical devices. In Proceedings of the 21st Annual Boston University Conference on Language
Development.
Shyu, S. I. (2001). Remarks on object movement in Mandarin SOV order. Language and Linguistics, 2(1), 93–124.
Slobin, D. I. (2004). The many ways to search for a frog: Linguistic typology and the expression of motion events.
In S. Strömqvist & L. Verhoeven (Eds.), Relating events in narrative: Typological and contextual perspectives
(pp. 219–257). Lawrence Erlbaum Associates.
Slobin, D. I. (1996). From ‘thought’ and ‘language’ to ‘thinking for speaking’. In J. J. Gumperz & S. C. Levinson
(Eds.), Rethinking linguistic relativity (pp. 70–96). Cambridge University Press.
Sun, C. F., & Givón, T. (1985). On the so-called SOV word order in Mandarin Chinese: A quantified text study
and its implications. Language, 61(2), 329–351.
Tai, J. (2003). Cognitive relativism: Resultative construction in Chinese. Language and Linguistics, 4(2), 301–316.
Talmy, L. (1985). Lexicalization patterns: Semantic structure in lexical forms. In T. Shopen (Ed.), Language
typology and semantic description. Vol. III: Grammatical categories and the lexicon (pp. 36–149). Cambridge
University Press.
Talmy, L. (2000). Toward a cognitive semantics. MIT Press.
Wessel-Tolvig, B., & Paggio, P. (2016). Revisiting the thinking-for-speaking hypothesis: Speech and gesture rep-
resentation of motion in Danish and Italian. Journal of Pragmatics, 99, 39–61.
Whorf, B. L. (1956). Language, thought and reality. Selected writings. New York: MIT Press.
Wieselman Schulman, B. (2004). A crosslinguistic investigation of the speech–gesture relationship in motion event
descriptions [Ph.D. Dissertation] University of Chicago.
Xu, Z. (2013). An empirical study of motion expressions in Mandarin Chinese. English Language and Literature
Studies, 3(4), 53–67.
