Phillips Borodistky 2003 Replication Spring 2022
Grammatical Gender and Thought: A Partial Replication of Phillips & Boroditsky (2003)
Sara Finley, Saige D. Ballard, Tina Cao, Dacia Chorman, Marin Deifel, Clay Farrer, Christney
V. Kpodo, Tahra L. Menon, Selena Sandoval, Elena J. Schmidt, Kattia Teas, Emily A. Turner,
Author Note
This paper is a (slightly) revised version of a class project for PSYC 481 (Research Seminar:
Language and Thought) Spring 2022 at Pacific Lutheran University. Dr. Sara Finley served as the
instructor of record, and led the replication. The student authors are listed in alphabetical order,
and their contributions to the project varied, but they did the vast majority of the work for this
project. Dr. Finley accepts responsibility for any and all errors on this final (public) version of the
manuscript.
Acknowledgements:
We are grateful to Ting Qiang, Jon Grahe, and the attendees of the Spring 2022 Psychology
Research Conference.
Conflicts of Interest
The authors declare no conflicts of interest. This manuscript has not been previously
published and is not under consideration in the same or substantially similar form in any other
peer-reviewed media.
This study was conducted in compliance with HPRB ethical standards at Pacific Lutheran
University. Consent was received from participants prior to participating in the study.
GENDER EFFECTS ON LANGUAGE AND THOUGHT 3
Abstract
Phillips and Boroditsky (2003; Experiment 3) trained adult participants on a novel “alien”
language to test for effects of grammatical gender on perceptions of object similarity. The
present study aimed to replicate their methods and provide further support for their results. Adult
participants learned the grammatical gender categorizations for 20 animate and inanimate objects
and were tested to ensure they had mastered the categories. They were then given pairs of objects
that were either both from the same gender category or included one from each gender category
and were asked to rate how similar the items were. When analyzed by item, participants rated
objects from the same grammatical gender category as more similar than those from different
categories, suggesting that language influenced participants’ perception of the objects. However,
this same effect was not present when analyzed by participant, which could be due to the number
of items greatly exceeding the number of participants. Though we partially replicated the results
from the original study with a similar sample size, future studies could replicate this experiment
Gender Effects on Language and Thought: Replication of Phillips and Boroditsky (2003)
There are over 7,000 different languages across the globe (ethnologue.com, 2022), each
with its own grammatical system, speakers, and culture. Because language allows us to express
our thoughts, a classic question in the cognitive science of language is whether the language one
speaks determines one’s thinking. Sapir (1929) and Whorf (1940) proposed that language is the
primary indicator of thought categorization and overall experiences and perception. According to
Sapir (1929), a necessary aspect of linguistic relativity holds that language should be understood
as a product of social and cultural influences. Additionally, he argued that language is a system
of symbols representing thoughts and emotions, giving the example of the United States flag,
Linguistic determinism has been a topic of much debate in psychology and linguistics.
One perspective on the debate called for shifting the focus from ‘whether language and thought are
intertwined or not’ to ‘in what ways they are intertwined’ (Thierry, 2016). Another classified the
research on linguistic determinism into three separate categories and concluded that more research
should address how language interprets experience and what that says about thought, rather than
the elusive and oversimplified question of whether language shapes thought (Lucy, 1997). These
different approaches speak to the range of interpretations and modifications that make up the
broad debate on linguistic determinism.
Since this hypothesis was first proposed, many researchers have studied the underlying
significance of language differences, using various lenses and applications including numbers,
direction, color, and gender (Kay & Kempton, 1984; Samuel et al., 2019). Others have also
looked at the language-thought relationship in general, such as ‘thinking for speaking’, which
focuses on the language that we use in our mind when we are processing incoming stimuli
(Slobin, 1996). Because many of the empirical findings on linguistic relativity have been mixed or
science; if an effect cannot be replicated or reproduced, there may be questions about the
reliability and validity of the findings. The recent ‘replication crisis’ in psychology suggests that
many published findings may not replicate to the same extent as their original findings, further
supporting the need to replicate findings (Nosek et al., 2022). The present study replicates
Experiment 3 of Phillips and Boroditsky’s (2003) paper on the role of grammatical gender and
Some languages, such as Spanish or German, make use of grammatical gender to classify
nouns. It is labeled “gendered” because there are often two or three categories that are
labeled feminine, masculine, and neuter. Some languages have four or more grammatical
genders, while others, like English, only have gender in the pronoun system (Corbett, 2012).
Many experiments have sought to determine whether the presence of grammatical gender in a
language has an influence on speakers’ thoughts and perceptions (Konishi, 1993; Phillips and
Boroditsky, 2003; Samuel et al., 2019; Sedlmeier et al., 2016; among others). For example,
Konishi (1993) considered whether the presence of grammatical gender might have an influence
on the potency of words among German and Spanish speakers. Sedlmeier et al. (2016) found an
effect of grammatical gender on perception of objects, but they noted that this effect seems to be
dependent upon a linguistic context. In most studies of linguistic relativity, researchers test
effects of native language. However, this creates the potential confound of culture and language,
since cultural effects may interact with linguistic effects. One way to control for cultural effects is
to train naïve participants on a novel language with the grammatical property of interest. If
language-thought effects are robust and independent of culture, they will show up even in a
newly acquired language. This procedure also allows researchers to observe the relationship
between language and thought by objectively defining gendered language and removing
Phillips and Boroditsky (2003) told participants they would be learning a new alien
language, Gumbuzi, in which there are two ways to categorize words: either “sou” or “oos”
(these were the new gender categories). Participants viewed images of items and were told which
category the words belonged to and then were tested on their knowledge of the categories. Once
they had mastered the list, they were given random pairings of all of the items and asked to rate
how similar the items were. In these ratings, some of the trials had pairs with consistent gender
(both “oos” or “sou”) and some had inconsistent pairs (one “oos” and one “sou”). Their results
revealed that participants rated items as more similar when they were from the same gender
category than when they were from different categories, which supported the effects of
grammatical gender on a cognitive task, therefore supporting linguistic relativity (Phillips &
Boroditsky, 2003). In this paper, we provide a partial replication of one of the artificial language
learning experiments in Phillips and Boroditsky (2003; Experiment 3). By replicating this
experiment, we hope to provide further support for the influence of grammatical gender on
perception.
Current Study
The current study aims to replicate the findings from Phillips and Boroditsky (2003) to
understand the influence that grammatical gender has on perceptions, as suggested by
Hypothesis
Consistent with the results from Phillips & Boroditsky (2003), we hypothesized that
participants would rate pairs of items as more similar when they both came from the same gender
category (both “oos” or “sou”) as opposed to pairs in which there was one item from each
category (one “oos” and one “sou”). This would suggest that their familiarity with the new
Methods
Participants
A total of 31 participants were recruited for the online study through social media and
direct email, as well as students enrolled in psychology courses at Pacific Lutheran University
descriptions were provided to all participants. There was no contact between the participants and
researchers during the study so they were all treated equally and in accordance with APA ethical
guidelines. For taking part in this study, those from the PLU student population received course
credit for their participation as applicable to their class requirements; participants from the
general public received no compensation for their participation. Three participants were excluded
via a post-completion opt-out question. An additional four participants were excluded from analysis
because they failed to meet the training criteria for learning the novel language (as described
On average, participants were 20.29 years old (SD = 4.66); 61% identified as female, 35% as
male, and 4% responded ‘other/prefer not to say.’ Of the participants, 64% identified as white, 21%
biracial, 7% Hispanic, 4% African-American, and 4% Asian. All participants except for 2 native
English, 36% of participants spoke Spanish, 10% French, 10% American Sign Language, and
8% other languages such as Swahili and various Asian languages. Twelve percent of participants had no
prior experience in learning a second language while 22% considered themselves to be either
Materials
Because we did not have access to the original images used in Phillips and Boroditsky
(2003), we created a new stimulus set depicting the same objects as the original study. Twenty
original digital images were created using Adobe Illustrator (Version 26.2.1; Adobe
Inc., 2022). Examples of these images and a list of all items are included in Appendix A. Eight
of the images were of people (four men and four women) and the other 12 were of inanimate
objects. Inanimate objects were paired based on similar categories, such as apple and pear (which
Members of each pair of inanimate objects were labeled either “oosative” or “soupative”,
as they were in the Phillips and Boroditsky study. For half of the participants, female images had
the “oosative” distinction and male images had the “soupative” distinction; the distinctions were
reversed for the other half of participants. Demographic data were collected at the end to identify
race, gender, age, and participants’ native language, as well as whether they spoke any other languages.
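The counterbalancing of labels described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the FindingFive configuration used in the study: the item names, the `assign_labels` function, and the group coding (0 or 1) are hypothetical. The sketch also assumes that the two members of each inanimate pair receive one label each and keep those labels in both groups, which the text does not specify.

```python
# Illustrative sketch only; all names below are hypothetical placeholders.
WOMEN = [f"woman_{i}" for i in range(1, 5)]  # four images of women
MEN = [f"man_{i}" for i in range(1, 5)]      # four images of men
# Six pairs of related inanimate objects (12 objects), e.g. apple and pear.
OBJECT_PAIRS = [(f"object_{i}a", f"object_{i}b") for i in range(1, 7)]


def assign_labels(group):
    """Return a dict mapping each of the 20 items to its label.

    group 0: women are "oosative" and men are "soupative".
    group 1: the person labels are reversed.
    """
    if group == 0:
        female_label, male_label = "oosative", "soupative"
    else:
        female_label, male_label = "soupative", "oosative"
    labels = {name: female_label for name in WOMEN}
    labels.update({name: male_label for name in MEN})
    # Assumption: one member of each inanimate pair gets each label,
    # and these object labels do not change between groups.
    for first, second in OBJECT_PAIRS:
        labels[first] = "oosative"
        labels[second] = "soupative"
    return labels
```

Under these assumptions, each group sees ten items per category, and only the mapping between label and depicted gender differs between the two halves of the sample.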
Procedure
While Phillips and Boroditsky’s (2003) study was a traditional in-lab study, the present study
was run online, due to COVID-19 restrictions, using the data collection platform
FindingFive (findingfive.com). Participants were given a description of the study and asked to
remember the pairings, as they were going to be tested on them. Participants were then shown
each of the 20 images along with its grammatical distinction (e.g., sou, ballerina). Each image
and distinction pair was shown individually, centered on the screen, in a random
order. The images with their grammatical distinctions were shown three times.
After the third time, participants were shown each image without its grammatical
distinction and asked to assign its correct distinction. This was a forced choice test in which
participants had to select either “oosative” or “soupative” for each object. If the participant
answered all 20 items correctly, they were able to move on to the next phase. If the participant
answered one or more items incorrectly, they had to repeat the test. Due to programming
constraints in FindingFive, it was not possible to have participants take the test more than twice,
so data were screened upon completion; participants scoring less than 90% on the second test
were excluded from analysis. This is different from Phillips and Boroditsky (2003), where
participants could retake the test an unlimited number of times until they received a perfect
score.
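The screening rule described above can be written as a short check. The following is a minimal sketch in Python, not the authors' screening script: the function name, the constants, and the input format (number of items correct on each attempt) are hypothetical, and the treatment of a participant with no recorded second attempt is an assumption.

```python
N_ITEMS = 20          # items in the category test
SECOND_TEST_MIN = 90  # minimum percent correct required on the second attempt


def meets_training_criterion(correct_per_attempt):
    """Return True if a participant's data would be kept for analysis.

    correct_per_attempt: number of items answered correctly on each
    attempt, in order (at most two attempts were possible).
    """
    first = correct_per_attempt[0]
    if first == N_ITEMS:
        # A perfect first test moves the participant straight on.
        return True
    if len(correct_per_attempt) < 2:
        # Assumption: no recorded second attempt means the criterion is not met.
        return False
    second = correct_per_attempt[1]
    # Integer comparison avoids floating-point rounding at exactly 90%.
    return second * 100 >= SECOND_TEST_MIN * N_ITEMS
```

With 20 items, the 90% threshold corresponds to at least 18 correct on the second attempt.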
Participants were then presented with all 96 possible pairings of a person with an inanimate
object (8 people × 12 objects). In half of the pairings, the two items shared the gender category
assigned in training (e.g., both “oos” or both “sou”). In the other half of the pairings, the two
items came from different categories (e.g., one object was “oos” and the other was “sou”). Participants
were asked to rate the similarity of the pairs from 1 (not similar) to 9 (very similar) using a
sliding scale. The order of pairs was randomized for each participant. Pairs would stay on screen
until participants made their selection. After completing the similarity ratings, participants
completed the demographic information and then received a debriefing page. Participants were
also asked whether they wanted their data to be included in the analysis (as a way to allow
Results
Phillips and Boroditsky (2003) used an ANOVA to analyze their data. However, the
specific variables used in their ANOVA were not clearly specified, as it appeared that they
simply compared inconsistent to consistent items, with the conditions (Oos-Fem and Oos-Masc)
collapsed. We therefore used two paired-samples t-tests, one by participant (averaging across
items) and one by item (averaging across participants). When analyzed by item, there was a significant
difference between the consistent pairs (M = 3.56, SD = 0.69) and the inconsistent pairs (M =
3.20, SD = 0.68), t(95) = 4.9, p < .001 (see Table 1). However, the by-participant analysis was
Table 1
Consistent Inconsistent
M SD M SD t P
While the results for this study trended in the same direction as Phillips and Boroditsky (2003),
only the by-items t-test was statistically significant, thus showing a partial replication of the
original study. While more contemporary statistical techniques would use a mixed-effects regression as
opposed to an ANOVA or t-test (thus allowing for random intercepts for subjects and items, and
training condition), we opted for tests that more closely matched the tests used in the original
study (see Footnote 1).
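The by-item comparison reported above can be illustrated with a small Python sketch that uses only the standard library. It is not the authors' analysis script: the function names, the trial format, and the ratings are made up for illustration, and only four items are used (the study had 96 items, which is why the reported test has 95 degrees of freedom).

```python
import math
import statistics
from collections import defaultdict


def item_means(trials):
    """Average ratings per item and condition.

    trials: iterable of (item, condition, rating), where condition is
    "consistent" or "inconsistent". Returns {item: (mean_consistent,
    mean_inconsistent)} for items rated in both conditions.
    """
    ratings = defaultdict(lambda: {"consistent": [], "inconsistent": []})
    for item, condition, rating in trials:
        ratings[item][condition].append(rating)
    return {
        item: (statistics.mean(r["consistent"]), statistics.mean(r["inconsistent"]))
        for item, r in ratings.items()
        if r["consistent"] and r["inconsistent"]
    }


def paired_t(pairs):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in pairs]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Made-up ratings: two raters per condition for each of four items.
toy_trials = [
    ("A", "consistent", 5), ("A", "consistent", 3),
    ("A", "inconsistent", 4), ("A", "inconsistent", 3),
    ("B", "consistent", 4), ("B", "consistent", 3),
    ("B", "inconsistent", 3), ("B", "inconsistent", 3),
    ("C", "consistent", 3), ("C", "consistent", 3),
    ("C", "inconsistent", 2), ("C", "inconsistent", 4),
    ("D", "consistent", 5), ("D", "consistent", 4),
    ("D", "inconsistent", 4), ("D", "inconsistent", 3),
]

t_value, df = paired_t(item_means(toy_trials).values())
```

Each item contributes one consistent mean and one inconsistent mean, so the test is paired by item. A by-participant version would average over items within each participant instead.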
Discussion
order to test whether learning a new language with grammatical gender influences perception of
similarity based on grammatical gender. Participants viewed 20 different images, each assigned
either an “oosative” or “soupative” label. The labels were associated with one of two grammatical
gender categories: feminine or masculine. Once participants fully learned the “oos / sou”
categories, they were given pairs of items and asked to give them a similarity rating. If
participants rate items that share the same grammatical gender in the newly learned language as
more similar, it suggests that learners make use of grammatical gender in determining similarity.
Our results partially replicated the Phillips and Boroditsky (2003) original. Participants tended to
rate items as more similar when their genders matched in the Gumbuzi language they were trained
on, but only when comparing across items. This could be due to the fact that there were many more
items compared to participants, meaning the statistical power of the items test would be greater
1 A linear mixed-effects model was run using the lmer function in the lme4 package (Bates et al., 2015) in R
(R Development Core Team, 2018) using RStudio (RStudio Team, 2020):
lmer(response_value ~ 1 + Consistent + (1 + Consistent | stimuli_presented) + (1 | participant_id) + (1 | group_id),
data = Gumbozi_results, REML = FALSE). This model showed a significant difference between Inconsistent and
Consistent items, b = 0.39, SE = 0.068, t = 5.57, p < 0.001.
Limitations
While the current study provides a replication of Phillips and Boroditsky (2003), there are
some limitations to address. First, it is important to note that the present study was not an exact
replication of Phillips and Boroditsky (2003). Our study was administered online, while the
original study was conducted in a traditional laboratory setting. Web-based testing leaves
room for potential limitations such as noisy conditions and less control over the experiment (for
example, we could not allow participants to repeat the test until they achieved a perfect score). In
addition, the items (artwork) were created by one of the authors, and differed from the original
study (though the semantic content of the items was the same). It is possible that differences in
the images could have led to differences in the similarity ratings. However, the fact that our data
generally replicated the original findings under these less controlled conditions, which
contrast greatly with those in the study we replicated (Phillips & Boroditsky,
An additional limitation of both the present study and Phillips and Boroditsky (2003) was
that the participant pool largely consisted of native English-speaking college students. While this
was the participant pool that was readily available to us (and the one used in the original study),
it was limited in diversity and may limit the potential generalizability of the findings. While our
sample size was similar to Phillips and Boroditsky’s (2003) 22 participants, the overall sample size is
relatively small, particularly for making strong, generalizable claims related to language and
thought. Another limitation of both the present study and Phillips and Boroditsky (2003) is the
statistical analysis. Phillips and Boroditsky (2003) provided minimal details about their ANOVA,
and we were therefore unable to use the same statistical tests as the original study. In addition,
more contemporary methods for statistical analysis would make use of mixed-effects regression.
A final limitation of the present study is that we replicated experiment four excluding the
verbal shadowing task to control for language interference. In their original study, Phillips and
Boroditsky (2003) included a word-shadowing task, where participants read out random letters
ratings to prevent participants from subvocally naming the objects. We decided against using the
word-shadowing task because it made little difference in the original study and was therefore
unlikely to have a substantial impact, and because of the limitations of a web-based format for
research. However, future research could include shadowing to give more
The findings of the original experiment indicate that language affects thought because
objects were rated as more similar if they had the same gendered label. Our replication also
supports the influence of grammatical gender, because pairs with a consistent gendered label
received higher similarity ratings than pairs with inconsistent labels. The replication results reinforce the
Importance of Replication
Study replication is important because it provides more evidence for the particular field
and further tests previous findings to see how reliable they are. Our replication success suggests
the viability of the method by ensuring that the format works to test for grammatical gender and
can be useful for other research in language and thought. Replication studies are also
important to carry out because so few are being administered and completed. This
“replication crisis” makes it difficult to know which results are robust, valid and ‘trustworthy’
It is important to note that this replication was an undergraduate class assignment, which
means that the data were collected in a constrained and limited timeframe, and the students were
not experts in linguistic relativity at the start of the study. However, there are numerous benefits
to using replication as a class project for undergraduates (Frank & Saxe, 2012; Grahe et al. 2012,
2022; Wagge et al., 2019). Replicating an original study served to support student learning in
terms of the research design and methods for studying language and thought, as well as the
conceptual knowledge required to understand research questions related to language and thought.
Regardless of our significant findings relating to grammatical gender, perceived limitations, and
what the results say about the topic of language and thought as a whole, this replication study has
been a process that included much expansion of technological knowledge and new learning
opportunities. As a psychology class working on this project together, collaboration was essential
in order to achieve success. While each class member contributed individually in a range of
ways, we came together to ensure consistency throughout the project, as well as support for each
other and the new software or skills we were learning. This replication project gave each of us
the opportunity to grow in our skills pertaining to psychology, collaboration, and overall
hands-on experience.
Conclusion
learned a new grammatical gender set and rated item pairs on similarity. We found their
similarity ratings tended to coincide with the grammatical gender matchups they had previously
learned. Specifically, pairs were rated as less similar if they came from different grammatical
gender categories. This provides additional evidence that grammatical gender can influence
non-linguistic category judgments, even for a newly acquired language. Our successful replication
also emphasizes the importance of replicating previous research to see if the methods and results
References
https://adobe.com/products/illustrator.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models
https://doi.org/10.18637/jss.v067.i01
Earp, B. D., & Trafimow, D. (2015). Replication, falsification, and the crisis of confidence in
FindingFive Team (2019). FindingFive: A web platform for creating, running, and managing your
https://www.findingfive.com
Grahe, J. E., Cuccolo, K., Leighton, D. C., & Cramblet Alvarez, L. D. (2020). Open science
promotes diverse, just, and sustainable research and educational outcomes. Psychology
Grahe, J. E., Reifman, A., Hermann, A. D., Walker, M., Oleson, K. C., Nario-Redmond, M., &
Kay, P. & Kempton, W. (1984). What is the Sapir-Whorf hypothesis? American Anthropologist,
Kepes, S., & McDaniel, M. A. (2013). How trustworthy is the scientific literature in industrial
doi: 10.1111/iops.12045
Lucy, J. (1997). Linguistic relativity. Annual Review of Anthropology, 26: 291-312. doi:
10.1146/annurev.anthro.26.1.291
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., ... &
Phillips, W. & Boroditsky, L. (2003). Can quirks of grammar affect the way you think?
Grammatical gender and object concepts. Proceedings of the Annual Meetings of the
R Development Core Team (2018). R: A language and environment for statistical computing.
http://www.rstudio.com/
Sapir, E. (1929). The status of linguistics as a science. Language, 5(4): 207-214. doi:
10.2307/409588.
Samuel, S., Cole, G., & Eacott, M. (2019). Grammatical gender and linguistic relativity: A
10.3758/s13423-019-01652-3.
Sedlmeier, P., Tipandjan, A., & Janchen, A. (2016). How persistent are grammatical gender
effects? The case of German and Tamil. Journal of Psycholinguistic Research, 45, 317-
Slobin, D.I. (1996). From “thought and language” to “thinking for speaking.” In J. J. Gumperz
University Press. (Reprinted in modified form from "Pragmatics," 1, 1991, pp. 7–26)
Sona Systems (n.d.). Sona Systems: Cloud-based Participant Management Software [Computer
Thierry, G. (2016). Neurolinguistic relativity: How language flexes human perception and
Wagge, J. R., Brandt, M. J., Lazarevic, L. B., Legate, N., Christopherson, C., Wiggins, B., &
work: The collaborative replications and education project. Frontiers in Psychology, 10,
247.
Whorf, B. L. (1944). The relation of habitual thought and behavior to language. ETC: A Review