
Running head: GENDER EFFECTS ON LANGUAGE AND THOUGHT

Grammatical Gender and Thought: A Partial Replication of Phillips & Boroditsky (2003)

Sara Finley, Saige D. Ballard, Tina Cao, Dacia Chorman, Marin Deifel, Clay Farrer, Christney

V. Kpodo, Tahra L. Menon, Selena Sandoval, Elena J. Schmidt, Kattia Teas, Emily A. Turner,

Hannah R. VanHeyningen, Allisa Washington

Address correspondence to:


Sara Finley
Department of Psychology
Pacific Lutheran University
12180 Park Ave S
Tacoma, WA 98447
finleysr@plu.edu

Author Note
This paper is a (slightly) revised version of a class project for PSYC 481 (Research Seminar:

Language and Thought) Spring 2022 at Pacific Lutheran University. Dr. Sara Finley served as the

instructor of record and led the replication. The student authors are listed in alphabetical order,

and their contributions to the project varied, but they did the vast majority of the work for this

project. Dr. Finley accepts responsibility for any and all errors on this final (public) version of the

manuscript.

Acknowledgements:

We are grateful to Ting Qiang, Jon Grahe, and the attendees of the Spring 2022 Psychology

Research Conference.

Conflicts of Interest

The authors declare no conflicts of interest. This manuscript has not been previously

published and is not under consideration in the same or substantially similar form in any other

peer-reviewed media.

Data Availability Statement

The data are publicly available at https://osf.io/b3zw8/.

Compliance with Ethical Standards

This study was conducted in compliance with HPRB ethical standards at Pacific Lutheran

University. Consent was received from participants prior to participating in the study.

Abstract

Phillips and Boroditsky (2003; Experiment 3) trained adult participants on a novel “alien”

language to test for effects of grammatical gender on perceptions of object similarity. The

present study aimed to replicate their methods and provide further support for their results. Adult

participants learned the grammatical gender categorizations for 20 animate and inanimate objects

and were tested to ensure they had mastered the categories. They were then given pairs of objects

that were either both from the same gender category or included one from each gender category

and were asked to rate how similar the items were. When analyzed by item, participants rated

objects from the same grammatical gender category as more similar than those from different

categories, suggesting that language influenced participants’ perception of the objects. However,

this same effect was not present when analyzed by participant, which could be due to the number

of items greatly exceeding the number of participants. Though we partially replicated the results

from the original study with a similar sample size, future studies could replicate this experiment

with a larger and more diverse sample.



Gender Effects on Language and Thought: Replication of Phillips and Boroditsky (2003)

There are over 7,000 different languages across the globe (ethnologue.com, 2022), each

with its own grammatical system, speakers, and culture. Because language allows us to express

our thoughts, a classic question in the cognitive science of language is whether the language one

speaks determines one’s thinking. Sapir (1929) and Whorf (1944) proposed that language is the

primary indicator of thought categorization and overall experiences and perception. According to

Sapir (1929), a necessary aspect of linguistic relativity holds that language should be understood

as a product of social and cultural influences. Additionally, he argued that language is a system

of symbols representing thoughts and emotions, giving the example of the United States flag,

which symbolizes an undercurrent of emotions and attitudes (Sapir, 1929).

Linguistic determinism has been a topic of much debate in psychology and linguistics.

One perspective on the debate called to shift the focus from ‘whether language and thought are

intertwined or not’ to ‘in what ways they are intertwined’ (Thierry, 2016). Another classified the

research on linguistic determinism into three separate categories, hoping to gain insight through

categorization, coming to the conclusion that more research needs to encompass how language

interprets experiences and what that says about thought, rather than the elusive and

oversimplified question of whether language shapes thought (Lucy, 1997). These different

approaches speak to the range of interpretations and modifications that make up the broad debate

on linguistic determinism.

Since this hypothesis was first proposed, many researchers have studied the underlying

significance of language differences, using various lenses and applications including numbers,

direction, color, and gender (Kay & Kempton, 1984; Samuel et al., 2019). Others have also

looked at the language-thought relationship in general, such as ‘thinking for speaking’, which

focuses on the language that we use in our mind when we are processing incoming stimuli

(Slobin, 1996). Because many of the empirical findings on linguistic relativity have been mixed or

controversial, it is important to replicate previous findings. Replication is a fundamental part of

science; if an effect cannot be replicated or reproduced, there may be questions about the

reliability and validity of the findings. The recent ‘replication crisis’ in psychology suggests that

many published findings may not replicate to the same extent as their original findings, further

supporting the need to replicate findings (Nosek et al., 2022). The present study replicates

Experiment 3 of Phillips and Boroditsky’s (2003) paper on the role of grammatical gender in

linguistic relativity as part of a class project (Frank & Saxe, 2012).

Some languages, such as Spanish or German, make use of grammatical gender to classify

nouns. It is labeled “gendered” because there are often two or three categories that are

labeled feminine, masculine, and neuter. Some languages have four or more grammatical

genders, while others, like English, only have gender in the pronoun system (Corbett, 2012).

Many experiments have sought to determine whether the presence of grammatical gender in a

language has an influence on speakers’ thoughts and perceptions (Konishi, 1993; Phillips &

Boroditsky, 2003; Samuel et al., 2019; Sedlmeier et al., 2016; among others). For example,

Konishi (1993) considered whether the presence of grammatical gender might have an influence

on the potency of words among German and Spanish speakers. Sedlmeier et al. (2016) found an

effect of grammatical gender on perception of objects, but they noted that this effect seems to be

dependent upon a linguistic context. In most studies of linguistic relativity, researchers test

effects of native language. However, this creates the potential confound of culture and language,

since cultural effects may interact with linguistic effects. One way to control for cultural effects is

to train naïve participants on a novel language with the grammatical property of interest. If

language-thought effects are robust and independent of culture, they will show up even in a

newly acquired language. This procedure also allows researchers to observe the relationship

between language and thought by objectively defining gendered language and removing

cognitive language biases.

Phillips and Boroditsky (2003) told participants they would be learning a new alien

language, Gumbuzi, in which there are two ways to categorize words: either “sou” or “oos”

(these were the new gender categories). Participants viewed images of items and were told which

category the words belonged to and then were tested on their knowledge of the categories. Once

they had mastered the list, they were given random pairings of all of the items and asked to rate

how similar the items were. In these ratings, some of the trials had pairs with consistent gender

(both “oos” or “sou”) and some had inconsistent pairs (one “oos” and one “sou”). Their results

revealed that participants rated items as more similar when they were from the same gender

category than when they were from different categories, which supported an effect of

grammatical gender on a cognitive task and, in turn, linguistic relativity (Phillips &

Boroditsky, 2003). In this paper, we provide a partial replication of one of the artificial language

learning experiments in Phillips and Boroditsky (2003) (Experiment 3). By replicating this

experiment, we hope to provide further evidence for the influence of grammatical gender on

perception.

Current Study

The current study aims to replicate the findings from Phillips and Boroditsky (2003) to

understand the influence that grammatical gender has on perception, as suggested by

the Sapir-Whorf hypothesis.



Hypothesis

Consistent with the results from Phillips & Boroditsky (2003), we hypothesized that

participants would rate pairs of items as more similar when they both came from the same gender

category (both “oos” or “sou”) as opposed to pairs in which there was one item from each

category (one “oos” and one “sou”). This would suggest that their familiarity with the new

gender categories influenced their perception of the similarity of the items.

Methods

Participants

A total of 31 participants were recruited for the online study through social media and

direct email, as well as from students enrolled in psychology courses at Pacific Lutheran University

(PLU) (using SONA, https://www.sona-systems.com/). A link to the experiment and general

descriptions were provided to all participants. There was no contact between the participants and

researchers during the study so they were all treated equally and in accordance with APA ethical

guidelines. For taking part in this study, those from the PLU student population received course

credit for their participation as applicable to their class requirements; participants from the

general public received no compensation for their participation. Three participants were excluded

via a post-completion opt-out question. An additional four participants were excluded from analysis

because they failed to meet the training criteria for learning the novel language (as described

below). Thus, a total of 24 participants were included in the final analysis.

On average, participants were 20.29 years old (SD = 4.66); 61% identified as female, 35% as

male, and 4% responded ‘other/prefer not to say.’ Of the participants, 64% identified as white, 21%

biracial, 7% Hispanic, 4% African-American, and 4% Asian. All participants except for 2 native

Spanish speakers identified themselves as native English speakers. In addition to speaking



English, 36% of participants spoke Spanish, 10% French, 10% American Sign Language, and

8% other languages such as Swahili and various Asian languages. Twelve percent of participants had no

prior experience in learning a second language while 22% considered themselves to be either

intermediate or advanced in more than one language.

Materials

Because we did not have access to the original images used in Phillips and Boroditsky

(2003), we created a new stimulus set depicting the same objects as the original study. Twenty

original digital images were created using the Adobe Illustrator software (Version 26.2.1; Adobe

Inc., 2022). Examples of these images and a list of all items are included in Appendix A. Eight

of the images were of people (four men and four women) and the other 12 were of inanimate

objects. Inanimate objects were paired based on similar categories, such as apple and pear (which

are both fruits).

Members of each pair of inanimate objects were labeled either “oosative” or “soupative”,

as they were in the Phillips and Boroditsky study. For half of the participants, female images had

the “oosative” distinction and males had the “soupative” distinction; the distinctions were

reversed for the other half of participants. Demographic data were collected at the end to identify

race, gender, age, and participants’ native language, as well as whether they spoke any other languages.

Procedure

While Phillips and Boroditsky’s (2003) study was a traditional, in-lab study, due to

COVID-19 restrictions, the present study was run using the online data collection platform

FindingFive (findingfive.com). Participants were given a description of the study and asked to

remember the pairings as they were going to be tested on them. Participants were then shown

each of the 20 images along with its grammatical distinction (e.g., sou, ballerina). Each image

and distinction pair was shown individually, centered on the screen, presented in a random

order. The images with their grammatical distinctions were shown three times.

After the third time, participants were shown each image without its grammatical

distinction and asked to assign its correct distinction. This was a forced choice test in which

participants had to select either “oosative” or “soupative” for each object. If the participant

answered all 20 items correctly, they were able to move on to the next phase. If the participant

answered one or more items incorrectly, they had to repeat the test. Due to programming

constraints in FindingFive, it was not possible to have participants take the test more than twice,

so data were screened upon completion; participants scoring below 90% on the second test

were excluded from analysis. This is different from Phillips and Boroditsky (2003), where

participants could retake the test an unlimited number of times until they received a perfect

score.
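
The screening rule described above can be sketched as follows. This is a minimal illustration in Python (not the R used elsewhere in this manuscript) and is not the screening script used for the study; the function name, participant IDs, and scores are hypothetical, and the sketch assumes that participants with a perfect first test never took a second one.

```python
N_ITEMS = 20             # images in the training set (from the text)
MIN_SECOND_CORRECT = 18  # 90% of 20 items, the cutoff on the second test


def meets_training_criterion(first_correct, second_correct=None):
    """Return True if a participant's ratings should be kept for analysis."""
    if first_correct == N_ITEMS:
        return True  # perfect first test: moved straight on to the ratings
    if second_correct is None:
        return False  # imperfect first test and no recorded repeat test
    return second_correct >= MIN_SECOND_CORRECT


# Hypothetical scores: participant -> (first test, second test or None)
scores = {"p01": (20, None), "p02": (17, 18), "p03": (15, 17)}
kept = [pid for pid, (first, second) in scores.items()
        if meets_training_criterion(first, second)]
print(kept)
```

Expressing the cutoff as a count of correct items (18 of 20) rather than a proportion avoids any floating-point comparison at the boundary.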

Participants were then presented with pairs consisting of one person and one inanimate object, covering all 96

possible combinations (8 people × 12 objects). In half of the pairings, the two items shared the gender category

assigned in training (e.g., both “oos” or both “sou”). In the other half of the pairings, the items came from

different categories (e.g., one item was “oos” and the other was “sou”). Participants

were asked to rate the similarity of the pairs from 1 (not similar) to 9 (very similar) using a

sliding scale. The order of pairs was randomized for each participant. Pairs would stay on screen

until participants made their selection. After completing the similarity ratings, participants

completed the demographic information and then received a debriefing page. Participants were

also asked whether they wanted their data to be included in the analysis (as a way to allow

participants to opt out, as described above).
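
The pairing design can be illustrated with a short sketch, written in Python for convenience. It is not the FindingFive study code. The item names other than apple and pear are hypothetical placeholders (the real list is in Appendix A), and the sketch assumes that the labels of the inanimate objects stayed fixed while the labels of the people were swapped between the two training conditions.

```python
import random
from itertools import product

# Placeholder item names (hypothetical); the real list is in Appendix A.
WOMEN = [f"woman_{i}" for i in range(1, 5)]
MEN = [f"man_{i}" for i in range(1, 5)]
# Six matched object pairs; only apple/pear is named in the text.
OBJECT_PAIRS = [("apple", "pear")] + [
    (f"object_{i}a", f"object_{i}b") for i in range(2, 7)
]


def assign_categories(oos_is_female):
    """Map each item to 'oos' or 'sou' for one counterbalancing condition."""
    female_cat, male_cat = ("oos", "sou") if oos_is_female else ("sou", "oos")
    categories = {name: female_cat for name in WOMEN}
    categories.update({name: male_cat for name in MEN})
    for first, second in OBJECT_PAIRS:
        # Assumption: object labels stay fixed across conditions.
        categories[first], categories[second] = "oos", "sou"
    return categories


def build_trials(categories, seed):
    """Return all person-object pairings, labeled and shuffled."""
    objects = [item for pair in OBJECT_PAIRS for item in pair]
    trials = [
        (person, obj,
         "consistent" if categories[person] == categories[obj] else "inconsistent")
        for person, obj in product(WOMEN + MEN, objects)
    ]
    random.Random(seed).shuffle(trials)  # a different order per participant
    return trials


trials = build_trials(assign_categories(oos_is_female=True), seed=1)
n_consistent = sum(label == "consistent" for _, _, label in trials)
print(len(trials), n_consistent)
```

Under these assumptions, every person-object pairing is consistent in one training condition and inconsistent in the other, which is what allows the same pairing to contribute to both cell means in a by-item comparison.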



Results

Phillips and Boroditsky (2003) used an ANOVA to analyze their data. However, the

specific variables used in their ANOVA were not clearly specified, as it appeared that they

simply compared inconsistent to consistent items, with the conditions (Oos-Fem and Oos-Masc)

collapsed. Thus, we used two paired-samples t-tests, one by item (averaging across participants) and

one by participant (averaging across items). When analyzed by item, there was a significant

difference between the consistent pairs (M = 3.56, SD = 0.69) and the inconsistent pairs (M =

3.20, SD = 0.68), t(95) = 4.90, p < .001 (see Table 1). However, the by-participant analysis was

not statistically significant, t(23) = 1.81, p = .083.

Table 1

Mean comparisons for consistent vs. inconsistent pairings, by item and by participant

                  Consistent       Inconsistent
                  M       SD       M       SD       t       p
By item           3.56    0.69     3.20    0.68     4.90    <.001
By participant    3.60    1.58     3.22    1.63     1.81    .083
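
To make the difference between the two analyses concrete, the sketch below computes a paired-samples t statistic twice from the same trial-level ratings: once after averaging within each pairing (by item) and once after averaging within each participant (by participant). It is written in Python with only the standard library, not the software used for the reported tests. The toy ratings, participant IDs, pair labels, and function names are all hypothetical, and p values are omitted because the standard library has no t distribution.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def condition_means(rows, unit):
    """Mean rating per unit ('pair' or 'participant') in each condition."""
    cells = defaultdict(lambda: defaultdict(list))
    for row in rows:
        cells[row[unit]][row["condition"]].append(row["rating"])
    units = sorted(cells)
    return ([mean(cells[u]["consistent"]) for u in units],
            [mean(cells[u]["inconsistent"]) for u in units])


# Hypothetical toy data: participant -> pair -> rating on the 1-9 scale.
ratings = {
    "p1": {"A": 5, "B": 4, "C": 3, "D": 2},
    "p2": {"A": 4, "B": 4, "C": 4, "D": 3},
    "p3": {"A": 3, "B": 2, "C": 5, "D": 3},
    "p4": {"A": 4, "B": 3, "C": 4, "D": 4},
}
group = {"p1": 1, "p2": 1, "p3": 2, "p4": 2}   # training condition
consistent_pairs = {1: {"A", "B"}, 2: {"C", "D"}}  # flips across groups

rows = [
    {"participant": p, "pair": pair, "rating": r,
     "condition": "consistent" if pair in consistent_pairs[group[p]]
     else "inconsistent"}
    for p, by_pair in ratings.items() for pair, r in by_pair.items()
]

t_item, df_item = paired_t(*condition_means(rows, "pair"))
t_part, df_part = paired_t(*condition_means(rows, "participant"))
print(f"by item: t({df_item}) = {t_item:.2f}")
print(f"by participant: t({df_part}) = {t_part:.2f}")
```

The degrees of freedom in each test are one less than the number of units being compared, which is why the reported by-item test has 95 degrees of freedom (96 pairings) and the by-participant test has 23 (24 participants).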

While the results for this study trended in the same direction as those of Phillips and Boroditsky (2003),

only the by-items t-test was statistically significant, thus showing a partial replication of the

original study. While more contemporary statistical techniques would use a mixed-effects regression as

opposed to an ANOVA or t-test (thus allowing for random intercepts for subjects and items, and

training condition), we opted for tests that more closely matched the tests used in the original

study.¹
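
For readers less familiar with lme4 formula syntax, the model formula given in the footnote can be written out as an equation. This is one reading of that formula, not a specification taken from the original analysis: the symbols are introduced here for illustration, with i indexing stimulus pairs, p participants, and g groups, and C is assumed to be a 0/1 indicator for consistent pairings (the coding is not stated in the text).

```latex
\begin{align*}
y_{ipg} &= (\beta_0 + u_{0i} + v_p + w_g) + (\beta_1 + u_{1i})\,C_{ipg} + \varepsilon_{ipg},\\
(u_{0i}, u_{1i})^\top &\sim \mathcal{N}(\mathbf{0}, \Sigma_u), \quad
v_p \sim \mathcal{N}(0, \sigma_v^2), \quad
w_g \sim \mathcal{N}(0, \sigma_w^2), \quad
\varepsilon_{ipg} \sim \mathcal{N}(0, \sigma^2).
\end{align*}
```

Under that coding, the fixed effect \beta_1 is the average rating difference between consistent and inconsistent pairings, which would correspond to the coefficient b reported in the footnote.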

Discussion

We taught participants an artificial language with a grammatical gender distinction in

order to test whether learning a new language with grammatical gender influences perception of

similarity based on grammatical gender. Participants viewed 20 different images, each assigned

either an “oosative” or “soupative” label. The labels were associated with one of two grammatical

gender categories: feminine or masculine. Once participants fully learned the “oos / sou”

categories, they were given pairs of items and asked to give them a similarity rating. If

participants rate items that share the same grammatical gender in the newly learned language as more similar, this

suggests that learners make use of grammatical gender in determining similarity. Our results

partially replicated the Phillips and Boroditsky (2003) original. Participants tended to rate items

as more similar when they shared a gender category in the Gumbuzi language they were trained on,

but this difference was significant only in the by-item analysis. This could be because there were many more

items than participants, meaning that the by-item test had greater statistical power

than the by-participant test.

¹ A linear mixed-effects model was run using the lmer function in the lme4 package (Bates et al., 2015) in R
(R Development Core Team, 2018) using RStudio (RStudio Team, 2020):
lmer(response_value ~ 1 + Consistent + (1 + Consistent | stimuli_presented) + (1 | participant_id) + (1 | group_id),
data = Gumbozi_results, REML = FALSE). This model showed a significant difference between Inconsistent and
Consistent items, b = 0.39, SE = 0.068, t = 5.57, p < .001.

Limitations

While the current study provides a replication of Phillips and Boroditsky (2003), there are

some limitations to address. First, it is important to note that the present study was not an exact

replication of Phillips and Boroditsky (2003). Our study was administered online, while the

original study was conducted in a traditional laboratory setting. Using web-based testing leaves

room for potential limitations such as noisy conditions and less control over the experiment (for

example, we could not allow participants to repeat the test until they achieved a perfect score). In

addition, the items (artwork) were created by one of the authors, and differed from the original

study (though the semantic content of the items was the same). It is possible that differences in

the images could have led to differences in the similarity ratings. However, the fact that our

data generally replicated the original findings under these less controlled conditions, which

contrast with those in the study we replicated (Phillips & Boroditsky, 2003), points to the

robustness of this effect.

An additional limitation of both the present study and Phillips and Boroditsky (2003) was

that the participant pool largely consisted of native English-speaking college students. While this

was the participant pool that was readily available to us (and the one used in the original study),

it was limited in diversity and may limit the potential generalizability of the findings. While our

sample size was similar to Phillips and Boroditsky’s (2003) 22 participants, the overall sample size is

relatively small, particularly for making strong, generalizable claims related to language and

thought. Another limitation of both the present study and Phillips and Boroditsky (2003) is the

statistical analysis. Phillips and Boroditsky (2003) provided minimal details about their ANOVA,

and we were therefore unable to use the same statistical tests as the original study. In addition,

more contemporary methods for statistical analysis should make use of mixed-effects regression.

A final limitation of the present study is that we replicated experiment four excluding the

verbal shadowing task to control for language interference. In their original study, Phillips and

Boroditsky (2003) included a word-shadowing task, where participants read out random letters

while simultaneously completing the similarity ratings, to prevent participants from subvocally

naming the objects. We decided against using the

word-shadowing task because it made little difference in the original study, and because of the

constraints of using a web-based format for research. However, future research could include shadowing to give more

insight into the processes of language and thought.

The findings for the original experiment indicate that language affects thought because

objects were rated as more similar if they had the same gendered label. Our replication also

supports the influence of grammatical gender, because pairs with a consistent gendered label

scored higher compared to pairs with inconsistent labels. The replication results reinforce the

conclusions of the original study that language affects thought.

Importance of Replication

Study replication is important because it provides more evidence for the particular field

and further tests previous findings to see how reliable they are. Our replication success suggests

the viability of the method by ensuring that the format works to test for grammatical gender and

can be useful for other research in language and thought. Another reason why replication studies

are important to carry out is because there are so few being administered and completed. This

“replication crisis” makes it difficult to know which results are robust, valid and ‘trustworthy’

(Kepes & McDaniel, 2013; Earp & Trafimow, 2015).

It is important to note that this replication was an undergraduate class assignment, which

means that the data were collected in a constrained and limited timeframe, and the students were

not experts in linguistic relativity at the start of the study. However, there are numerous benefits

to using replication as a class project for undergraduates (Frank & Saxe, 2012; Grahe et al., 2012,

2020; Wagge et al., 2019). Replicating an original study served to support student learning in

terms of the research design and methods for studying language and thought, as well as the

conceptual knowledge required to understand research questions related to language and thought.

Regardless of our significant findings relating to grammatical gender, perceived limitations, and

what the results say about the topic of language and thought as a whole, this replication study has

been a process that included much expansion of technological knowledge and new learning

opportunities. As a psychology class working on this project together, collaboration was essential

in order to achieve success. While each class member contributed individually in a range of

ways, we came together to ensure consistency throughout the project, as well as support for each

other and the new software or skills we were learning. This replication project gave each of us

the opportunity to grow in our skills pertaining to psychology, collaboration, and overall hands-

on experience.

Conclusion

We replicated Experiment 3 of Phillips and Boroditsky (2003) in which participants

learned a new grammatical gender set and rated item pairs on similarity. We found their

similarity ratings tended to coincide with the grammatical gender matchups they had previously

learned. Therefore, pairs were rated as less similar if they came from different grammatical

gender categories. This provides additional evidence that grammatical gender can influence non-

linguistic category judgments, even for a newly acquired language. Our successful replication

also emphasizes the importance of replicating previous research to see if the methods and results

continue to persist years after the original data were collected.



References

Adobe Inc. (2022). Adobe Illustrator [Computer Software]. Retrieved from

https://adobe.com/products/illustrator.

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models

using lme4. Journal of Statistical Software, 67, 1–48.

https://doi.org/10.18637/jss.v067.i01

Corbett, G. (2012). Gender. Cambridge University Press.

Earp, B. D., & Trafimow, D. (2015). Replication, falsification, and the crisis of confidence in

social psychology. Frontiers in Psychology, 6:621. doi: 10.3389/fpsyg.2015.00621

ethnologue.com (2022). Website. https://www.ethnologue.com/guides/how-many-languages#

Retrieved June 30, 2022.

FindingFive Team (2019). FindingFive: A web platform for creating, running, and managing your

studies in one place. FindingFive Corporation (nonprofit), NJ, USA.

https://www.findingfive.com

Frank, M. C., & Saxe, R. (2012). Teaching replication. Perspectives on Psychological

Science, 7(6), 600-604.

Grahe, J. E., Cuccolo, K., Leighton, D. C., & Cramblet Alvarez, L. D. (2020). Open science

promotes diverse, just, and sustainable research and educational outcomes. Psychology

Learning & Teaching, 19(1), 5-20.

Grahe, J. E., Reifman, A., Hermann, A. D., Walker, M., Oleson, K. C., Nario-Redmond, M., &

Wiebe, R. P. (2012). Harnessing the undiscovered resource of student research

projects. Perspectives on Psychological Science, 7(6), 605-607.



Kay, P. & Kempton, W. (1984). What is the Sapir-Whorf hypothesis? American Anthropologist,

86: 65-79. doi: 10.1525/aa.1984.86.1.02a00050

Kepes, S., & McDaniel, M. A. (2013). How trustworthy is the scientific literature in industrial

and organizational psychology? Industrial and Organizational Psychology, 6: 252–268.

doi: 10.1111/iops.12045

Konishi, T. (1993). The semantics of grammatical gender: A cross-cultural study. Journal of

Psycholinguistic Research, 22: 519-534. doi: 10.1007/BF01068252

Lucy, J. (1997). Linguistic relativity. Annual Review of Anthropology, 26: 291-312. doi:

10.1146/annurev.anthro.26.1.291

Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., ... &

Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological

science. Annual Review of Psychology, 73(1), 719-748.

Phillips, W. & Boroditsky, L. (2003). Can quirks of grammar affect the way you think?

Grammatical gender and object concepts. Proceedings of the Annual Meetings of the

Cognitive Science Society, 25(25): 928-933.

R Development Core Team. (2018). R: A language and environment for statistical computing.

R Foundation for Statistical Computing.

RStudio Team. (2020). RStudio: Integrated Development for R.

http://www.rstudio.com/

Sapir, E. (1929). The status of linguistics as a science. Language, 5(4): 207-214. doi:

10.2307/409588.

Samuel, S., Cole, G., & Eacott, M. (2019). Grammatical gender and linguistic relativity: A

systematic review. Psychonomic Bulletin & Review, 26: 1767-1786. doi:

10.3758/s13423-019-01652-3.

Sedlmeier, P., Tipandjan, A., & Janchen, A. (2016). How persistent are grammatical gender

effects? The case of German and Tamil. Journal of Psycholinguistic Research, 45, 317-

336. doi: 10.1007/s10936-015-9350-x

Slobin, D.I. (1996). From “thought and language” to “thinking for speaking.” In J. J. Gumperz

& S. C. Levinson (Eds.), Rethinking linguistic relativity (pp. 70–96). Cambridge

University Press. (Reprinted in modified form from "Pragmatics," 1, 1991, pp. 7–26)

Sona Systems (n.d.). Sona Systems: Cloud-based Participant Management Software [Computer

software]. Sona Systems, Ltd. https://www.sona-systems.com/

Thierry, G. (2016). Neurolinguistic relativity: How language flexes human perception and

cognition. Language Learning, 66(3), 690–713. doi: 10.1111/lang.12186.

Wagge, J. R., Brandt, M. J., Lazarevic, L. B., Legate, N., Christopherson, C., Wiggins, B., &

Grahe, J. E. (2019). Publishing research with undergraduate students via replication

work: The collaborative replications and education project. Frontiers in Psychology, 10,

247.

Whorf, B. L. (1944). The relation of habitual thought and behavior to language. ETC: A Review

of General Semantics, 1(4), 197-215. doi: 10.1007/978-1-349-25582-5_35.



Appendix A: Stimuli (note, sizes are adjusted to fit in the document).

