CaplanAdamsBoyd2020 PersonalityandLanguage
All content following this page was uploaded by Ryan L. Boyd on 09 November 2020.
Language, broadly defined, includes both the words people use and the ways in which they
are used. Research on language and personality has found that various patterns in a person’s
language can reveal a great deal about their underlying psychological composition. In other
words, each person’s unique language patterns contain embedded clues about their stable
individual characteristics. Psychological measurements of language tend to exhibit robust,
trait-like properties that have been widely used as a mode of exploring personality and
other psychological constructs.
The increasing use of quantified language in personality research is principally due to
the consistent relationships established between the two domains. By analyzing patterns in
language samples (e.g. essays, speeches, interviews, recorded conversations, etc.), research-
ers are able to establish an individual’s linguistic profile, a cluster of unique language
patterns that is predictive of psychological composition. This linguistic profile can be used
to understand and assess personality by conducting research on linguistic tendencies that
correlate with, or are indicative of, personality across various research designs. For exam-
ple, language patterns can vary as a function of depression, social status manipulation
paradigms, or self-reports of personality. Relatedly, language-based measures are often
used to predict and understand personality-relevant variables of interest, such as
demographics (e.g. sex, age, level of education), behaviors (e.g. cigarette smoking, aggres-
sion), and other individual differences (e.g. decision-making patterns, trait affect, motiva-
tional processes).
Research has found that language is a valid and reliable way to explore personality, indi-
vidual differences, and lower-level processes that drive such constructs. Several empirical
studies have demonstrated that the various ways in which people express themselves
through language are stable and consistent within an individual across time and context.
The Wiley Encyclopedia of Personality and Individual Differences: Models and Theories, Volume I, First Edition.
Edited by Bernardo J. Carducci and Christopher S. Nave.
© 2020 John Wiley & Sons Ltd. Published 2020 by John Wiley & Sons Ltd.
The most common way that personality research is conducted using language measures is
by establishing statistical relationships between quantified measures of language use and
other measures of personality. Studies in this area often rely upon traditional measures of
personality (e.g. self-report measures, behavioral outcome measures) and have found that
various categories of language show consistent relationships with such personality measures,
suggesting that language can serve as a unique mode of analyzing personality
(sometimes referred to as Language-Based Assessments, LBA, or L-data). Furthermore,
research has also shown that language-based measures of personality are able to capture
personality characteristics that are not easily captured by self-report measures due to fac-
tors such as self-report biases and accessibility. This is partly due to the fact that language
production involves several automatic, low-level generative processes into which humans
lack accurate self-insights.
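As a minimal sketch of what "establishing statistical relationships" can look like in practice, the example below computes a Pearson correlation between a language measure and a self-report score. The function name `pearson_r`, the variable names, and the numbers are all assumptions made up for illustration; they are not data or code from any study described here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up illustration values, one pair per hypothetical participant.
social_word_rate = [2.1, 3.4, 1.8, 4.0, 2.9]     # % of words in a "social" category
extroversion_score = [2.5, 3.9, 2.0, 4.4, 3.1]   # self-report scale score

r = pearson_r(social_word_rate, extroversion_score)
print(f"r = {r:.2f}")
```

In real research the same idea is applied to many participants and many language categories at once, usually with corrections for multiple comparisons.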
The ways in which language can be measured/quantified for use in personality research
are diverse, ranging from the rate at which a person uses purely syntactic categories of
speech to words with inherent semantic value, average word length, and frequency of verb
tenses, among other approaches. The comprehensive analysis of language includes quantifying
both a person’s language style and their language content. Language content
represents the “what” of a language and primarily consists of semantically laden words
(e.g. emotion words, social words). Language style primarily consists of function (i.e.
syntactical) words (e.g. prepositions, conjunctions). Language content typically includes words
classified as nouns, regular verbs, and most adjectives and adverbs – words that generally
have meaning even without context, such as “happiness” or “family.” Function words pri-
marily include pronouns, prepositions, articles, conjunctions, and auxiliary verbs, among
other classes of linguistic particles – words that do not possess inherent meaning without
context, such as “of” and “the.”
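The distinction between style and content can be sketched in a few lines of code. The example below assumes a tiny, hypothetical function-word list (`FUNCTION_WORDS`) and treats every other word as a content word, which is a deliberate simplification; validated word lists are far larger. It also reports average word length, one of the other measures mentioned above.

```python
import re

# Tiny, hypothetical function-word list for illustration only.
FUNCTION_WORDS = {
    "i", "me", "my", "you", "we", "they", "it",        # pronouns
    "of", "in", "on", "to", "with", "at",              # prepositions
    "the", "a", "an",                                  # articles
    "and", "but", "or",                                # conjunctions
    "is", "am", "are", "was", "have", "has", "will",   # auxiliary verbs
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def style_content_profile(text):
    """Return function-word rate, content-word rate, and average word length."""
    tokens = tokenize(text)
    if not tokens:
        return {"function_rate": 0.0, "content_rate": 0.0, "avg_word_length": 0.0}
    n_function = sum(1 for t in tokens if t in FUNCTION_WORDS)
    return {
        "function_rate": n_function / len(tokens),
        # Simplification: anything not in the function list counts as content.
        "content_rate": (len(tokens) - n_function) / len(tokens),
        "avg_word_length": sum(len(t) for t in tokens) / len(tokens),
    }

profile = style_content_profile(
    "The happiness of my family is the most important thing to me"
)
print(profile)
```

In this toy sentence, 7 of the 12 tokens are function words, which illustrates how heavily everyday language relies on words that carry little meaning on their own.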
Function and content words often show different personality correlates. Content words
most commonly exhibit statistical relationships to explicitly-accessible self-information,
such as sociability and trait affect, whereas function words are typically predictive of
lower-level personality processes, including things like automatic cognitive and
attentional processes.
Qualitative Methods
Language analysis in psychology dates back to the beginnings of modern psychology.
The earliest research methods of psychological language analysis were qualitative
methods. Early work on language and personality principally consisted of case studies
that emphasized the deep interpretation and discovery of hidden meanings in a person’s
language. Such methods include classical projective tests developed by Rorschach and
others to discover people’s thoughts, intentions, and motives from their verbalized inter-
pretations of abstract images such as inkblots, as well as Murray’s Thematic Apperception
Test (TAT), which was designed to reveal information about a person’s implicit
motivational processes.
Many approaches to language analysis research require multiple human raters to manually
code texts. These approaches typically employ human judges that rely on standardized
manuals in order to identify and quantify prevalent themes and patterns within a sample
of participant language. These approaches use language to explore personality characteris-
tics by coding latent constructs such as motivational processes, intimacy, explanatory style,
gregariousness, and decision-making, among others. Manual-based coding techniques are
also sometimes applied to aid in the diagnosis of clinical phenomena, such as personality
disorders.
Based on formalized coding systems, manual analyses can be used to explore whether a
particular theme is indicative of certain personality constructs. For example, some findings
show that manually-coded themes about “status” suggest a striving for power, and other
studies show that people’s explanatory styles (the ways people explain the cause of an
event) are linked with optimism, depression, and habitual health behaviors. The primary
drawback of manual text coding is that studies can suffer from issues of inter-rater reliabil-
ity and a reliance on subjective judgment calls in cases that are not adequately covered by
coding manuals. Nevertheless, the manual-based coding of language can be particularly
useful for small samples and difficult-to-assess constructs that require advanced
knowledge to recognize. Manual methods continue to be a mainstream mode of exploring
language within psychology, and such methods allow researchers to closely understand
the language samples in their studies.
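Inter-rater reliability is commonly summarized with a chance-corrected agreement statistic. The sketch below uses Cohen's kappa for two raters, which is one standard choice and not necessarily the statistic used in any particular study discussed here. The function name `cohens_kappa`, the labels, and the ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of codes")
    n = len(rater_a)
    # Proportion of passages on which the raters assigned the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected by chance, given each rater's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Made-up codes for eight passages: does each one contain a "status" theme?
rater_1 = ["status", "other", "status", "status", "other", "other", "status", "other"]
rater_2 = ["status", "other", "other", "status", "other", "other", "status", "status"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 6 of 8 passages (75%), but because half of that agreement would be expected by chance, kappa is only 0.50.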
With the technological advances of recent decades, automated text analysis methods have
begun to replace traditional ones with increasing frequency. Automated text analysis
gained popularity primarily as a result of the expansion of the internet and a sudden avail-
ability of language samples that were too large to code manually. Much like manual coding,
automated text analysis methods quantify a person’s language for the purpose of better
understanding underlying psychological patterns. Most automated text analysis systems
use word counting approaches, where sets of words are clustered into construct-specific
dictionaries (e.g. affective words, morality words, risk-sensitivity words). Texts are then
scanned for words in these dictionaries, and categories of words are then scored for their
relative frequency (e.g. 10% cognitive words, 4% positive emotion words). The creation of
dictionaries that exhibit acceptable psychometric properties is a difficult task, and this has
been cited as the major drawback of dictionary-based text analysis approaches.
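The word counting approach described above can be sketched as follows. The three mini-dictionaries in `DICTIONARIES` are hypothetical and far too small for real use; they stand in for the validated, construct-specific dictionaries that actual text analysis systems provide.

```python
import re

# Hypothetical mini-dictionaries; real ones contain many more entries.
DICTIONARIES = {
    "positive_emotion": {"happy", "love", "nice", "good"},
    "cognitive": {"think", "know", "because", "consider"},
    "social": {"friend", "talk", "family", "husband"},
}

def score_text(text, dictionaries=DICTIONARIES):
    """Return each category's share of the total word count, as a percentage."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    scores = {}
    for category, words in dictionaries.items():
        hits = sum(1 for t in tokens if t in words)
        scores[category] = 100.0 * hits / total if total else 0.0
    return scores

scores = score_text("I think my friend is happy because we talk often")
print(scores)
# {'positive_emotion': 10.0, 'cognitive': 20.0, 'social': 20.0}
```

Scoring by relative frequency rather than raw counts makes texts of different lengths comparable.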
Automated theme extraction, or topic modeling, refers to a family of methods that use a
bottom-up approach to discover what topics or themes exist in a collection of documents.
These methods typically aim to determine the inherent themes of the texts and can be used
to categorize language samples. Most topic modeling approaches consist of some variation
of generating statistical models designed to capture concept co-occurrence (e.g. “beach”
and “sand” may commonly co-occur, as might “lawyer” and “divorce”). These methods
most often express themes as a latent construct that can be mathematically approximated
from the words that people use. Other latent model methods, such as Latent Semantic
Analysis (LSA), are conceptually related to topic modeling methods and are typically used
to extract meaning from texts by modeling the proximity of word use in addition to the
similarity of various words and phrases. Methods such as topic modeling and LSA are con-
sidered “bottom-up” approaches to automated text analysis as they are data-driven, rather
than the “top-down,” dictionary-based methods that rely on premade dictionaries. Whereas
top-down methods of language analysis are limited in scope but typically theory-driven,
bottom-up methodologies are more broadly applicable but largely exploratory.
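The co-occurrence idea behind these bottom-up methods can be illustrated with a simple count of how often word pairs appear in the same document. This is only the raw statistic that such models build on, not a topic model or LSA itself; the function name `cooccurrence_counts` and the four toy documents are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
import re

def cooccurrence_counts(documents):
    """Count the number of documents in which each unordered word pair co-occurs."""
    pair_counts = Counter()
    for doc in documents:
        # Unique words per document, sorted so each pair has one canonical order.
        vocab = sorted(set(re.findall(r"[a-z]+", doc.lower())))
        pair_counts.update(combinations(vocab, 2))
    return pair_counts

# Toy documents, invented for this example.
docs = [
    "beach sand waves",
    "sand beach sun",
    "lawyer divorce court",
    "divorce lawyer papers",
]

counts = cooccurrence_counts(docs)
print(counts[("beach", "sand")])       # 2
print(counts[("divorce", "lawyer")])   # 2
print(counts[("beach", "lawyer")])     # 0
```

No dictionary is supplied in advance: the two "themes" emerge purely from which words tend to appear together, which is what makes the approach data-driven.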
The objectivity and reliability of computerized language analysis correct for several
issues inherent to qualitative and manual coding methods; however, other drawbacks exist
that are better addressed via non-automated coding. A primary challenge in automated text
analysis lies in the difficulty of automatically modeling language that is sensitive to context
or can take on alternative meanings. Sorting words into clusters can be difficult due to
issues of homography, wherein many words are spelled identically but have multiple mean-
ings and different connotations depending on when, where, how, by whom, and toward
whom they are used. For example, the common word “like” may express evaluation or
approval (“I like that song”), similarity (“This car is like that one”), or be used to serve as a
filler word (“I am, like, really happy”). Computerized dictionary-based approaches are gen-
erally unable to detect subtle aspects of language such as irony or metaphors, which are
potentially important psychological constructs for personality research.
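The homography problem is easy to reproduce. In the sketch below, a hypothetical "evaluation" category (`EVALUATION_WORDS`) is applied to the three example sentences above; a context-blind word counter registers one hit in each sentence, even though only the first use expresses evaluation.

```python
import re

# Hypothetical category of evaluation words, for illustration only.
EVALUATION_WORDS = {"like", "love", "enjoy"}

def count_hits(text, words):
    """Count tokens that match the category, ignoring all context."""
    return sum(1 for t in re.findall(r"[a-z']+", text.lower()) if t in words)

sentences = [
    "I like that song",            # evaluation
    "This car is like that one",   # similarity
    "I am, like, really happy",    # filler word
]

hits = [count_hits(s, EVALUATION_WORDS) for s in sentences]
print(hits)  # [1, 1, 1]
```

Distinguishing these uses requires information beyond the word itself, such as part of speech or the surrounding words.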
While most personality research involving automatic language analysis quantifies the
relative frequency of various predefined words/thematic categories, methods hailing from
other traditions, such as information sciences or computational linguistics, have been
successfully applied to personality research as well.
Several studies have established stable associations between personality traits and the types
of words people use, allowing researchers to accurately estimate/predict a person’s distinct
personality traits by analyzing their language. For example, as hypothesized, individuals
who score high in self-reported extroversion more frequently use words indicative of social
processes (e.g. friend, talk, husband). Self-reported openness is usually marked by an
elevated use of articles and words related to insight (think, know, consider), and
conscientiousness is indicated by a person’s infrequent use of negations and negative emotion words.
Importantly, more atomic facets of personality, or personality processes, are often more
directly encoded in a person’s language than broader personality constructs. Personality
processes are the lower-level facets of behavior, cognition, and affect that contribute to the
broader, overall personality constructs. These processes include any psychological mechanisms
that are stable over time and cohere to form, or contribute to, personality. For example,
many low-level social processes are indicative of broader personality traits. A tendency to
mitigate social conflict contributes to the broader trait of agreeableness, and elevated posi-
tive affect during socialization is a process that contributes to extroversion. Relatively low
use of anger words during conflict may be predictive of an agreeable peace-keeper, and an
increased use of positive emotion words during socialization can mark an extrovert’s
affective processes in trait-congruent contexts.
Relatedly, cognitive processes that contribute to personality show language embeddings
as well, typically in the form of function words. Frequent use of first-person singular
pronouns (e.g. “I,” “me,” “my”) is indicative of self-reflective attentional processes that are
markers of neuroticism and depression. Thinking styles (e.g. slow and deliberate versus
fast and non-conscious) are reflected through function word use as well, and such cognitive
processes can be indicative of personality variables as diverse as conservatism, attributional
tendencies, and self-schemata.
Further Reading
Boyd, R. L. (2017). Psychological text analysis in the digital humanities. In S. Hai-Jew (Ed.), Data
analytics in digital humanities (pp. 161–189). Cham: Springer.
https://doi.org/10.1007/978-3-319-54499-1_7
Boyd, R. L., & Pennebaker, J. W. (2015). Did Shakespeare write Double Falsehood? Identifying an
individual’s psychological signature with text analysis. Psychological Science, 26(5), 570–582.
Boyd, R. L., & Pennebaker, J. W. (2015). A way with words: Using language for psychological
science in the modern era. In C. Dimofte, C. Haugtvedt, & R. Yalch (Eds.), Consumer
psychology in a social media world (pp. 222–236). New York: Routledge.
Boyd, R. L., & Pennebaker, J. W. (2017). Language-based personality: A new approach to
personality in a digital world. Current Opinion in Behavioral Sciences, 18, 63–68.
https://doi.org/10.1016/j.cobeha.2017.07.017
Golbeck, J., Robles, C., & Turner, K. (2011, May). Predicting personality with social media.
In Proceedings of the 2011 ACM CHI Conference on Human Factors in Computing Systems
(pp. 253–262). ACM.
Lanning, K., Pauletti, R. E., King, L. A., & McAdams, D. P. (2018). Personality development through
natural language. Nature Human Behaviour, 1. https://doi.org/10.1038/s41562-018-0329-0
Mehl, M. R. (2006). Quantitative text analysis. In M. Eid, & E. Diener (Eds.), Handbook of
multimethod measurement in psychology (pp. 141–156). Washington, DC: American
Psychological Association.
Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., Ungar, L. H.,
& Seligman, M. E. P. (2015). Automatic personality assessment through social media language.
Journal of Personality and Social Psychology, 108(6), 934–952.
Pennebaker, J. W., & King, L. A. (1999). Linguistic styles: Language use as an individual
difference. Journal of Personality and Social Psychology, 77(6), 1296–1312.
Yarkoni, T. (2010). Personality in 100,000 words: A large-scale analysis of personality and word
use among bloggers. Journal of Research in Personality, 44(3), 363–373.