
ChatGPT Can Accurately Predict Public Figures’ Perceived Personalities Without Any Training

Keywords: personality perception, zero-shot prediction, GPT-4, large language models

Abstract
We show that ChatGPT can predict public figures’ perceived personalities without being
provided with any training data or feedback on its performance. ChatGPT and 600 human raters
evaluated 300 public figures’ personalities using the Ten-Item Personality Inventory. The
correlations between ChatGPT’s and humans’ ratings ranged from r = .81 to .96, outperforming
models specifically trained to make such predictions. We discuss the implications of these
findings for both personality psychology and AI research, underscoring the increasing ability of
LLMs to discern latent psychological traits, such as perceived personality, without any additional
training.

Main text
The analysis of large samples of language data using machine learning and artificial intelligence
algorithms plays an increasingly important role in social science research. Particularly prominent
is their use to assess and predict psychological constructs, such as personality (1, 2) and views
and attitudes (3, 4). In a typical approach, researchers collect large quantities of labeled data to
train models predicting a psychological construct (e.g., personality) from some input data (e.g.,
Tweets or essays). While these supervised-learning approaches offer many advantages, they are
not without shortcomings. Collecting labeled data, typically from hundreds or thousands of
human participants, is a costly and time-consuming process. Furthermore, given language data’s
high dimensionality (typically thousands of variables), such models are prone to overfitting (5).
While cross-validation, regularization, and other techniques can mitigate overfitting, they cannot
eliminate it entirely (5).

The advent of generative Large Language Models (LLMs) heralds a paradigm shift. Modern
LLMs like GPT can engage in few-shot or zero-shot learning, predicting or classifying instances
without explicit training, relying instead only on the foundational knowledge encoded during
their initial training (6). The capacity of an LLM to make zero-shot predictions of psychological
constructs would signify a substantial advancement in psychometric assessment by eliminating
the need for extensive training data. Also, it would demonstrate that the model can interpret the
complex, latent dimensions of human personality and reflect them accurately, a task that hitherto
necessitated human cognition and judgment. Our study explores this possibility by employing
ChatGPT to make zero-shot predictions of personality perceptions of public figures.

People are consistently judged by others, and—increasingly—by algorithms (1). Such
perceptions shape their social standing, health, wealth, academic and occupational achievements,
and many other significant outcomes (7). Particularly consequential are perceptions of public
figures' personalities. People's perceptions of politicians’ personalities influence elections (8, 9)
and geopolitics (10). Their perceptions of CEOs’ personalities affect the reputation, valuation,
and performance of the companies they lead (11, 12). Perceived personalities of celebrities
endorsing brands affect consumers’ attitudes and purchase intentions (13). Musicians’ perceived
personalities affect their music’s popularity (14). It is thus unsurprising that much effort goes
into studying and shaping such perceptions (15, 16).

To understand and shape public figures’ perceived personalities, one first needs to measure them.
The conventional approach, surveying experts or the public, tends to be slow and expensive (8,
12). Thus, there has been a growing interest in extracting perceptions from opinions and
comments expressed online, in blogs, tweets, Wikipedia entries, press articles, or e-books.
Initially, this has been mainly achieved by collecting data in a targeted fashion, such as recording
Tweets mentioning a given public figure (3). The recent emergence of LLMs opened new
opportunities (17). Relevant comments and opinions increasingly find their way into the vast
text corpora used in LLMs’ training. Consequently, people’s perceptions are reflected in and can
be extracted from LLMs’ semantic spaces (18). For example, Cao and Kosinski (19) predicted
perceptions of public figures’ personalities from their names’ location in the model’s semantic
space with an accuracy of about r = .75. Bhatia and colleagues used a similar approach to predict
perceived leadership skills (20). Yet, the development of these models still necessitates extensive
training data.

Here, we introduce an alternative approach, eliminating the need for labeled training data. We
focus on Big Five personality traits (openness, conscientiousness, extraversion, agreeableness,
and emotional stability) which were shown to capture much of the variance in individual
differences and reliably predict a broad range of real-life outcomes (21). Our sample included the
most popular 300 of 11,341 public figures listed in the Pantheon 1.0 dataset (22). Their
popularity was approximated from their Wikipedia page views between 2008 and 2013. As
artists tended to be more popular, we capped their number at 100 to include figures from other
categories: business and law, exploration, humanities, institutions, science and technology,
sports, and others. Public figures born before 1900 were excluded, as the raters may have been
less familiar with them.
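
For concreteness, the sampling procedure can be sketched as follows. This is a minimal illustration assuming hypothetical column names ("domain", "birth_year", "page_views"); the actual Pantheon 1.0 files use their own schema.

```python
# A minimal sketch of the sampling procedure; column names are hypothetical
# placeholders, not the Pantheon 1.0 schema.
import pandas as pd

ARTS_CAP = 100     # cap on the number of artists
SAMPLE_SIZE = 300  # total number of public figures in the sample

df = pd.read_csv("pantheon.csv")

# Exclude figures born before 1900, with whom raters may be less familiar.
df = df[df["birth_year"] >= 1900]

# Popularity is approximated by Wikipedia page views (2008-2013).
df = df.sort_values("page_views", ascending=False)

# Keep at most 100 artists so that other domains are represented,
# then take the 300 most popular figures overall.
artists = df[df["domain"] == "ARTS"].head(ARTS_CAP)
others = df[df["domain"] != "ARTS"]
sample = (
    pd.concat([artists, others])
    .sort_values("page_views", ascending=False)
    .head(SAMPLE_SIZE)
)
```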

ChatGPT-3.5 and ChatGPT-4 rated public figures’ perceived personalities using the Ten-Item
Personality Inventory (TIPI) (23). The following prompt was repeated for each TIPI item (in
italics) and each public figure (underscored):
“Here is a characteristic that may or may not apply to Donald Trump. Please indicate the
extent to which most people would agree or disagree with the following statement: I see
Donald Trump as extraverted, enthusiastic.
1 for Disagree strongly, 2 for Disagree moderately, 3 for Disagree a little, 4 for Neither
agree nor disagree, 5 for Agree a little, 6 for Agree moderately, 7 for Agree strongly.
Answer with a single number.”
ChatGPT’s responses to 10 TIPI items were aggregated to compute five perceived personality
scores for each public figure. The model was reset after each question. To minimize the variance
in the models’ responses and maximize our findings’ replicability, the temperature parameter
was set to 0.
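
A minimal sketch of this rating procedure is shown below, assuming the openai Python client and standard TIPI scoring (each trait is the mean of one item and one reverse-scored item). The model name and response parsing are illustrative assumptions, not the study’s exact implementation.

```python
# A minimal sketch of the rating procedure; the client interface, model name,
# and response parsing are assumptions, not the study's exact implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The ten TIPI items: (descriptor, trait, reverse-scored?). Each Big Five
# trait is measured by one regular and one reverse-scored item.
TIPI_ITEMS = [
    ("extraverted, enthusiastic", "extraversion", False),
    ("critical, quarrelsome", "agreeableness", True),
    ("dependable, self-disciplined", "conscientiousness", False),
    ("anxious, easily upset", "emotional stability", True),
    ("open to new experiences, complex", "openness", False),
    ("reserved, quiet", "extraversion", True),
    ("sympathetic, warm", "agreeableness", False),
    ("disorganized, careless", "conscientiousness", True),
    ("calm, emotionally stable", "emotional stability", False),
    ("conventional, uncreative", "openness", True),
]

PROMPT = (
    "Here is a characteristic that may or may not apply to {name}. "
    "Please indicate the extent to which most people would agree or disagree "
    "with the following statement: I see {name} as {item}.\n"
    "1 for Disagree strongly, 2 for Disagree moderately, 3 for Disagree a little, "
    "4 for Neither agree nor disagree, 5 for Agree a little, 6 for Agree moderately, "
    "7 for Agree strongly.\nAnswer with a single number."
)

def rate_figure(name: str, model: str = "gpt-4") -> dict:
    """Query the model once per TIPI item and aggregate into five trait scores."""
    scores: dict[str, list[int]] = {}
    for item, trait, is_reversed in TIPI_ITEMS:
        # A fresh single-message conversation per item ("the model was reset
        # after each question"); temperature=0 minimizes response variance.
        response = client.chat.completions.create(
            model=model,
            temperature=0,
            messages=[{"role": "user",
                       "content": PROMPT.format(name=name, item=item)}],
        )
        rating = int(response.choices[0].message.content.strip()[0])
        if is_reversed:
            rating = 8 - rating  # reverse-score on the 1-7 scale
        scores.setdefault(trait, []).append(rating)
    # Each trait score is the mean of its two items.
    return {trait: sum(v) / len(v) for trait, v in scores.items()}
```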

ChatGPT was not provided with any training data or feedback on its performance. To evaluate
the accuracy of its ratings, we correlated them with those of 600 raters recruited on
Prolific.com. Each rater evaluated a random subset of 10 public figures using TIPI (23). Raters
could skip targets they were unfamiliar with. Public figures were rated by 18.89 raters on
average (SD=10.38). To ensure the reliability of our measure, 75 public figures rated by fewer
than 10 raters were removed. Human ratings’ split-half reliability was estimated by randomly
dividing raters into two equal-sized groups and correlating their respective averaged personality
ratings. To minimize the role of chance, this procedure was repeated 1,000 times, and the results
were averaged using Fisher’s z transformation. The resulting reliabilities equaled .81, .80, .81, .88,
and .85 for openness, conscientiousness, extraversion, agreeableness, and emotional stability,
respectively.
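
A minimal sketch of this reliability estimate, assuming one trait’s ratings are stored as a raters × figures matrix with missing values where a rater skipped a figure (an illustrative layout, not the study’s data format):

```python
# A minimal sketch of the split-half reliability procedure; the data layout
# (raters x figures matrix with NaNs for skipped figures) is an assumption.
import numpy as np

rng = np.random.default_rng(0)

def split_half_reliability(ratings: np.ndarray, n_iter: int = 1000) -> float:
    n_raters = ratings.shape[0]
    zs = []
    for _ in range(n_iter):
        # Randomly divide raters into two equal-sized groups.
        perm = rng.permutation(n_raters)
        half_a, half_b = perm[: n_raters // 2], perm[n_raters // 2 :]
        # Average each group's ratings per public figure, ignoring missing values.
        mean_a = np.nanmean(ratings[half_a], axis=0)
        mean_b = np.nanmean(ratings[half_b], axis=0)
        ok = ~np.isnan(mean_a) & ~np.isnan(mean_b)
        # Correlate the two groups' averaged ratings across figures.
        r = np.corrcoef(mean_a[ok], mean_b[ok])[0, 1]
        zs.append(np.arctanh(r))  # Fisher's z transformation
    # Average in z space, then transform back to a correlation.
    return float(np.tanh(np.mean(zs)))
```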

Like all measures, human ratings include some error, as expressed by their high but imperfect
reliability. As we are interested in the accuracy of ChatGPT when predicting actual perceived
personalities, rather than imperfect approximations of perceived personalities, correlations were
divided by the square root of a given scale’s reliability (i.e., correction for attenuation (24)). For
clarity, we also report the raw, uncorrected values.
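
Concretely, the correction divides each observed correlation by the square root of the corresponding scale’s reliability; the raw correlation below is an illustrative value, not a study result:

```python
# Correction for attenuation (24); the raw correlation is illustrative.
import math

reliability = 0.81  # e.g., the split-half reliability of openness ratings
raw_r = 0.73        # hypothetical observed correlation with human ratings
corrected_r = raw_r / math.sqrt(reliability)  # 0.73 / 0.9 ≈ 0.81
```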

The results displayed in Figure 1 show that both ChatGPT-4 (red bars) and ChatGPT-3.5 (green
bars) could accurately predict public figures’ perceived personalities, despite not being provided
with training data or examples. The more recent and more sophisticated of the two models,
ChatGPT-4, was more accurate, outperforming ChatGPT-3.5 across all traits. All perceived
personality traits were highly predictable, with ChatGPT-4’s accuracy ranging from r = .81
(openness) to r = .96 (conscientiousness). The accuracies observed here were comparable
to those achieved by models trained to predict perceptions from text data or LLMs’ semantic
spaces (blue bars) (19). This is remarkable given that those models were specifically trained to
maximize their correlations with human ratings, while ChatGPT’s predictions were made
without any training.
Figure 1. The accuracy of ChatGPT-4, ChatGPT-3.5, and more conventional embeddings-based
regression models (19), when predicting perceived personalities of public figures. The ChatGPT
models were not given any training data or feedback on their performance. Confidence intervals
are at the 95% level.
Values in parentheses represent raw accuracy (uncorrected for attenuation). All correlations are
significant at the p<.001 level.

Our findings show that simply prompting ChatGPT to rate public figures’ personalities yields
estimates equivalent to those obtained by surveying hundreds of online respondents and
averaging their ratings. To achieve
this, a model must not only contain information on people’s perceptions, but also comprehend
the task (i.e., what it means to rate someone’s personality) and, more crucially, have some form
of representation of the underlying latent personality dimensions.

Importantly, we are not asserting that this form of representation parallels human mental models
of personality or that AI models exhibit human-like consciousness. Although the perception of
personality is a latent psychological construct, GPT-4’s training data incorporates public
opinions about the subjects in question, rendering the task of predicting public figures’
personality perceptions partially an exercise in information retrieval.

Nevertheless, we should note that these perceptions are measured using a Likert scale, a format
not commonly used in everyday discussions about others. To generate nearly impeccable
predictions in psychometric terms, the model must not only gather a sizable amount of pertinent
data but also appropriately weigh and combine it into a single numerical value. This capacity
goes beyond the purposes of the model’s original training, signifying an advanced level of data
interpretation and synthesis.

Further research could focus on prompt engineering to further improve LLMs’ accuracy, expand
the range of targets and psychological traits, investigate the ethical implications of using LLMs’
emergent predictive capabilities (25), and explore the potential applications of LLM-generated
personality profiles in fields such as recruitment, politics, marketing, or entertainment.

Our results have consequences not only for personality research but also for LLM development
and human-computer interaction. Predicting people’s responses to a personality questionnaire is
just one of many potential benefits of LLMs being able to model personality. LLMs that can
recognize, emulate, and manifest different personality types could significantly improve their
user experience. For instance, adjusting chatbots’ responses to the personality traits or
preferences of their users could lead to a more personalized, engaging, and human-like
experience (26). LLMs’ output, such as stories, would likely be more attractive if it portrayed
characters with distinct, consistent, and believable personalities.

References

1. W. Youyou, M. Kosinski, D. Stillwell, Computer-based personality judgments are more accurate
than those made by humans. Proc Natl Acad Sci U S A. 112, 1036–1040 (2015).
2. M. Kosinski, D. Stillwell, T. Graepel, Private traits and attributes are predictable from digital
records of human behavior. Proc Natl Acad Sci U S A. 110, 5802–5805 (2013).
3. A. Tumasjan, T. O. Sprenger, P. G. Sandner, I. M. Welpe, "Predicting elections with Twitter:
What 140 characters reveal about political sentiment" in ICWSM 2010 - Proceedings of the 4th
International AAAI Conference on Weblogs and Social Media (2010;
https://ojs.aaai.org/index.php/ICWSM/article/view/14009), pp. 178–185.
4. B. O’Connor, R. Balasubramanyan, B. R. Routledge, N. A. Smith, "From tweets to polls:
Linking text sentiment to public opinion time series" in ICWSM 2010 - Proceedings of the 4th
International AAAI Conference on Weblogs and Social Media (2010;
https://ojs.aaai.org/index.php/ICWSM/article/view/14031), pp. 122–129.
5. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining,
Inference, and Prediction (Springer, 2009).
6. T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P.
Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child,
A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray,
B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, "Language
models are few-shot learners" in Advances in Neural Information Processing Systems (2020).
7. E. Goffman, The presentation of self in everyday life (2021).
8. J. D. Klingler, G. E. Hollibaugh, A. J. Ramey, What I like about you: legislator personality and
legislator approval. Polit Behav. 41, 499–525 (2019).
9. A. Bittner, Platform or personality?: the role of party leaders in elections (2011).
10. D. Kellner, Celebrity diplomacy, spectacle and Barack Obama. Celebr Stud. 1, 121–123 (2010).
11. J. S. Harrison, G. R. Thurgood, S. Boivie, M. D. Pfarrer, Perception is reality: How CEOs’
observed personality influences market perceptions of firm risk and shareholder returns.
Academy of Management Journal. 63, 1166–1195 (2020).
12. C. A. O’Reilly, D. F. Caldwell, J. A. Chatman, B. Doerr, The promise and problems of
organizational culture: CEO personality, culture, and firm performance. Group Organ Manag.
39, 595–625 (2014).
13. D. Pradhan, I. Duraipandian, D. Sethi, Celebrity endorsement: How celebrity–brand–user
personality congruence affects brand attitude and purchase intention. Journal of Marketing
Communications. 22, 456–473 (2016).
14. D. M. Greenberg, S. C. Matz, H. A. Schwartz, K. R. Fricke, The self-congruity effect of music. J
Pers Soc Psychol. 121, 137–150 (2021).
15. K. McGraw, "Political impressions: Formation and management." in Oxford handbook of
political psychology, D. O. Sears, L. Huddy, R. Jervis, Eds. (Oxford University Press, 2003), pp.
394–432.
16. C. C. Chen, J. R. Meindl, The construction of leadership images in the popular press: The case of
Donald Burr and People Express. Adm Sci Q. 36, 521 (1991).
17. T. Mikolov, K. Chen, G. Corrado, J. Dean, "Efficient estimation of word representations in
vector space" in 1st International Conference on Learning Representations, ICLR 2013 -
Workshop Track Proceedings (2013).
18. R. Richie, W. Zou, S. Bhatia, Predicting high-level human judgment across diverse behavioral
domains. Collabra Psychol. 5, 1–12 (2019).
19. X. Cao, M. Kosinski, “Large language models know how the personality of public figures is
perceived by the general public” (PsyArXiv, 2023).
20. S. Bhatia, C. Y. Olivola, N. Bhatia, A. Ameen, Predicting leadership perception with large-scale
natural language data. Leadership Quarterly (2021).
21. D. J. Ozer, V. Benet-Martínez, Personality and the prediction of consequential outcomes. Annu
Rev Psychol. 57, 401–421 (2006).
22. A. Z. Yu, S. Ronen, K. Hu, T. Lu, C. A. Hidalgo, Pantheon 1.0, a manually verified dataset of
globally famous biographies. Sci Data (2016).
23. S. D. Gosling, P. J. Rentfrow, W. B. Swann, A very brief measure of the Big-Five personality
domains. J Res Pers. 37, 504–528 (2003).
24. C. Spearman, The proof and measurement of association between two things. Am J Psychol. 100,
441 (1987).
25. M. Kosinski, Facial recognition technology can expose political orientation from naturalistic
facial images. Sci Rep (2021).
26. Q. Qian, M. Huang, H. Zhao, J. Xu, X. Zhu, “Assigning personality/profile to a chatting machine
for coherent conversation generation” in Proceedings of the 27th International Joint Conference
on Artificial Intelligence, IJCAI 2018 (2018), pp. 4279–4285.
