
RESEARCH ARTICLE | PSYCHOLOGICAL AND COGNITIVE SCIENCES

Do some languages sound more beautiful than others?


Andrey Anikin (a,b), Nikolay Aseyev (c), and Niklas Erben Johansson (d,1)

Edited by Kenneth Wachter, University of California, Berkeley, CA; received October 29, 2022; accepted March 25, 2023

Italian is sexy, German is rough, but how about Páez or Tamil? Are there universal phonesthetic judgments based purely on the sound of a language, or are preferences attributable to language-external factors such as familiarity and cultural stereotypes? We collected 2,125 recordings of 228 languages from 43 language families, including 5 to 11 speakers of each language to control for personal vocal attractiveness, and asked 820 native speakers of English, Chinese, or Semitic languages to indicate how much they liked these languages. We found a strong preference for languages perceived as familiar, even when they were misidentified, a variety of cultural-geographical biases, and a preference for breathy female voices. The scores by English, Chinese, and Semitic speakers were weakly correlated, indicating some cross-cultural concordance in phonesthetic judgments, but overall there was little consensus between raters about which languages sounded more beautiful, and average scores per language remained within ±2% after accounting for confounds related to familiarity and voice quality of individual speakers. None of the tested phonetic features (the presence of specific phonemic classes, the overall size of phonetic repertoire, its typicality and similarity to the listener’s first language) were robust predictors of pleasantness ratings, apart from a possible slight preference for nontonal languages. While population-level phonesthetic preferences may exist, their contribution to perceptual judgments of short speech recordings appears to be minor compared to purely personal preferences, the speaker’s voice quality, and perceived resemblance to other languages culturally branded as beautiful or ugly.

Significance

Despite the abiding popular interest, there is hardly any empirical research on whether some languages sound more beautiful than others and whether some phonetic features are universally attractive. We carefully controlled for language familiarity and cultural biases in the first large-scale, cross-cultural comparison of hundreds of languages and did not find any widely shared preferences for specific languages or phonetic features. While some types of human voices may be generally attractive, the languages themselves were surprisingly uniform in terms of their esthetic appeal to the average person in our sample. This initial finding promotes an egalitarian view of extant world languages, demonstrates the feasibility of cross-cultural phonesthetic research, and raises important questions about the role of esthetics in language evolution.

language attitudes | phonesthetics | cross-cultural | voice

Author contributions: A.A., N.A., and N.E.J. designed research; A.A., N.A., and N.E.J. performed research; A.A. contributed new reagents/analytic tools; A.A. analyzed data; and A.A., N.A., and N.E.J. wrote the paper.
The authors declare no competing interest.
This article is a PNAS Direct Submission.
Although PNAS asks authors to adhere to United Nations naming conventions for maps (https://www.un.org/geospatial/mapsgeo), our policy is to publish maps as provided by the authors.
Copyright © 2023 the Author(s). Published by PNAS. This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
1 To whom correspondence may be addressed. Email: niklas.erben_johansson@ling.lu.se.
This article contains supporting information online at https://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2218367120/-/DCSupplemental.
Published April 17, 2023.
PNAS 2023 Vol. 120 No. 17 e2218367120 https://doi.org/10.1073/pnas.2218367120
Downloaded from https://www.pnas.org by 115.76.48.113 on March 10, 2024 from IP address 115.76.48.113.

It has long been debated which languages are esthetically pleasing, and why. Phonesthetics (the perception of beauty in spoken language that is independent of meaning) is mentioned already in the Talmud: “Four languages are pleasing for use in the world: Greek for song, Latin for battle, Syriac (Aramaic) for dirges, Hebrew for speech” (1). Popular discussions have continued in more recent times, from Tolkien’s mellifluous Elvish and the Black Speech of Mordor (2) to speculations about the perfect language for singing. In contrast, and until very recently (3, 4), academic research has been all but silent on the topic of phonesthetics. Labeling some languages as intrinsically beautiful and others as ugly is politically incendiary, and there are difficult methodological challenges because the perception of beauty in a language depends on idiosyncratic factors such as previous exposure. Nevertheless, a scientific investigation of phonesthetics is becoming more feasible due to improved accessibility both of recordings from less familiar world languages and of international samples of raters for perceptual studies, and it offers valuable theoretical insights beyond mere claims that language X is more beautiful than Y. Speech is of paramount importance to human societies, and a proper understanding of its esthetic properties will be a crucial addition to the active research on the perceptual features behind the esthetics of visual arts (5) and music (6, 7). Furthermore, an esthetic evaluation of spectrotemporal features in speech, such as specific phonemes or prosodic patterns, may affect their typological prevalence, making phonesthetics a potentially relevant factor in language evolution. We therefore took advantage of a newly available corpus of recordings from hundreds of world languages (https://live.bible.is) to test whether the phonetic structure of some languages makes them universally appealing, and if so, what phonetic features are responsible for this effect.

To test whether phonesthetic preferences exist, it is essential both to understand what makes this a theoretical possibility and to consider what other factors may be involved when a subject in a perceptual experiment reports liking or disliking the sound of a particular language. Starting from the lowest level of auditory perception, speech is only one type of input to a neurological system that evolved to process sound in general. In fact, the human auditory brain has changed very little compared to our primate ancestors (8): if anything, speech itself may be adapted for exploiting the sensitivity of the auditory system (9, 10). When we hear “How do you do?”, the initial stages of processing are the same as for any environmental sound. As a result, the same primitive acoustic features that we enjoy or dislike in environmental sounds should affect our esthetic perception of
speech. Unfortunately, Feynman’s pessimistic view that “in understanding why only certain sounds are pleasant to our ear.... [we are] probably no further advanced now than in the time of Pythagoras” (11) still rings true. There are notoriously unpleasant sounds, such as fingernails on blackboard (12), and the aversiveness of industrial noises is an important practical concern, but the fundamental reasons for why these sounds are so disturbing remain mysterious. While it is doubtful that any phones (elementary sounds used in human speech, different from phonemes in that they do not need to distinguish between words) are strongly aversive as acoustic primitives, some classes may well be more pleasing than others. For instance, there are claims that consonants l and m and high vowels are overrepresented in the words that native English speakers consider beautiful (13), while German speakers perceive short vowels, voiceless consonants, and hissing sibilants as affectively negative (14). Word meaning is a hopeless confound when speakers evaluate the “melody” of words in their own language, but the hypothesis that speech sounds lie on a phonesthetic continuum is testable. If so, the presence of specific phones, such as clicks or retroflexes, could make an unfamiliar language more or less pleasant to hear.

As individual phones are strung together, the spectrotemporal complexity of the resulting speech signal may have some esthetically optimal level at which the signal is neither too predictable nor too complex, based on the widely accepted general principle that preferred sensory input should sufficiently activate the corresponding brain areas without being exceedingly difficult to process (6, 7, 15). Just as music performed with some minor irregularities by a human performer is more pleasant than the same piece played impeccably by a machine (7), algorithmically perfect, rule-based speech is not appealing (15), but neither is hard-to-follow speech that is poorly enunciated or masked by noise (16). The implication to phonesthetics is that the overall phonetic complexity of a language could have an inverted-U-shaped relation to its esthetic appeal: for example, the number of vowels and consonants in a language should be high enough to encode semantic information but not so high as to overtax the processing capacity of the auditory system. Alternatively, the limit on the number of phonemes may be set by vocal production rather than perception, or listeners might even fail to notice whether an unfamiliar language is phonetically rich or simple because categorical perception of phonemes is a trained skill (17). The voice itself also has an important esthetic dimension. Voices are more appealing if they sound healthy and sex-typical (18), presumably because we have evolved to look for signs of fitness in the voice, creating some universal standards of auditory beauty analogous to the appeal of population-typical, symmetrical faces, and unblemished skin (19–21). Such voice-specific preferences do not translate into any concrete predictions about which phonetic features should be esthetically appealing, but they constitute important confounds in phonesthetic research, necessitating the inclusion of multiple speakers per language or within-speaker comparisons (2), ideally with fully fluent bilingual speakers, who can provide recordings of multiple languages with identical voice quality (4).

Moving from auditory perception to meaning, words and utterances are recognized and processed semantically. The first stage of matching input to template is potentially a major source of esthetic experience because of the so-called mere exposure effect: we like what has become familiar from repeated exposure (22–24). Intrinsically noxious stimuli generally do not become likable with exposure (23), but we may learn to appreciate questionable sounds such as harsh metal music (25). In the case of speech, listeners may recognize the language as a whole, particular words, or perhaps even specific phones or phone combinations such as distinct consonant clusters. In fact, a vague sense of familiarity may be enough to affect preferences as explicit recognition is not a prerequisite of a mere exposure effect (23). We therefore predicted that languages should be judged more beautiful if they phonologically overlap with the listener’s first language and contain typologically common phones (1) due to sounding both more familiar and more prototypical, just as averaged faces and voices are often judged to be more beautiful (21, 26).

Finally, we must consider sociocultural factors that affect the desirability or “prestige” of languages. Just as dialects of the same language are often preferred or disliked (27–29), languages associated with a particular geographical region, country, or social category (e.g., migrants or ethnic minorities) may be perceived as socially more or less desirable, potentially complicating the effect of familiarity. For instance, among the 16 European languages rated by European listeners in ref. 3, the seldom-recognized Icelandic was rated as more likable than the always-recognized German, although both languages are quite close phonetically. A language does not need to be identified correctly for this effect to occur: actual or perceived resemblance to a marked cultural category may suffice.

Considering all these confounds, the optimal design for studying phonesthetics would be to record multiple speakers from a large number of phonetically diverse and completely unrecognizable languages, to be rated by listeners from several linguistic-cultural groups. In our best attempt to approach this design, we obtained recordings of 228 languages from 43 language families (Fig. 1), including between 5 and 11 speakers per language to control for personal vocal attractiveness. These recordings were then rated on pleasantness by speakers of English, Chinese, or Semitic languages. To provide a measure of familiarity, listeners were also asked to indicate if they recognized the language and, if so, in which part of the world it was spoken (SI Appendix, Fig. S1). The study addressed three main questions: 1) How strong are phonesthetic effects, that is, how pronounced are the differences between world languages in their intrinsic esthetic appeal? 2) Do speakers of different languages rank other languages similarly, demonstrating some cross-cultural concordance in phonesthetic preferences? 3) What phonetic characteristics make a language more or less pleasant?

Results

Extralinguistic Confounds. The average pleasantness score was 12.2% higher (i.e., 12.2 points on a scale of 0 to 100, 95% CI [11.1, 13.3]) in the 14% of trials in which participants explicitly indicated that they recognized a language. In line with this strong and culture-specific (SI Appendix, Fig. S2) familiarity effect, the most beautiful languages according to Chinese speakers were Mandarin, English, and Japanese, whereas speakers of Semitic languages preferred Spanish, English, Italian, and Arabic (SI Appendix, Fig. S3). Any attempt to estimate intrinsic phonesthetic appeal of different languages therefore requires that such obvious confounds be identified and controlled for. Interestingly, languages were misidentified more than half the time (50.3%), but the boost in pleasantness was very similar regardless of whether a language was recognized correctly (12.7% [11.6, 13.9]) or incorrectly (11.6% [10.5, 12.8]). In other words, perceived rather than actual familiarity with a language made it more attractive.

Listeners may not always report familiarity: for example, in ~26% of trials, participants failed to report that English sounded familiar, although they performed the experiment in English. A language may also feel vaguely familiar, but not enough to place

it on a map. To account for unreported or semirecognition, we defined “residual familiarity” of each language as the proportion of trials in which it was recognized by a particular group of listeners (English, Chinese, or Semitic) and then statistically controlled for it after excluding trials with explicit recognition. Indeed, residual familiarity remained an important predictor of pleasantness ratings (+16.3% [13.1, 19.4] over the observed range of residual familiarity).

Fig. 1. Included languages (N = 228) colored by language family (N = 43). See SI Appendix, Table S1 for the full list.

There were only negligible differences between world regions when the language was not recognized (SI Appendix, Fig. S4A), suggesting that languages spoken in different parts of the world do not sound intrinsically beautiful or unpleasant, regardless of the listeners’ own first language. English speakers displayed little or no preference for specific perceived regions and rated any familiar-sounding language as more pleasant (SI Appendix, Fig. S4B). In contrast, Chinese speakers preferred the languages that they thought were spoken in North Asia and North America and had a bias against the (supposedly) African languages, whereas speakers of Semitic languages preferred North and South America. In sum, genuine psycholinguistic preferences may be masked both by a general familiarity effect and by culture-specific biases. Such biases cannot be accounted for by residual familiarity because they can be either positive or negative, and therefore, we replicated the analyses below after simply excluding all languages with substantial familiarity. A cutoff of 20% was chosen based on the distribution of reported familiarity rates, which are clearly inflated due to listeners trying to guess blindly (SI Appendix, Fig. S2), resulting in the exclusion of 13% of languages rated by English-speaking listeners, 7% for Chinese raters, and 26% for Semitic raters.

Finally, some speakers’ voices may be intrinsically appealing, whatever the language. There was a statistically uncertain general preference for female voices (+2.1% [−0.7, 4.2]), but no appreciable main effect of listener’s sex (−0.1% [−2.4, 2.1] for female vs male listeners) or clear interaction between the speaker’s and listener’s sex. We also found a mild positive effect of background music (+1.0% [0.2, 1.8]) but not of audio quality (−0.2% [−0.9, 0.5]). In addition, we extracted 19 acoustic characteristics and estimated their effect on pleasantness ratings (SI Appendix, Fig. S5 and Table S3). The most consistent effect was preference for low-pitched and breathy voices within each sex, as well as for lower-pitch variability and spectral novelty. An important caveat is that some of these features may be affected by nonlinguistic content in the recordings, while others may be capturing language-specific phonetic peculiarities: For example, tonal languages would tend to have higher pitch variability. Therefore, we replicated the analyses in the following sections both with (Figs. 2 and 3) and without (SI Appendix, Figs. S6 and S8) statistically controlling for five of the most important acoustic features and either after excluding languages with familiarity over 20% (Figs. 2 and 3) or after controlling for residual familiarity (SI Appendix, Figs. S7 and S9). Trials with explicit recognition of a language were always excluded.

Comparing Language Scores by English, Chinese, and Semitic Speakers. Conditional language scores, calculated from mixed models after accounting for familiarity and acoustic controls, were weakly, but reliably correlated between English, Chinese, and Semitic raters (Pearson’s r = 0.21 to 0.23; Fig. 2A), suggesting some cross-cultural concordance in preferring specific languages. These correlations were slightly higher when calculated for language families instead of individual languages (Fig. 2B), but overall, the majority of languages and families had fairly similar conditional scores that lay within ±2 to 3% on the rating scale of 0 to 100%, which is a modest difference compared to, for example, the 12% boost due to familiarity. Interestingly, the concordance between groups of listeners increased when we omitted acoustic predictors (SI Appendix, Fig. S6), suggesting that cross-cultural convergence in pleasantness scores can partly be a consequence of preferences for a specific voice quality and manner of speaking, rather than language-specific phonetics.

Although we excluded all languages recognized in over 20% of trials, likely effects of some unaccounted-for familiarity remained, which is easier to see in conditional scores aggregated across English, Chinese, and Semitic raters (Fig. 2 C and D). The high placement of the English-based creole Tok Pisin and the Indo-European family is particularly striking. Likewise, languages from the Uto-Aztecan family may have scored so high because they sounded vaguely familiar due to strong Spanish influences. The effect of familiarity was not always positive: for example, Thai and Yongbei Zhuang from the Tai–Kadai family were rated low by Mandarin speakers,

who usually recognized both of these languages, in an alternative model with all languages included (SI Appendix, Fig. S7). Some of the other families with low scores were represented by only one (e.g., Ticuna–Yuri in the Amazon) or two (e.g., Kru in West Africa) languages, so unpleasant voices of individual speakers or other extralinguistic factors could have affected the scores. Generally, however, it seems harder to explain why some languages and families were considered unattractive, and familiarity cannot be the only reason. Avar and Chechen from the Nakh-Daghestanian family in the northern Caucasus were hardly ever recognized, yet they received unusually low scores, and so did Karakalpak (spoken in Uzbekistan) from the otherwise above-average Turkic family.

The Effect of Phonetic Features. The case for universal phonesthetic preferences can become much stronger if we can not only demonstrate that listeners around the world agree about which languages sound more or less beautiful but also pinpoint the phonetic or prosodic features responsible. However, none of the tested phonetic features predicted pleasantness scores in all three groups of listeners (English, Chinese, or Semitic; Fig. 3). This is unlikely to be a consequence of simultaneously considering too many predictors in multiple regression as voice-specific acoustic controls preserved clear effects. The few detected group-specific effects were small and hard to interpret. Thus, surprisingly, speakers of tonal languages in the Chinese group rated other tonal languages as 1.5% [0.6, 2.4] less pleasant than nontonal languages. These findings were replicated without controlling for acoustic measures of voice quality to ensure that the effects of phonetic features were not masked by acoustic predictors, except that the negative effect of tonality became more robust for all groups of listeners (SI Appendix, Fig. S8). Notably, the diversity of phonemic repertoire was not associated with pleasantness ratings: The overall number of vowels in a language was a vanishingly weak negative predictor of pleasantness scores (Chinese group −0.4% [−0.8, 0.0], English −0.4% [−0.9, 0.1], Semitic +0.2% [−0.2, 0.6]), and no pronounced effect was found for the number of consonants in any group. Phonemic typicality of a language was a marginal positive predictor only in the Semitic group (+0.6% [−0.1, 1.3]), nor did we find any effect of phonemic similarity between the listener’s first language and the rated language (−0.1% [−0.4, 0.2] overall). Considering the theoretical possibility of U-shaped relationships, we also estimated the effect of quantitative predictors (e.g., the number of vowels or the typicality index) with generalized additive models, but did not find marked nonlinear effects (SI Appendix, Fig. S10).

Fig. 2. Pleasantness scores by speakers of English, Chinese, and Semitic languages are weakly correlated. (A and B) Conditional scores of 228 languages (A) and 43 language families (B) after excluding all languages with familiarity >20% per group and controlling for five robust acoustic predictors of the ratings: cepstral peak prominence, entropy, spectral novelty, pitch, and pitch variability (SI Appendix, Fig. S9 and Table S5). Pearson’s correlations with 95% CI and blue regression lines are calculated pointwise from posterior distributions of centered conditional language scores from two separate mixed models, not merely from the most credible point estimates (solid points = medians of posterior distributions). (C and D) Conditional scores averaged across all three groups of listeners highlight some outliers among languages (C) and families (D). The x-coordinate is only added to reduce clutter. All scores are on a scale of 0 to 100.
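
The pointwise correlation described in the caption of Fig. 2 can be illustrated with a minimal sketch. It assumes that each listener group's conditional language scores are available as a list of posterior draws, with one score per language in every draw. The data below are simulated, and the function names (`pearson_r`, `pointwise_correlation`, `simulate_group`) are hypothetical rather than taken from the authors' analysis code. Pearson's r is computed separately for every draw, and the median and the 2.5th and 97.5th percentiles of the resulting values summarize the correlation and its 95% interval.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pointwise_correlation(draws_a, draws_b):
    """Correlate two groups' language scores draw by draw.

    draws_a, draws_b: lists of posterior draws; each draw holds one
    conditional score per language, in the same language order.
    Returns (median r, lower bound, upper bound) of a 95% interval.
    """
    rs = sorted(pearson_r(a, b) for a, b in zip(draws_a, draws_b))
    lower = rs[int(0.025 * (len(rs) - 1))]  # simple index-based quantile
    upper = rs[int(0.975 * (len(rs) - 1))]
    return statistics.median(rs), lower, upper


# Simulated data (an assumption for illustration only): 50 languages that
# share a weak common component across two listener groups.
rng = random.Random(1)
n_languages, n_draws = 50, 400
shared = [rng.gauss(0, 1) for _ in range(n_languages)]


def simulate_group():
    # Group-level score per language = weak shared signal + group-specific part
    means = [0.5 * s + rng.gauss(0, 1) for s in shared]
    # Posterior draws scatter around each language's mean score
    return [[m + rng.gauss(0, 0.5) for m in means] for _ in range(n_draws)]


english, chinese = simulate_group(), simulate_group()
r_median, r_lower, r_upper = pointwise_correlation(english, chinese)
print(f"r = {r_median:.2f} [{r_lower:.2f}, {r_upper:.2f}]")
```

Computing r per draw, rather than once from the posterior medians, carries the uncertainty of each language score into the interval around the correlation.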

Fig. 3. Phonetic features do not have a consistent effect on pleasantness ratings. All shown predictors are tested simultaneously in one multilevel multiple
regression model per listener group after excluding all languages with familiarity over 20% per group and controlling for acoustic predictors. Each point shows
the predicted effect of changing one phonetic feature, while holding all other predictors constant, on the pleasantness score of a single recording. Medians of
posterior distribution and 95% CIs from four mixed models. To focus on the most robust effects, we grayed out the points with <95% of posterior probability
to one side of zero. N (trials/languages) = 16,792/198 for English, 15,928/210 for Chinese, 12,457/164 for Semitic, and 46,928/199 for all groups combined. The
phonetic features are explained in SI Appendix, Table S3.
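
The graying-out rule in Fig. 3 (points with <95% of posterior probability to one side of zero) can be sketched as follows. This is a minimal illustration that assumes the posterior draws of an effect are available as a plain list of numbers. The function names and the simulated draws are hypothetical and do not come from the authors' code.

```python
import random


def prob_one_side(draws):
    """Share of posterior draws on the more populated side of zero."""
    n_pos = sum(d > 0 for d in draws)
    n_neg = sum(d < 0 for d in draws)
    return max(n_pos, n_neg) / len(draws)


def is_robust(draws, threshold=0.95):
    """True if at least `threshold` of the posterior lies on one side of zero."""
    return prob_one_side(draws) >= threshold


# Simulated posterior draws (assumed values, for illustration only)
rng = random.Random(42)
clear_effect = [rng.gauss(-1.5, 0.45) for _ in range(4000)]  # clearly negative
null_effect = [rng.gauss(0.0, 0.5) for _ in range(4000)]     # centered on zero

for name, draws in [("clear", clear_effect), ("null", null_effect)]:
    label = "shown in color" if is_robust(draws) else "grayed out"
    print(f"{name}: P(one side) = {prob_one_side(draws):.3f} -> {label}")
```

The rule is symmetric: an effect passes whether most of its posterior mass is above or below zero, so it flags robust negative effects as readily as positive ones.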

Discussion

If there is something intrinsically beautiful about the sound of certain languages, even listeners who are unfamiliar with these languages should reliably rate them as more pleasant. However, once we have accounted for familiarity and preferences for specific voice types, the scores of all unfamiliar languages varied within just a few percentage points. The 228 world languages that we tested, with all their phonetic and prosodic diversity, thus sounded comparably attractive to the average listener in our sample as long as they were not familiar. Of course, individual listeners may have their personal preferences, and we found both positive and negative cultural biases as well as a general preference for languages perceived as familiar, confirming the crucial role of sociolinguistic factors (3, 27–29) and the mere exposure effect (22–24). Beyond that, however, there was little agreement between listeners about what languages or phonetic features they found attractive. The listeners clearly attended to the task and did not answer at random as familiarity and several acoustic features had consistent and strong effects on pleasantness ratings. Thus, if genuine phonesthetic differences between languages exist, they appear to be relatively small at the population level.

The second question addressed by the study was the degree to which phonesthetic preferences are consistent across cultures. Conditional scores of languages and families by English, Chinese, and Semitic groups of listeners, calculated after accounting for familiarity- and speaker-dependent preferences, aligned better than expected by chance, suggesting some cross-cultural agreement on which languages are intrinsically more beautiful. However, the correlation between the scores by English, Chinese, and Semitic raters was low and sensitive to outliers (a handful of languages or families with particularly high or low scores), which means that the apparent cross-cultural concordance in phonesthetic judgments might be caused by indirect familiarity effects such as lexical-phonetic resemblance to widely recognized languages with strong cultural connotations. An intriguing finding was the greater cross-cultural agreement about which languages were particularly unattractive compared to which ones were uncommonly beautiful, as well as a general negative skew in the distribution of average scores of unfamiliar languages. Unless this is an artifact caused by unpleasant voices of individual speakers, it suggests that phonesthetic research might obtain better traction by focusing on the negative pole, that is, on the inherently unpleasant acoustic and phonetic features that languages normally avoid.

Third, we attempted to determine what phonetic characteristics, if any, make some languages beautiful and others unpleasant. In particular, we tested the contribution of several discrete phonetic features, overall phonetic complexity (number of vowels and consonants), phonetic typicality, and the overlap with the listener’s mother tongue. None of these features noticeably affected pleasantness scores, with the possible exception of a slight preference for nontonal languages. In a few cases, there were not enough languages with a particular feature to estimate its effect reliably (e.g., only two languages had clicks), but the effects of most phonetic features were estimated quite precisely and were very close to zero, so we can be confident that they did not strongly affect listeners’ preferences. Phonetic overlap with the listener’s mother tongue was not associated with pleasantness, and speakers of tonal languages expressed no general preference for tonality. This is surprising, considering the strong familiarity effects, and suggests that listeners may recognize words or supra-segmental prosodic patterns rather than individual phonemes or lexical tones. Of particular significance to our theoretical predictions, neither the overall phonetic complexity of a language nor the typicality of its phonetic repertoire predicted pleasantness ratings. The number of phonemes may be an inadequate measure of complexity, and the recordings may not have been sufficiently long to be phonetically representative of each language, so it is possible that phonetic complexity is indeed a relevant factor, but the present study was not powerful enough to detect its effect. Another interesting possibility is that the complexity of all languages may be kept within the optimal zone defined by the need to encode and extract information efficiently. It has been suggested that all languages have similar information carrying capacity per second, compensating for differences in phonological complexity by the corresponding variation in grammatical complexity or the rate of syllable



production (30). However, the amount of variation should not be exaggerated: There are no natural languages with only two or three phonemes, which would use the communication channel inefficiently, or with 10,000, which would be impossible to produce and discriminate reliably. Instead, the languages in our sample had between 14 and 51 phonemes, counting as in (31). Thus, continuous cultural selection may ensure that the size of phonetic repertoire remains reasonably consistent, making all languages comparable in terms of both information-carrying capacity and esthetic appeal.

While the attractiveness of voices, rather than languages, was not the main focus of this study but rather a confound, it was interesting to observe that all acoustic effects were very similar in English, Chinese, and Semitic groups of listeners, confirming that preferences for specific voice types are not culture-specific (18). This has important implications for future studies of phonesthetics: While the attractiveness of individual voices must be accounted for, the relevant acoustic confounds probably do not have to be estimated separately for each tested group of listeners. An important caveat is that acoustic measures may be affected not only by a speaker’s voice quality but also by recording conditions (e.g., background noises and the distance to the microphone) as well as language-specific phonetics (e.g., the number of fricatives) and prosody (e.g., lexical tones), so these effects need to be interpreted with caution. Likewise, cinematic portrayals of gender stereotypes may impact the observed slight preference for female voices. However, we replicated the analyses with and without controlling for acoustic predictors, and the main conclusions remained unchanged with regard to both cross-cultural concordance and the effect of phonetic features.

While this is the largest cross-cultural study on phonesthetic preferences to date, it has a number of important limitations. It will be important to obtain longer and more controlled speech stimuli for future phonesthetic research, enabling more nuanced acoustic and phonetic analyses of the recordings. For instance, we used standard phonetic inventories that described each language in general, but with more standardized recordings, it could be possible to phonetically transcribe the actual rated passages. Other potentially relevant phonetic measures can then be obtained, including specific consonant clusters (1), prosodic features such as speech rate and dynamic range of loudness, more robust measures of voice quality, and improved estimates of phonetic typicality. Other types of speech recordings can also be tested, including unstaged conversations representative of the language heard in everyday life. Likewise, while we did include three linguistically and culturally distinct groups of listeners, they were all recruited and tested on a British online testing platform. Thus, raters in all three groups were literate and exposed to European languages. In future, it will be important to extend phonesthetic experiments to monolingual, non-WEIRD (32) samples more representative of humanity as a whole. Another approach to explore in future studies would be to look for evidence of sound symbolism in words like “beautiful” and “ugly”. If particular phonetic features are universally overrepresented in these words, this could indicate a phonesthetic preference for these features. No such effect has been found so far (31), although there is emerging language-specific evidence that certain phonemes are associated with affective meanings (13, 14). Thus, the likely connection between sound symbolism and phonesthetics ought to be fertile ground for future studies.

Despite these limitations, the present study has made two important contributions. First, we now know that the expected phonesthetic effects are very strongly affected by real or even imagined familiarity, which means that large numbers of diverse languages must be tested. An interesting alternative is to use artificial languages, such as versions of Elvish (2) or meaningless synthetic speech. Second, by setting an upper boundary on population-level esthetic preferences, we have emphasized the fundamental phonetic and esthetic unity of world languages.

Materials and Methods

Stimuli. Recordings were obtained from the soundtrack of a religious film publicly available in hundreds of languages (https://live.bible.is/jesus-film). This is part of the bible.is project, which is increasingly used for linguistic research due to its unique breadth in terms of the number of included languages and families (e.g., refs. 33 and 34). We included all languages in which the film was available provided that there were several male and female voice actors and the soundtrack was of sufficient quality, using clean dubbing rather than a voice-over with two languages audible. We identified 11 scenes with relatively noise- and music-free monologues by ten actors and the narrator (normally four female and seven male voices). Importantly, these were the same scenes from the same film for all languages; thus, the context, type of speech (neutral narration, two friends conversing, a speaker addressing a crowd, etc.), expressed emotion, and, for the most part, the nonlinguistic noises were identical across the compared languages. The scene number was then taken into account when analyzing the data, providing a better-controlled comparison of the sampled languages. The audio was prepared in Audacity (https://www.audacityteam.org/) by trimming long pauses and removing occasional low-frequency noise; all recordings were then normalized for rms amplitude. A.A. and N.E.J. rated audio quality and the presence of background music, removing ~5% of poor-quality recordings. The final sample consisted of 2,125 recordings from 228 languages (Fig. 1 and SI Appendix, Table S1). Each language was represented by 5 to 11 recordings, and thus up to 11 unique speakers per language. The recordings were 5 to 19 s in duration (mean ± SD = 10.7 ± 3.2 s), so the total duration of audio per language was 55 to 127 s (100 ± 13 s).

Participants. We recruited 820 raters (514 women, 298 men, 8 other/unspecified; mean age = 35, range 18 to 77) on the online platform Prolific (https://prolific.co/). With this sample size, each of the 2,125 unique recordings was rated on average 29 times, range [12, 68]. Three target groups were tested separately: native speakers of English (predominantly British English, filtered on Prolific as “first language = English + geographical location = UK”), Chinese (“first language = Chinese/Mandarin/Hakka/Cantonese”), or Semitic languages (“first language = Arabic/Hebrew/Maltese”). These three groups were chosen as they are culturally influential, with different writing systems and profound phonetic differences, yet are well represented in the available pool of participants. We asked each participant to report their first or best language and other languages that they speak fluently. The English group consisted only of those participants who explicitly reported English as their first/best language (SI Appendix, Table S4). We considered anyone who reported being fluent in Chinese or Cantonese as belonging to the Chinese group, including 72 individuals who reported being fluent in a Sinitic language but listed English as their first language. Likewise, anyone who reported fluency in a Semitic language was placed into the Semitic group.

Procedure. The main perceptual experiment was written in JavaScript and performed online. The study was exempt from ethical approval in accordance with the Swedish Ethical Review Act (2003:460). Participants were first informed about the nature and goals of the study, provided informed consent by agreeing to terms and conditions, and filled in a questionnaire on linguistic background and general demographics. In each trial, a participant heard a spoken phrase and was asked: “How much do you like the sound of this language?” Responses were given on a horizontal Visual Analog Scale (VAS) marked from Not at all to Very much. If a participant checked the box “I think I recognize this language”, they were asked to identify in which region the language was spoken (SI Appendix, Fig. S10). The recording could be replayed, and there was no time limit for responding. Participants completed 50 trials each, where each trial contained one randomly chosen recording from a randomly chosen language. To validate the perceptual rating scale, we also performed three additional experiments using a subset of 50 recordings from 50 relatively unfamiliar languages (average familiarity <10%), selected at random in inverse proportion to the typicality of their pleasantness ratings so as to include recordings with both high and low average ratings. Three alternative wordings were used: 1) How much do you like the sound of this language? Not at all... Very much (as in the main study); 2) How beautiful do you find
this language? Ugly... Beautiful; 3) How beautiful do you find this language - the language itself, not the voice? Ugly... Beautiful. We recruited 20 native speakers of British English on Prolific for each validation experiment, so each sound was rated 20 times on each version of the response scale. The average ratings per recording in these three experiments and the original study correlated with Pearson’s r between 0.84 and 0.92, with an overall Intraclass Correlation Coefficient of 0.73, 95% CI [0.63, 0.82] over the aggregated ratings (SI Appendix, Fig. S11). Thus, listeners understood the question as intended, namely as referring to the intrinsic beauty of a language, regardless of the precise formulation.

Acoustic and Phonetic Features. Each recording was analyzed acoustically with the R package soundgen (35) to extract voice pitch and its variability, breathiness, and other commonly used measures of voice quality (SI Appendix, Table S5). Information about tonality, the number of vowels and consonants, the presence of specific phonemes such as clicks, and so on was obtained from standard references, such as Phoible (36), and a range of phonological descriptions derived from language grammars. We also estimated the phonetic typicality of each language and its similarity to each listener’s first language from full phonetic inventories, which were available for 226 out of 228 languages. Full details are available in the SI Appendix, Supplementary text and Tables S2 and S3.

Data Analysis. Unaggregated responses were analyzed using Bayesian multilevel models fit with the R package brms (37). The outcome variable was a single rating of a recording on a continuous scale (0 to 1), which was modeled with a zero-one-inflated beta distribution (38). Each model predicted the rating in an individual trial as a function of population-level predictors, such as language familiarity, and random or group-level effects such as language, family, clip number (one of 11 scenes in the source film), and subject. We allowed the effect of language to vary across scenes because scene-specific sound effects, such as echo and background noise, were not identical for all languages. Each language and each sound were thus assigned a unique group-level intercept, the effect of language on ratings was assumed to vary across subjects, and the variance of responses (phi) was assumed to vary across participants to account for individual differences in using the response scales. Posterior distributions of model parameters and fitted values were summarized by their medians and 95% credible intervals (CIs). The audio, datasets, and R code for audio manipulation and data analysis are available in online supplements (https://osf.io/nhxkv/).

Data, Materials, and Software Availability. Data (audio, datasets, and R scripts) have been deposited at https://osf.io/nhxkv/ (39).

ACKNOWLEDGMENTS. We are grateful to Gabriel Vogel for his comments on the study.

Author affiliations: (a) Division of Cognitive Science, Department of Philosophy, Lund University, Lund 22362, Sweden; (b) Équipe de Neuro-Ethologie Sensorielle Bioacoustics Research Laboratory (ENES) Bioacoustics Research Laboratory, Center for Research in Neuroscience in Lyon (CRNL), University of Saint Étienne, Saint-Etienne 42100, France; (c) Institute of Higher Nervous Activity Neurophysiology of the Russian Academy of Sciences, Moscow 117485, Russia; and (d) Division of Linguistics and Cognitive Semiotics, Center for Languages and Literature, Lund University, Lund 22362, Sweden
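As a rough illustration of the likelihood named under Data Analysis, the sketch below writes out the log-density of a zero-one-inflated beta distribution for a single rating on the 0 to 1 scale. It is not the authors' code (their R scripts are at the OSF link). The function names are hypothetical, and the parameter names `mu`, `phi`, `zoi`, and `coi` are assumed here to mirror the mean-precision convention commonly used for this family, in which `phi` acts as a precision, so a larger value means less spread.

```python
import math


def beta_logpdf(y, mu, phi):
    """Log-density of a beta distribution in mean/precision form.

    Assumes 0 < y < 1, 0 < mu < 1 and phi > 0.
    Shape parameters are a = mu * phi and b = (1 - mu) * phi.
    """
    a = mu * phi
    b = (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)


def zoib_loglik(y, mu, phi, zoi, coi):
    """Log-likelihood of one rating y in [0, 1].

    zoi: probability that the rating is exactly 0 or 1 (assumed 0 < zoi < 1)
    coi: probability of a 1, given that the rating is 0 or 1 (assumed 0 < coi < 1)
    """
    if y == 0.0:
        return math.log(zoi) + math.log(1.0 - coi)
    if y == 1.0:
        return math.log(zoi) + math.log(coi)
    # Ratings strictly inside (0, 1) come from the beta component.
    return math.log(1.0 - zoi) + beta_logpdf(y, mu, phi)


def zoib_mean(mu, zoi, coi):
    """Expected rating under the mixture of point masses and beta component."""
    return zoi * coi + (1.0 - zoi) * mu


# Hypothetical ratings from one listener, including a slider left at each end.
ratings = [0.0, 0.35, 0.62, 1.0]
total = sum(zoib_loglik(y, mu=0.5, phi=4.0, zoi=0.1, coi=0.5) for y in ratings)
```

In a multilevel model of the kind described above, `mu` (and here also `phi`) would itself be a function of predictors and group-level effects; this sketch covers only the observation-level density.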

1. G. Deutscher, Through the Language Glass: Why the World Looks Different in Other Languages (Metropolitan books, 2010).
2. C. Mooshammer, H. Hornecker, M. C. Walch, Q. Xia, The influence of the mother tongue on the perception of constructed fantasy languages. Phon. Phonol. Im Deutschsprachigen Raum (2022). https://www.linguistik-in-frankfurt.de/pundp/#pll_switcher.
3. S. M. Reiterer, V. Kogan, A. Seither-Preisler, G. Pesek, “Foreign language learning motivation: Phonetic chill or Latin lover effect? Does sound structure or social stereotyping drive FLL?” in Psychology of Learning and Motivation (Elsevier, 2020), pp. 165–205. https://www.linguistik-in-frankfurt.de/pundp/#pll_switcher.
4. N. H. Hilton, C. Gooskens, A. Schüppert, C. Tang, Is Swedish more beautiful than Danish? Matched guise investigations with unknown languages. Nord. J. Linguist. 45, 30–48 (2022).
5. M. M. Marin, A. Lampatz, M. Wandl, H. Leder, Berlyne revisited: Evidence for the multifaceted nature of hedonic tone in the appreciation of paintings and music. Front. Hum. Neurosci. 10, 536 (2016).
6. J. Delplanque, E. De Loof, C. Janssens, T. Verguts, The sound of beauty: How complexity determines aesthetic preference. Acta Psychol. (Amst.) 192, 146–152 (2019).
7. P. Brattico, E. Brattico, P. Vuust, Global sensory qualities and aesthetic experience in music. Front. Neurosci. 11, 159 (2017).
8. W. T. Fitch, The biology and evolution of speech: A comparative analysis. Annu. Rev. Linguist. 4, 255–279 (2018).
9. E. C. Smith, M. S. Lewicki, Efficient auditory coding. Nature 439, 978 (2006).
10. F. E. Theunissen, J. E. Elie, Neural processing of natural sounds. Nat. Rev. Neurosci. 15, 355–366 (2014).
11. R. P. Feynman, R. B. Leighton, M. Sands, The Feynman Lectures on Physics, Vol. I: The New Millennium Edition: Mainly Mechanics, Radiation, and Heat (Basic books, 2011).
12. D. L. Halpern, R. Blake, J. Hillenbrand, Psychoacoustics of a chilling sound. Percept. Psychophys. 39, 77–80 (1986).
13. D. Crystal, Phonaesthetically speaking. Engl. Today 11, 8–12 (1995).
14. A. Aryani, M. Conrad, D. Schmidtke, A. Jacobs, Why 'piss' is ruder than 'pee'? The role of sound in affective meaning making. PLoS One 13, e0198430 (2018).
15. M. Schröder, “Emotional speech synthesis: A review” in Seventh European Conference on Speech Communication and Technology (2001).
16. M. Dragojevic, H. Giles, I don’t like you because you’re hard to understand: The role of processing fluency in the language attitudes process. Hum. Commun. Res. 42, 396–420 (2016).
17. A. M. Liberman, K. S. Harris, H. S. Hoffman, B. C. Griffith, The discrimination of speech sounds within and across phoneme boundaries. J. Exp. Psychol. 54, 358–368 (1957).
18. K. Pisanski, D. R. Feinberg, “Vocal attractiveness” in The Oxford Handbook of Voice Perception (2019), pp. 606–626.
19. M. R. Cunningham, A. R. Roberts, A. P. Barbee, P. B. Druen, C.-H. Wu, “Their ideas of beauty are, on the whole, the same as ours”: Consistency and variability in the cross-cultural perception of female physical attractiveness. J. Pers. Soc. Psychol. 68, 261 (1995).
20. J. H. Langlois et al., Maxims or myths of beauty? A meta-analytic and theoretical review. Psychol. Bull. 126, 390 (2000).
21. G. Rhodes, The evolutionary psychology of facial beauty. Annu. Rev. Psychol. 57, 199–226 (2006).
22. R. B. Zajonc, Attitudinal effects of mere exposure. J. Pers. Soc. Psychol. 9, 1 (1968).
23. R. F. Bornstein, Exposure and affect: Overview and meta-analysis of research, 1968–1987. Psychol. Bull. 106, 265 (1989).
24. R. M. Montoya, R. S. Horton, J. L. Vevea, M. Citkowicz, E. A. Lauber, A re-examination of the mere exposure effect: The influence of repeated exposure on recognition, familiarity, and liking. Psychol. Bull. 143, 459 (2017).
25. R. Ollivier, L. Goupil, M. Liuni, J.-J. Aucouturier, Enjoy the violence: Is appreciation for extreme music the result of cognitive control over the threat response system? bioRxiv [Preprint] (2019). https://doi.org/10.1101/510008 (Accessed 14 August 2020).
26. L. Bruckert et al., Vocal attractiveness increases by averaging. Curr. Biol. 20, 116–120 (2010).
27. J. Edwards, Refining our understanding of language attitudes. J. Lang. Soc. Psychol. 18, 101–110 (1999).
28. V. Fridland, K. Bartlett, Correctness, pleasantness, and degree of difference ratings across regions. Am. Speech 81, 358–386 (2006).
29. A. C. Cargile, H. Giles, Understanding language attitudes: Exploring listener affect and identity. Lang. Commun. 17, 195–217 (1997).
30. F. Pellegrino, C. Coupé, E. Marsico, A cross-language perspective on speech information rate. Language 87, 539–558 (2011).
31. N. Erben Johansson, A. Anikin, G. Carling, A. Holmer, The typology of sound symbolism: Defining macro-concepts via their semantic and phonetic features. Linguist. Typology 24, 253–310 (2020).
32. J. Henrich, S. J. Heine, A. Norenzayan, Most people are not WEIRD. Nature 466, 29–29 (2010).
33. A. W. Black, “Cmu wilderness multilingual speech dataset” in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2019), pp. 5971–5975.
34. E. Salesky et al., A corpus for large-scale phonetic typology. ArXiv [Preprint] (2020). https://doi.org/10.48550/arXiv.2005.13962 (Accessed 14 August 2020).
35. A. Anikin, Soundgen: An open-source tool for synthesizing nonverbal vocalizations. Behav. Res. Methods 51, 778–792 (2019).
36. S. Moran, D. McCloy, PHOIBLE 2.0 (Max Planck Institute for the Science of Human History, Jena, 2019).
37. P.-C. Bürkner, brms: An R package for Bayesian multilevel models using Stan. J. Stat. Softw. 80, 1–28 (2017).
38. R. Ospina, S. L. Ferrari, A general class of zero-or-one inflated beta regression models. Comput. Stat. Data Anal. 56, 1609–1623 (2012).
39. A. Anikin, N. Aseyev, N. Erben Johansson, “Do some languages sound more beautiful than others?”. The Open Science Framework. https://osf.io/nhxkv/. Deposited 9 January 2023.
