Is Queen's English Drifting Towards Common People's English? - Quantifying Diachronic Changes of Queen's Christmas Messages (1952-2018) With Reference To BNC


Journal of Quantitative Linguistics

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/njql20

Xinlei Jiang, Yue Jiang & Cathy Ka Weng Hoi

To cite this article: Xinlei Jiang, Yue Jiang & Cathy Ka Weng Hoi (2022) Is Queen’s English
Drifting Towards Common People’s English? —Quantifying Diachronic Changes of Queen’s
Christmas Messages (1952–2018) with Reference to BNC, Journal of Quantitative Linguistics,
29:1, 1-36, DOI: 10.1080/09296174.2020.1737483

To link to this article: https://doi.org/10.1080/09296174.2020.1737483

Published online: 18 May 2020.

JOURNAL OF QUANTITATIVE LINGUISTICS
2022, VOL. 29, NO. 1, 1–36
https://doi.org/10.1080/09296174.2020.1737483

Is Queen’s English Drifting Towards Common


People’s English? —Quantifying Diachronic Changes
of Queen’s Christmas Messages (1952–2018) with
Reference to BNC
Xinlei Jiang (a), Yue Jiang (a) and Cathy Ka Weng Hoi (b, c)
(a) School of Foreign Studies, Xi’an Jiaotong University, Xi’an, China; (b) Faculty of Education,
University of Macau, Macau, China; (c) College of Education, University of Alabama,
Alabama, USA

ABSTRACT
Queen's English (QE), a linguistic symbol of the royal or upper class, is a particular
variety or an aristocratic form of English. However, QE has been dethroned by a
surprising finding that it shifted phonologically towards common people's
English (CE) between the 1950s-1980s, arousing a debate on its existence.
Based upon Queen's Christmas Messages (1952-2018) and BNC, this study quan-
titatively investigated whether QE has experienced diachronic changes and
drifted towards CE. Our PCA analysis shows QE's fluctuating lexical richness,
increasing lexical complexity and synthetism, and steady syntactic features during
the six decades. Piecewise regression and statistical results indicate 1) QE is
drifting towards CE in lexical richness and complexity between the 1950s-1980s;
2) QE exhibits an interaction between a "drifting force" and a "deviating force"
towards or from CE between the 1950s-1980s in syntactic features; 3) QE main-
tains a synthetic form distinct from the analytical one of CE over the 66 years.
These phenomena are likely related to the collapsing social structure between the
1950s and the 1980s, identity building in the Queen's early reign, and the age factor.
This study is the first to quantify the drift of QE towards CE lexically and
syntactically, which may shed some light on the quantitative investigation of
diachronic language change.

1. Introduction
Taken literally, Queen’s English (QE), or King’s English, originally refers to
the way the reigning British monarch writes or speaks English (Hornby,
2015). As a linguistic sign of the royal or upper class, it is a particular variety or
a more aristocratic form of English (Harrington et al., 2005). In 1972,
a Queen’s English Society was even founded to uphold and defend the
precision, subtlety and marvellous richness of QE against debasement,
ambiguity and other forms of misuse. However, QE was dethroned by the
surprising findings published in Nature in 2000 (Adam, 2000; Harrington

CONTACT Yue Jiang yuejiang58@163.com


© 2020 Informa UK Limited, trading as Taylor & Francis Group

et al., 2000a), which aroused the speculation that the Queen, as the pre-eminent
speaker of QE, may not resist outside influences as expected. Acoustic
analysis of the Queen’s Christmas broadcasts by Harrington et al. revealed a
shift of vowels in QE towards mainstream Received Pronunciation (RP),
which is typically associated with younger speakers and/or speakers lower in
the social hierarchy. Later, a series of extended analyses of diphthong
trajectories (Harrington et al., 2005), monophthongal vowel space (Harrington
et al., 2000b) and happY-tensing (tensing of the final vowel in words
like ‘happy’) in QE (Harrington, 2006) showed sound change towards mainstream
RP, a more modern and less aristocratic form of English, between the
1950s and the 1980s. Their conclusion undoubtedly dismayed those who cling
to QE as the correct way to speak English. Its chief defender, the QE Society,
was wound up in 2012 because too few members were present to fill the positions
on its committee (Williams, 2012). The findings also caused a sensation in the
academic community. Some linguists hold that there is nothing in the
accent of Prince William, a future speaker of King’s English, that marks him out as
royal or even as upper class (Adam, 2000), whereas others dismiss the
idea. However, the discussion of whether QE is moving towards common
people’s English is not closed, since the diachronic changes of a variety of
English, in this case QE, may manifest at various linguistic levels, viz. not only
at the phonetic level but also at the lexical and syntactic levels. In brief, the
existing evidence from research on phonetic variation does not suffice to draw
a sweeping conclusion about the ‘hypothesis of drift’, namely that QE drifts towards
common people’s English.
Besides, central to linguistic disciplines such as psycholinguistics
and sociolinguistics, the study of language variation and change investigates
not only linguistic variation, diachronic or synchronic, but also
how change comes about (Chambers & Schilling, 2013). A number of
potential underlying determinants have been put forward, including internal
factors, social factors, and cognitive and cultural factors (Chambers &
Schilling, 2013; Gong et al., 2014; Laks, 2013; Labov, 1994, 2001, 2010).
However, Harrington et al. (Harrington, 2006, 2007; Harrington et al.,
2000a, 2000b, 2005) attributed the observed acoustic shift in QE mainly to
phonetic change. Therefore, to back up the ‘hypothesis of drift’, more
evidence is needed of diachronic change in QE at linguistic levels
other than the phonological one, and such change needs to be associated with
other possible underlying factors.
In fact, a few studies of QE have been conducted, but they are limited to
traditional linguistic features. Kredátusová (2009) carried out a detailed discourse
analysis of 52 Queen’s Christmas messages (QCM) at different levels and
a comparative analysis to describe the changes over time. However, finding no
significant diachronic change, the author claimed that the queen kept
a very balanced standard in her speeches, with homogeneously delivered
messages. Moreover, Li (2014) analysed syntactic features of 59 QCM but
found no obvious change consistent with time or with the increase in Her Majesty’s
age. Interestingly, from the perspective of mathematics, Fry and Evans (2016)
found that the queen’s most frequently used words are pretty boring and that only
4,366 distinct words appeared in her 64 QCM. Given the stable lexical
features, they used a Markov chain to predict the queen’s forthcoming
messages. Notably, the aforementioned efforts failed to capture the diachronic
change of QE at different linguistic levels, let alone to analyse its
movement towards common people’s English. Taken together, these results suggest
that traditional linguistic features may not be able to detect such changes.
Therefore, attention should be turned to novel indicators, in this study the
quantitative indicators of quantitative linguistics, which may provide a new
perspective and hard evidence for the diachronic changes of QE and the
‘hypothesis of drift’.
Characterized by accuracy and scientific rigour (Liu, 2017), quantitative
linguistics combines linguistic methods with mathematical ones, which may prove to
be a better analytical means for all sciences and a sharper instrument for deeper
insights into linguistic issues (Altmann, 1997; Hou et al., 2017). Equipped with
diverse quantitative techniques, quantitative linguistics concerns itself with
various phenomena, structures and structural properties of language in order
to discover the governing laws and driving forces behind these phenomena and
the dynamics of language evolution (Liu, 2017). It has focused extensively on
the synchronic variability of linguistic features in stylistic analysis (Liu & Xiao,
2019; Melka & Místecký, 2019), genre analysis (Hou et al., 2014), authorship
attribution (Chen et al., 2012) and comparative analysis of different speakers (Y.
Zhang, 2014), especially political figures (Wang & Liu, 2008), whereas
research on the diachronic change of languages or language varieties is
sparse, with a few exceptions (Fan, 2012; Zhang & Liu, 2019). This preference
is also reflected in Dai and Liu’s (2019) quantitative analysis of
QCM. They compared vocabulary richness and thematic words between
Christmas speeches by American presidents and those by the queen. However,
they did not report any diachronic changes in the speeches per se. Therefore,
a diachronic quantitative analysis of QE may help trace its evolution and test
the ‘hypothesis of drift’. It may also open up a new vista on the quantitative
investigation of language variation and change.
Drawing on a range of quantitative indicators, the present study
explores QE’s diachronic changes and its potential tendency
based on QCM from 1952 to 2018, with the BNC as a reference corpus. Two
research questions are addressed:

(1) How did QE change at lexical and syntactic levels across the 66 years?
(2) Is QE drifting towards common people’s English?

2. Materials and Methods


2.1. Materials
This study chose the Queen’s Christmas messages (QCM) as its corpus. To capture
the diachronic changes of QE across the decades, the corpus comprises all the
Christmas messages delivered by the queen between 1952 and 2018. QCM is
suitable for a longitudinal investigation of QE for the following reasons.
First, the messages, drafted by the queen herself, are among the few
occasions on which the very pinnacle of the Establishment, who is both the
pre-eminent speaker and the personification of QE, is entitled to speak (Dai &
Liu, 2019). Second, for historical linguistic studies, the messages are unique,
given that there is hardly any other material recorded annually for over a 50-year
period and produced by the same person with a broadly similar communicative
intent (a message to people all over the world) in any variety of English
(Harrington, 2006). Thus, many of the common confounding influences due
to variation in spoken language between speakers or within a speaker can
be eliminated in this study. Last, consisting of annual Christmas messages
with the same speaker, the same audience, and similar content and size, this corpus
also meets the homogeneity and comparability required by corpus
linguistics. All the texts of the messages were downloaded from the official
website of the British Monarchy (https://www.royal.uk/history-christmas-broadcast).
For detailed information on the 66 messages, see Appendix A.
In keeping with our intention to investigate whether QE moves towards
common people’s English, the present study used the British National Corpus
(BNC) as its reference corpus and as a representative of common people’s
English. The BNC is a 100-million-word collection of samples of written and
spoken language from a wide range of sources, designed to represent a wide
cross-section of British English from the later part of the twentieth century
(Burnard, 2000). With its earliest texts dating back to
1960 and its latest texts collected in 2007 (Leech et al., 2014), the BNC covers
almost the same time span as QCM. Not restricted to any
particular register or speaker, the BNC is a microcosm of current British
English in its entirety over a 50-year period (Burnard, 2000). Being not only large
enough (100 times larger than the Brown Corpus) but also varied enough
(Leech et al., 2014; Xiao & Tong, 2007), the BNC has been widely used as a
representative of ‘English’ (Grant, 2005), ‘general English’ (Hsu, 2018),
‘British English’ (Xiao & Tong, 2007) or ‘contemporary British English’
(Kilgarriff, 1997), all largely synonymous with the British English of common
people. Such BNC-based or BNC-driven studies have prevailed in various
linguistic fields, such as stylistic analysis (Hoover, 1999; Leech & Short, 2007),
sociolinguistic research (Reichelt, 2017; Saily, 2011; Xiao & Tong, 2007),
ESP (English for specific purposes) research (Hsu, 2018), genre analysis (Lee,
2001; Wang & Liu, 2008), historical linguistics (Marquez, 2007), language
teaching research (Grant, 2005), and second language teaching research
(Luo & Deng, 2009; Wang & Wang, 2008). Thus, the BNC can reasonably
represent the British English of common people in this study. Using quantitative
indicators retrieved from the BNC as a baseline for comparison, the present
study attempts to find some clues to the movement of QE towards or away
from common people’s English.
With the prevalence of mixed media in everyday life, there seems to be no
strict line that could separate discourses into written or spoken (Crystal &
Davy, 1969). Speeches are a prominent example of a mixed medium as they
are written to be read aloud (Crystal & Davy, 1969). Due to this ‘dichotomy’
of speeches, two constituent sub-corpora of BNC, Written BNC and Spoken
BNC are used as reference corpora, representing common people’s written
English and spoken English, respectively. All the files for analysis were
retrieved from BNCweb at Lancaster University, one of the interfaces for
BNC online services (http://bncweb.lancs.ac.uk). The whole BNC was tagged
with improved word-class tagging, totalling 91 tags: 61 basic BNC
tags (known as the ‘C5’ tagset) plus 30 ambiguity tags. An ambiguity
tag, for example AJ0-AV0, indicates that the choice between adjective (AJ0)
and adverb (AV0) is left open but that the CLAWS automatic tagger prefers the
adjective reading to the adverb reading (Leech & Smith, 2000). This study treats
each ambiguity tag as its more probable reading, for example AJ0-AV0 as AJ0,
and counts it only once.
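
The treatment of ambiguity tags described above can be illustrated with a minimal sketch. It assumes that tags are available as plain strings and that the preferred reading is always the part before the hyphen; the function name `resolve_ambiguity_tag` and the sample words are hypothetical and not taken from the study.

```python
def resolve_ambiguity_tag(tag: str) -> str:
    """Return the preferred (first) reading of a C5 tag.

    Basic tags such as 'NN2' contain no hyphen and are returned unchanged;
    ambiguity tags such as 'AJ0-AV0' are reduced to their first part.
    """
    return tag.split("-", 1)[0]


# Hypothetical tagged tokens, for illustration only.
tagged = [("fast", "AJ0-AV0"), ("cars", "NN2"), ("stopped", "VVD-VVN")]
resolved = [(word, resolve_ambiguity_tag(tag)) for word, tag in tagged]
print(resolved)
```

Because each token keeps exactly one tag after this step, it is counted only once in any tag-based indicator.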

2.2. Methods
With a number of indicators drawn in from quantitative linguistics, the
present study attempted to describe quantitatively the annual QCM at
lexical and syntactic levels. For the first research question, Principal
Component Analysis (PCA) was carried out not only to preserve the
rich diversity of linguistic indices but also to find a simple structure
that could best account for the total variance in the data (Oakes & Ji,
2012). Following PCA, regression analysis was conducted to trace any
diachronic changes which may take place in the QCM corpus across the
66 years. For the second research question, given the disparity of corpus
size between QCM and BNC, five quantitative indicators, independent of
text length, were selected for further analysis of the trend of QE (by
stages) as compared with BNC. Generalized Additive Modelling (GAM)
and piecewise regression were applied to capture the moving direction of
QCM with reference to the three baselines calculated based on BNC,
Spoken-BNC (BNC-S), and Written-BNC (BNC-W) respectively. The
above quantitative indicators and measurements are elaborated as
follows.

2.2.1. TTR
While the number of tokens (N) in a corpus refers to the total number of
words, the number of types (V) refers to the total number of unique words
(Baker et al., 2006). Type-token ratio (TTR) is a classic and the most widely used
indicator of vocabulary richness (Liu, 2017).
Because it depends substantially on text length (Kubát et al., 2014; Melka &
Místecký, 2019), TTR is used to analyse the annual QCM longitudinally (for the
first research question), but not to compare QCM with BNC (for the second
research question). It is computed as:

$$\mathrm{TTR} = \frac{V}{N} \tag{1}$$
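
As a small illustration of Formula (1), the sketch below computes TTR for a toy sentence. Whitespace tokenization and the function name `type_token_ratio` are simplifying assumptions made here; the study itself obtained its values with QUITA.

```python
from collections import Counter


def type_token_ratio(tokens):
    """TTR = V / N, with V the number of types and N the number of tokens."""
    if not tokens:
        raise ValueError("TTR is undefined for an empty text")
    return len(Counter(tokens)) / len(tokens)


# Toy example: 10 tokens, 7 distinct word forms.
tokens = "the queen spoke and the people listened to the queen".split()
ttr = type_token_ratio(tokens)
print(ttr)
```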

2.2.2. H-point
First proposed by Hirsch (2005) for scientometrics, the h-point was introduced
into text analysis by Popescu (2007). Reflecting vocabulary richness and
linguistic typological features, it is applied variously in quantitative
linguistics (Liu, 2017; Kubát et al., 2014). It is taken as a boundary point on the
rank-frequency distribution, where the rank is equal to the frequency. Susceptible
to text size (Kubát et al., 2014), the h-point is employed for the first research
question only and is defined as:

$$h = \begin{cases} r, & \text{if there is an } r = f(r) \\[4pt] \dfrac{r_j f(i) - r_i f(j)}{r_j - r_i + f(i) - f(j)}, & \text{if there is no } r = f(r) \end{cases} \tag{2}$$

where r is the rank, f(r) the frequency at that rank, i < j, r_i < f(i), r_j > f(j), i and j the
positions of two adjacent words in a word list, and f(i) and f(j) the
frequencies of the two words.
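
Formula (2) can be sketched as follows. The code assumes a list of tokens as input and takes i and j to be the two adjacent ranks on either side of the crossing; the function name `h_point` is illustrative, and the implementation is a sketch rather than the routine used in QUITA.

```python
from collections import Counter


def h_point(tokens):
    """h-point of the rank-frequency distribution, following Formula (2)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    for rank, freq in enumerate(freqs, start=1):
        if freq == rank:
            # An exact crossing exists: r = f(r).
            return float(rank)
        if freq < rank:
            # No exact crossing: interpolate between the adjacent ranks
            # i (with r_i < f(i)) and j (with r_j > f(j)).
            r_i, f_i = rank - 1, freqs[rank - 2]
            r_j, f_j = rank, freq
            return (r_j * f_i - r_i * f_j) / (r_j - r_i + f_i - f_j)
    raise ValueError("frequencies never fall to or below their rank")


exact = h_point("a a a b b c".split())               # frequencies 3, 2, 1
interpolated = h_point("a a a a b b b c d".split())  # frequencies 4, 3, 1, 1
print(exact, interpolated)
```

In the second toy text no rank equals its frequency, so the value is interpolated between rank 2 (frequency 3) and rank 3 (frequency 1).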

2.2.3. Entropy
Borrowed from information theory, Shannon’s entropy (H) measures uncer-
tainty or diversity (Manning & Schütze, 1999; Liu, 2016; Shannon, 1948,
1951). In linguistics, entropy expresses the degree of vocabulary dispersion,
also interpreted as its monotony (Kubát et al., 2014; Liu, 2017; Melka &
Místecký, 2019). The smaller the H is, the more concentrated the vocabulary
is and the less rich the vocabulary is. Sensitive to text size too (Kubát et al.,
2014), entropy is used only for the first research question. Its formula is as
follows:
$$H = -\sum_{i=1}^{K} p_i \log_2 p_i \tag{3}$$

$$p_i = \frac{f_i}{N}$$

where K is the inventory size, p_i the relative frequency of a given word, and f_i
the absolute frequency.
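
A minimal sketch of Formula (3), assuming the binary logarithm and a tokenized text; the function name `shannon_entropy` is illustrative.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """H = -sum(p_i * log2(p_i)) over the K word types of the text."""
    n = len(tokens)
    return -sum((f / n) * math.log2(f / n) for f in Counter(tokens).values())


uniform = shannon_entropy(["a", "b", "c", "d"])    # four equally frequent types
repetitive = shannon_entropy(["a", "a", "a", "a"])  # a single repeated type
print(uniform, repetitive)
```

The two toy texts mark the extremes: evenly spread vocabulary gives the largest H for a given K, while a single repeated word gives zero.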

2.2.4. R1
R1 estimates the proportion of the content words, calculated based on the
h-point (Xiao & Sun, 2018). While it reduces the impact of text length to
some extent (Kubát et al., 2014), it is used only for the first research question
and then excluded in the further comparison between QCM and BNC to
avoid the potential confounding effect. Its basic formula reads:

$$R_1 = 1 - \left( F(h) - \frac{h^2}{2N} \right) = 1 - \left( \frac{\sum_{r=1}^{h} f(r)}{N} - \frac{h^2}{2N} \right) \tag{4}$$

where F(h) is the cumulative relative frequency up to the h-point, also
representing the h-coverage.
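
Formula (4) can be sketched as below. The ranked frequencies and the h-point are passed in directly. One assumption is made here that the text does not settle: when h is not an integer, the cumulative sum runs over ranks 1 to floor(h). The function name `r1_index` is illustrative.

```python
import math


def r1_index(ranked_freqs, h):
    """R1 = 1 - (F(h) - h^2 / (2N)) for frequencies sorted in descending order.

    Assumption: F(h) sums the frequencies of ranks 1..floor(h).
    """
    n = sum(ranked_freqs)
    coverage = sum(ranked_freqs[: math.floor(h)]) / n  # F(h), the h-coverage
    return 1 - (coverage - h ** 2 / (2 * n))


# Toy distribution with frequencies 3, 2, 1 (N = 6) and h-point 2.
r1 = r1_index([3, 2, 1], 2)
print(r1)
```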

2.2.5. Repeat Rate (RR) and RRmc


RR shows the degree of vocabulary concentration. Yule’s (1944, 2014)
‘Characteristic K’ indicates through inversion that the richer the text is, the
smaller the repetition of words is. RR is defined as:

$$RR = \sum_{i=1}^{V} P_i^2 \tag{5}$$

where P_i are the individual probabilities. If estimated by means of relative
frequencies, P_i = f_i/N, where f_i is the absolute frequency of word i and N is the
number of tokens.
The repeat rate was normalized by McIntosh (1967), yielding:

$$RR_{mc} = \frac{1 - \sqrt{RR}}{1 - 1/\sqrt{V}} \tag{6}$$

Thanks to this amendment to the original formula, the results of RRmc fall in
the interval [0, 1], making it comparable across different texts and languages.
Independent of text length (Kubát et al., 2014), RRmc is thus employed to
address both research questions.
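
Formulas (5) and (6) can be sketched together. The probabilities are estimated by relative frequencies, as described above; the function names are illustrative, and the normalization is undefined for a text with a single word type (V = 1).

```python
import math
from collections import Counter


def repeat_rate(tokens):
    """RR = sum of squared relative frequencies (Formula 5)."""
    n = len(tokens)
    return sum((f / n) ** 2 for f in Counter(tokens).values())


def repeat_rate_mc(tokens):
    """McIntosh-normalized repeat rate (Formula 6); requires V > 1."""
    v = len(set(tokens))
    return (1 - math.sqrt(repeat_rate(tokens))) / (1 - 1 / math.sqrt(v))


tokens = ["a", "b", "c", "d"]  # four equally frequent types
rr = repeat_rate(tokens)
rr_mc = repeat_rate_mc(tokens)
print(rr, rr_mc)
```

For an evenly distributed vocabulary RR equals 1/V, so the normalized value reaches the upper end of the interval.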

2.2.6. Arc Length and Lambda (Λ)


Arc length (L) along the ranked frequency characterizes a rank-frequency
sequence (Popescu et al., 2009) and indicates vocabulary richness. It is
defined as the sum of Euclidean distances (Dr) between all neighbouring
frequencies:
$$L = \sum_{r=1}^{V-1} D_r = \sum_{r=1}^{V-1} \sqrt{\left( f(r) - f(r+1) \right)^2 + 1} \tag{7}$$

To liberate researchers from the weight of text length, Lambda (Λ) was
proposed as a stable indicator of frequency structure (Popescu et al., 2011,
2009). Expressing the structure which emerges as a result of language
usage, Lambda indicates a more synthetic form (with a higher value) or
a more analytical form of the given language (with a lower value) (Popescu
et al., 2011). The index can help investigate the evolution of a language,
historical development of a writer, and quantitative comparison between
texts, authors, genres or languages (Popescu et al., 2011, 2009, 2010).
Lambda (Λ) is thus suitable for both research questions and can be
computed as:

$$\Lambda = \frac{L \cdot \log_{10} N}{N} \tag{8}$$
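
A short sketch of Formulas (7) and (8), assuming a tokenized text; the function names `arc_length` and `lambda_index` are illustrative.

```python
import math
from collections import Counter


def arc_length(ranked_freqs):
    """L = sum of Euclidean distances between neighbouring ranked frequencies."""
    return sum(
        math.sqrt((a - b) ** 2 + 1)
        for a, b in zip(ranked_freqs, ranked_freqs[1:])
    )


def lambda_index(tokens):
    """Lambda = L * log10(N) / N (Formula 8)."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    n = len(tokens)
    return arc_length(freqs) * math.log10(n) / n


tokens = "a a a b b c".split()  # frequencies 3, 2, 1
lam = lambda_index(tokens)
print(arc_length([3, 2, 1]), lam)
```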

2.2.7. Adjusted Modulus (A)


This is another frequency structure indicator (Kubát et al., 2014). Although only
marginally influenced by text size, it is used only to address the first research
question. It is defined as:

$$A = \frac{M}{\log_{10} N} \tag{9}$$

$$M = \left[ \left( \frac{f(1)}{h} \right)^2 + \left( \frac{V}{h} \right)^2 \right]^{1/2} = \frac{1}{h} \left[ f(1)^2 + V^2 \right]^{1/2} \tag{10}$$

with f(1) as the frequency of the most frequent word and M as the modulus.
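
Formulas (9) and (10) can be sketched as follows, with the ranked frequencies and the h-point supplied by the caller; the function names are illustrative.

```python
import math


def modulus(ranked_freqs, h):
    """M = sqrt(f(1)^2 + V^2) / h (Formula 10)."""
    f1 = ranked_freqs[0]    # frequency of the most frequent word
    v = len(ranked_freqs)   # number of types
    return math.sqrt(f1 ** 2 + v ** 2) / h


def adjusted_modulus(ranked_freqs, h):
    """A = M / log10(N) (Formula 9)."""
    n = sum(ranked_freqs)
    return modulus(ranked_freqs, h) / math.log10(n)


# Toy distribution with frequencies 3, 2, 1 and h-point 2.
m = modulus([3, 2, 1], 2)
a = adjusted_modulus([3, 2, 1], 2)
print(m, a)
```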

2.2.8. Hapax Legomena Percentage (HL)


HL is the ratio of the number of hapax legomena (N_h) to the number of tokens (N)
in a text. Hapax legomena are words that occur in a text
only once (Popescu et al., 2009). Sometimes used as a measure of vocabulary
richness, HL shows the form-richness of the language it characterizes (Popescu
& Altmann, 2008; Popescu et al., 2008, 2009). Due to its significant correlation
with text size (Kubát et al., 2014), it is employed only for the first
research question, which deals with homogeneous texts of moderate length.
The formula is as follows:

$$HL = \frac{N_h}{N} \tag{11}$$

with N_h as the number of hapax legomena.
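
A minimal sketch of Formula (11), assuming a tokenized text; the function name `hapax_percentage` is illustrative.

```python
from collections import Counter


def hapax_percentage(tokens):
    """HL = N_h / N, with N_h the number of words occurring exactly once."""
    counts = Counter(tokens)
    n_h = sum(1 for f in counts.values() if f == 1)
    return n_h / len(tokens)


# Toy text: only 'c' occurs once among six tokens.
hl = hapax_percentage("a a a b b c".split())
print(hl)
```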

2.2.9. Average Token Length (ATL)


ATL is the arithmetic mean of word size in characters. The index may be
directly linked to complexity or style (Kubát et al., 2014; Liu, 2017), and
is independent of text length (Kubát et al., 2014). Applying ATL to
both research questions, we obtain:

$$ATL = \frac{1}{N} \sum_{i=1}^{N} x_i \tag{12}$$

with x_i as the size (length) of the i-th word.
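
Formula (12) amounts to a simple mean, sketched below for a tokenized text; the function name is illustrative.

```python
def average_token_length(tokens):
    """ATL = mean number of characters per token (Formula 12)."""
    return sum(len(token) for token in tokens) / len(tokens)


atl = average_token_length(["the", "queen", "spoke"])  # lengths 3, 5, 5
print(atl)
```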

2.2.10. Writer’s View


Connected to the golden ratio, writer’s view reflects writers’ control over
function words and content words in their writing process and their aesthetic
pursuit (Pan et al., 2018; Popescu & Altmann, 2007a; Popescu et al., 2012;
Tuzzi et al., 2010; Xiao & Sun, 2018). The view provides us with a deeper
insight into the structure of texts. So named, it helps one imagine
the writer or speaker ‘sitting’ at this point and controlling the equilibrium
between autosemantics (content words) and synsemantics (function words)
(Popescu & Altmann, 2007b).
If we regard the last-ranked word with the lowest frequency as P1(V; 1),
the first-ranked word with the highest frequency as P2(1; f(1)) and the h-point as
P3(h; h) on the rank-frequency distribution curve, the angle α at the crossing
of P3P1 with P3P2 can be termed the ‘writer’s view’. Obtained from its cosine (see
Formula (13) below), α in radians tends to converge to the golden section
(~1.618). Given its dependence on text length to some extent, this quantitative
indicator is employed only for the first research question.

$$\cos\alpha = \frac{-\left[ (h-1)(f_1 - h) + (h-1)(V - h) \right]}{\left[ (h-1)^2 + (f_1 - h)^2 \right]^{1/2} \left[ (h-1)^2 + (V - h)^2 \right]^{1/2}} \tag{13}$$

where f_1 = f(1).
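
The angle can be sketched from Formula (13) as below. The sketch assumes the usual geometric reading, in which the cosine is the negative of the bracketed sum divided by the product of the two vector lengths, and returns α in radians. The function name `writers_view` and the input values are illustrative, not data from the study.

```python
import math


def writers_view(v, f1, h):
    """Angle alpha (radians) at P3(h, h) between P1(V, 1) and P2(1, f(1))."""
    numerator = -((h - 1) * (f1 - h) + (h - 1) * (v - h))
    denominator = math.hypot(h - 1, f1 - h) * math.hypot(h - 1, v - h)
    cos_alpha = max(-1.0, min(1.0, numerator / denominator))  # guard rounding
    return math.acos(cos_alpha)


# Hypothetical text profile: 1000 types, top frequency 500, h-point 20.
alpha = writers_view(1000, 500, 20)
right_angle = writers_view(100, 50, 1)  # with h = 1 the two arms are perpendicular
print(alpha, right_angle)
```

Since the cosine is slightly negative for typical texts, α lies just above a right angle, which is consistent with values near 1.618.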

2.2.11. Lorenz Curve and Gini Coefficient (G)


Frequently used in economics or sociology as a well-known measure of
statistical dispersion (Kubát et al., 2014; Melka & Místecký, 2019; Popescu
et al., 2009), Gini coefficient (G) is also applicable to textological concepts
like coverage and richness (Popescu & Altmann, 2006; Popescu et al., 2009).
It is based on the Lorenz curve, which is the stepwise adding of relative
frequencies beginning from the lowest up to the highest (Kubát et al., 2014;
Popescu & Altmann, 2006; Popescu et al., 2009). G indicates the distance
between the diagonal and the sequence of cumulative frequencies (the
Lorenz curve) and shows the position of the text between maximal and
minimal vocabulary richness (Popescu et al., 2009), computed as:

$$G = \frac{1}{V} \left( V + 1 - \frac{2}{N} \sum_{r=1}^{V} r f(r) \right) = \frac{1}{V} \left( V + 1 - 2 m_1 \right) \tag{14}$$

$$m_1 = \frac{\sum_{r=1}^{V} r f(r)}{N}$$

with m_1 as the mean of the rank-frequency distribution.
Given its dependence on text size (Kubát et al.,
2014), this index is used for the first research question only, not for the
second.
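
Formula (14) can be sketched as follows for frequencies sorted in descending order; the function name `gini_coefficient` is illustrative.

```python
def gini_coefficient(ranked_freqs):
    """G = (V + 1 - 2 * m1) / V, with m1 = sum(r * f(r)) / N (Formula 14)."""
    v = len(ranked_freqs)
    n = sum(ranked_freqs)
    m1 = sum(r * f for r, f in enumerate(ranked_freqs, start=1)) / n
    return (v + 1 - 2 * m1) / v


even = gini_coefficient([1, 1, 1, 1])  # every word occurs once
skewed = gini_coefficient([3, 2, 1])
print(even, skewed)
```

A text in which every word occurs once lies on the diagonal of the Lorenz curve, so G is zero; more unequal frequencies push G upwards.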

2.2.12. Activity
Word classes may give a text a special character in that verbs emphasize
activity while adjectives may be characteristic of descriptive expression.
Activity (Q), or the active-descriptive (dis)equilibrium, is measured in terms of
Busemann’s coefficient (Busemann, 1925; Melka & Místecký, 2019; Zörnig
et al., 2015), which is the only concrete computable activity indicator
available in the linguistic literature (Zörnig & Altmann, 2016). Its formula is
rendered as:

$$Q = \frac{V}{V + A} \tag{15}$$

with V and A denoting the number of verbs and adjectives, respectively (here V
is the verb count, not the number of types).
Activity has been used in psychology and linguistics for the analysis of text
and style, the characterization of persons, and historical analysis (Zörnig et al.,
2015). Dealing with a dichotomous situation (descriptive vs. active), the
A-V equilibrium may express the interaction between these two ‘forces’
(Popescu et al., 2014). If Q > 0.5, the text can be regarded as ‘active’; if
Q is smaller than 0.5, it is regarded as ‘descriptive’ (Zörnig et al., 2015). Prior
literature on textual activity offers various interpretations of
changes or differences in activity. Some argue that high
activity values may indicate a comprehensible language that avoids rich
adjectival embellishments, and low values may indicate missing animation,
related to the nominal (substantive-based) character of the texts
(Melka & Místecký, 2019; Zörnig & Altmann, 2016). In contrast to these
‘internal’ properties, others argue that activity may correspond to historical,
sociological or other important ‘external’ facts (Zörnig & Altmann,
2016). Independent of text length (Kubát et al., 2014; Zörnig et al., 2015),
Activity is applied to both research questions.
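
Formula (15) can be sketched on a sequence of C5 tags. The sketch assumes that every tag beginning with "V" counts as a verb and every tag beginning with "AJ" as an adjective; which tags the study actually counted is not stated, so this mapping, the function name and the sample tags are assumptions.

```python
def activity(tags):
    """Busemann's coefficient Q = V / (V + A) on a list of C5 tags.

    Assumption: verbs are tags starting with 'V', adjectives tags starting with 'AJ'.
    """
    verbs = sum(1 for tag in tags if tag.startswith("V"))
    adjectives = sum(1 for tag in tags if tag.startswith("AJ"))
    if verbs + adjectives == 0:
        raise ValueError("no verbs or adjectives in the tag sequence")
    return verbs / (verbs + adjectives)


# Hypothetical tag sequence with two verbs and one adjective.
q = activity(["PNP", "VVD", "AT0", "AJ0", "NN1", "VVD", "AV0"])
print(q)
```

With Q above 0.5 this toy sequence would be read as ‘active’ under the threshold given above.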

2.2.13. Verb Distances (VD)


Verb distances (VD) count how many tokens on average there are between
two successive verbs, computed as:

$$VD = \frac{1}{N_v - 1} \sum_{i=2}^{N_v} \left( V_i - V_{i-1} - 1 \right) \tag{16}$$

with i as the order of appearance of the verb among all the verbs in the
text, V_i the linear position of the i-th verb in the text, and N_v the number of all
the verbs.
For a corpus as large as the BNC, it is cumbersome to
identify and label the linear position of every verb in the bulk of the language
data. Therefore, verb distances can be obtained directly as:

$$VD = \frac{N - N_v}{N_v} \tag{17}$$

Although it has received little attention in previous empirical studies,
VD has considerable potential for characterizing properties of languages, texts
and styles (Liu, 2017). As a generalization of the theory of runs, the theory of
distances concerns itself with the distances between two identical elements
(word forms, letters or other text units) of a sequence, presenting a view of text
development rather than a simple evaluation of frequency (Zörnig et al., 2015).
Combined with the numbers of their occurrences, the sequences of verbs, if
scrutinized, can help disclose some aspects of text dynamics (Zörnig et al.,
2015). Besides, in systemic functional linguistics, verbs, both non-finite
and finite, are used to count clauses, which are regarded as direct
constituents of the sentence (Halliday, 2004). Mathematically, clause length
equals verb distance plus one. In this vein, verb distances can both exhibit
syntactic features and detect the sequential organization of a text in a quantitative
context. Free from the constraint of text length (Kubát et al., 2014), verb
distances are suitable for addressing both research questions.
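
The two ways of obtaining verb distances, Formulas (16) and (17), can be compared in a short sketch. It assumes a sequence of C5 tags in which every tag beginning with "V" marks a verb; the function names and the sample sequence are illustrative.

```python
def verb_distance_positions(tags):
    """Formula (16): mean number of tokens between successive verbs."""
    positions = [i for i, tag in enumerate(tags, start=1) if tag.startswith("V")]
    if len(positions) < 2:
        raise ValueError("at least two verbs are needed")
    gaps = [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)


def verb_distance_counts(tags):
    """Formula (17): (N - N_v) / N_v, using token and verb counts only."""
    n_v = sum(1 for tag in tags if tag.startswith("V"))
    return (len(tags) - n_v) / n_v


# Hypothetical sequence of 10 tags with verbs at positions 2, 5 and 9.
tags = ["PNP", "VVD", "AT0", "NN1", "VVD", "PRP", "AT0", "AJ0", "VVN", "NN1"]
by_positions = verb_distance_positions(tags)
by_counts = verb_distance_counts(tags)
print(by_positions, by_counts)
```

On such a short sequence the two values differ slightly, because Formula (17) also counts the tokens before the first verb and after the last one; for a corpus the size of the BNC this edge effect should be negligible.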
The indicators elaborated above can constitute an overall quantitative
picture of QCM and BNC, thus opening up a new vista for tracing diachronic
changes of QE at lexical and syntactic levels. Every indicator serves as one of
the many possibilities to account for the property of a text or language,
which, however, singly exploited, may not work. Thus, by means of these
many tools together, that is, miscellaneous indicators, it becomes possible to
characterize texts or languages, holistically.
Once formatted as txt files, the 66 messages were processed with QUITA
(Quantitative Index Text Analyser) (Kubát et al., 2014) to automatically
output the results of the quantitative indicators. To retrieve the quantitative
indicators that rely on tagged texts, the 66 QCM were first tagged with the Free
CLAWS web tagger using the C5 tagset, and the indicators were then calculated
in Excel. Five quantitative indicators of BNC, BNC-S and BNC-W were also
calculated in Excel based on language data retrieved from BNCweb. It is worth
mentioning that punctuation marks (tagged as PUL, PUN, PUQ, PUR) and
non-English unclassified items (tagged as UNC) were excluded from the
calculation of RRmc, Lambda, Writer’s view and ATL. To obtain Verb distances,
punctuation marks were eliminated but the non-English unclassified items were
retained, because the latter occupy places in the sequential text.
To sum up, to address the first research question, the values of TTR,
h-Point, Activity, Entropy, R1, ATL, Lambda, Writer’s view, Verb distances,
Adjusted modulus, Hapax percentage, Gini coefficient and RRmc of 66 QCM
were used for Principal Component Analysis, followed by regression ana-
lyses on the factor scores of each dimension. To address the second one, five
size-independent quantitative indicators were selected, including RRmc,
ATL, Lambda, Verb distance and Activity. GAM and piecewise regression
were employed to compare the five indices of QCM with their counterparts
in BNC and BNC sub-corpora. Information for all the indices of QCM, BNC
and BNC sub-corpora is listed in Appendices B and C.

3. Results and Discussions


3.1. Diachronic Changes of QE
Table 1 shows the result of the statistical exploration of the trend of QE as
time goes by. PCA extracted three dimensions (or components) from 13
quantitative indices, as indicated in the left column.
Cumulatively, these three dimensions explain a large share (82%) of the
total variance in the original data set (see also the scree plot in Figure 1).
Though the statistical components do not exist in the real world, they
represent in essence weighted combinations of quantified linguistic features,
which are constructed to model and explain the variance among the observed
variables.
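
As a rough check on how the percentages in Table 1 arise, the sketch below divides each extracted eigenvalue by the number of indices. It assumes, as is usual when PCA is run on standardized variables, that each of the 13 indices contributes one unit of variance; the variable names are illustrative.

```python
# 'Total' values of the three extracted components, as reported in Table 1.
eigenvalues = [5.800, 3.554, 1.340]
n_indices = 13  # number of quantitative indices entered into the PCA

# Share of total variance explained by each component, in percent.
shares = [100 * value / n_indices for value in eigenvalues]

# Running total of the explained variance.
cumulative = []
running = 0.0
for share in shares:
    running += share
    cumulative.append(running)

print([round(s, 2) for s in shares])
print([round(c, 2) for c in cumulative])
```

The results agree with the reported percentages up to the rounding of the eigenvalues.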
The varying contributions of the quantitative indicators to the three
dimensions are measured, as listed in Table 2. Small coefficients with an
absolute value below 0.5 are suppressed to give a well-defined view of indica-
tors with high loadings, and a full display format is attached as Appendix D.
The compositional structure of the three dimensions brings clear insights.
The first dimension is sustained mainly by TTR, h-point, Entropy, R1,
Writers’ view, Hapax percentage, Gini coefficient, and RRmc, all of which
are indicative of vocabulary richness to some extent. Also attributed to
the second dimension, Entropy, Writes’ view, together with Lambda and

Table 1. PCA of QE across the 66 years.


Extraction Sums of Squared Loadings Rotation Sums of Squared Loadings
Component Total % of Variance Cumulative % Total % of Variance Cumulative %
1 5.800 44.614 44.614 5.574 42.877 42.877
2 3.554 27.335 71.949 3.141 24.163 67.040
3 1.340 10.307 82.256 1.978 15.217 82.256
JOURNAL OF QUANTITATIVE LINGUISTICS 13

Figure 1. Scree plot of PCA.

Table 2. Rotated component matrix.


Component
Indices 1 2 3
TTR .986
h-Point −.918
Activity .817
Entropy −.725 .516
R1 .747
ATL .575
Lambda .849
Writer’s view .558 −.724
Verb distances −.848
Adjusted modulus .925
Hapax percentage .924
Gini coefficient −.990
RRmc .580

Adjusted modulus, concern vocabulary dispersion or the frequency structure of
a text. Interestingly, ATL, an indicator of lexical sophistication and style,
makes the second dimension more heterogeneous than the third dimension,
which consists of only two syntactic indicators, viz. Activity and Verb
distances. By dimension reduction, PCA builds a new subspace that can
best account for diachronic variation in QE across the 66 years. Factor scores
on the three dimensions are retained as variables for further analysis.
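The percentages in Table 1 can be checked from the reported totals. The sketch below assumes that the PCA was run on standardized indicators (a correlation matrix), so that each of the 13 indices in Table 2 contributes one unit of variance; this assumption and the helper name `explained_variance` are illustrative and are not stated in the paper.

```python
# Totals (eigenvalues) copied from Table 1.
EXTRACTION_TOTALS = [5.800, 3.554, 1.340]
ROTATION_TOTALS = [5.574, 3.141, 1.978]
N_INDICATORS = 13  # number of indices listed in Table 2


def explained_variance(totals, n_variables):
    """Return (percent of variance, cumulative percent) for each component.

    Assumes standardized variables, so total variance equals n_variables.
    """
    rows, running = [], 0.0
    for total in totals:
        percent = 100.0 * total / n_variables
        running += percent
        rows.append((percent, running))
    return rows


for label, totals in (("extraction", EXTRACTION_TOTALS), ("rotation", ROTATION_TOTALS)):
    for component, (percent, cumulative) in enumerate(explained_variance(totals, N_INDICATORS), 1):
        print(f"{label:<10} component {component}: {percent:.3f}%  cumulative {cumulative:.3f}%")
```

Under this assumption the computed shares agree with Table 1 up to the rounding of the reported totals.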
Curve Fitting Toolbox 3.5.9 in MATLAB R2019a was employed to fit the
factor scores on each of the three dimensions in order to delineate the
evolution of QE in terms of vocabulary richness, lexical sophistication and
dispersion, and syntactic features, as shown in Tables 3–5 respectively, along
with the best-fitting curves, usually the quartic model, in Figures 2–4.
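The comparison of linear to quartic models reported in Tables 3–5 can be illustrated with ordinary least-squares polynomial fits. The following is a minimal sketch in plain Python rather than the MATLAB toolbox used in the study; the function names and the synthetic series are assumptions for illustration only, and in practice a routine such as `numpy.polyfit` would normally replace the hand-written solver. Because the time axis is rescaled to [-1, 1] for numerical stability, the returned coefficients are not the b1–b4 of the tables, whereas R square and F are unaffected by the rescaling.

```python
import math


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_polynomial(t, y, degree):
    """Least-squares polynomial fit; returns (coefficients, R^2, F)."""
    n, p = len(y), degree + 1
    lo, hi = min(t), max(t)
    u = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in t]  # rescale time to [-1, 1]
    design = [[ui ** k for k in range(p)] for ui in u]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Overall F test of the regression: `degree` predictors, n - degree - 1 residual df.
    f_stat = (r2 / degree) / ((1.0 - r2) / (n - degree - 1))
    return coef, r2, f_stat


# Synthetic stand-in for a series of factor scores (NOT the paper's data).
years = list(range(1952, 2019))
scores = [math.sin((yr - 1952) / 10.0) + 0.01 * (yr - 1952) for yr in years]
for name, degree in (("Linear", 1), ("Quadratic", 2), ("Cubic", 3), ("Quartic", 4)):
    _, r2, f_stat = fit_polynomial(years, scores, degree)
    print(f"{name:<10} R square = {r2:.3f}  F = {f_stat:.3f}")
```

Since the models are nested, R square can only stay the same or grow with the degree, which is why the F statistic and its significance, rather than R square alone, are needed to judge whether a higher-order model is warranted.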
Figure 2 shows that Dimension 1 exhibits a wave-like pattern. Specifically,
there is a rise from 1952 to 1963 and a slight fall over the following
30 years, but from 1992 on a considerable rise occurs.
Given the contributing indicators of Dimension 1, it can be seen that the
vocabulary became more diverse in the first decade after Her Majesty’s
accession. Facing a shattered postwar country with an uncertain future and
a public health crisis due to the Great Smog, the queen delivered
Christmas Messages loaded with diverse diction and rich information, probably
hoping to reassure her people and build a close rapport with them,
as well as to establish her identity in the first 10 years of her reign. The following
decades of upheaval and transition at home and abroad brought an era of
rapid change, to which the Royal Family managed to adapt (DK, 2015).
During that period, the vocabulary richness of QE kept moving closer to that
of common people, which will be discussed in the later part of this paper.
1992 marked the 40th year of her reign, when she reformed the monarchy
substantially, for example by paying tax on her private wealth and opening her
official residences. Changes also took place in her English, with more diverse
words used.
Table 4 and Figure 3 show that the past 66 years have witnessed a significant
increase in Dimension 2. Given its compositional structure, the
fitting result suggests a tendency towards lexical sophistication and a sign of
a stronger synthetism in QE. Notably, on its way up, QE adjusted its
pace of text development in 1958 and in 2002, each of which heralded sharp
growth in the years that followed.
The queen’s 1957 Christmas Broadcast was an historic event as it was the
first QCM to be televised and also marked the 25th anniversary of the first
Christmas Broadcast on the radio (DK, 2015). As a young monarch keen to
enter the spirit of the new era, the queen was eager to use the latest
technology to connect even more directly with the public (DK, 2015). Via

Table 3. Fitting model of Dimension 1.

Model       Constant   b1       b2       b3          b4           R square   F        Sig.
Linear      .033       −.001                                      .000       .023     .880
Quadratic   .582       −.049    .001                              .056       1.911    .156
Cubic       −.357      .111     −.005    5.719e-5                 .163       4.098    .010**
Quartic     −1.157     .332     −.020    .000        −2.401e-6    .217       4.290    .004**

Figure 2. Curve fitting of Dimension 1.

Table 4. Fitting model of Dimension 2.

Model       Constant   b1       b2       b3          b4           R square   F        Sig.
Linear      −.777      .023                                       .198       16.074   .000***
Quadratic   −.684      .015     .000                              .200       7.992    .001***
Cubic       −1.130     .091     −.003    2.715e-5                 .224       6.058    .001***
Quartic     −.291      −.141    .012     .000        2.52e-6      .283       6.115    .000***

Table 5. Fitting model of Dimension 3.

Model       Constant   b1       b2       b3          b4           R square   F        Sig.
Linear      −.369      .011                                       .045       3.042    .086
Quadratic   −.460      .019     .000                              .046       1.551    .220
Cubic       −.740      .067     −.002    1.709e-5                 .056       1.241    .302
Quartic     −.240      −.072    .007     .000        1.503e-6     .077       1.288    .285

new media, she changed her language usage, thereafter favouring a more complex
lexicon and more synthetic forms of text organization. In 2002, the queen celebrated
her Golden Jubilee, but also lost her mother and sister within a few weeks of
each other. Despite the deep shadow, she went ahead with her planned visits
to all parts of the globe and looked forward to the challenges of the future.
Weathering all the storms and carrying out her role as the monarch, she
showed her willingness to adapt to change (DK, 2015), as partly evidenced by
her marked preference for complex words and synthetic expressions
since 2002.

Figure 3. Curve fitting of Dimension 2.

According to Figure 4 and Table 5, Dimension 3 basically levels off,
accompanied by ups and downs. The result unveils an overall flat trend of
syntactic features of QE across the 66 years.
Although over 90 years old, the queen remains an active monarch with
a busy schedule. Apart from her willingness to adapt to changes, her 66-year

Figure 4. Curve fitting of Dimension 3.



dedication to duty and personal quality of steadfast strength are also reflected
by the growing public respect (DK, 2015). Compared with the diachronic
variation in text development, lexical richness and complexity, the syntactic
features of QCM remain stable over the years, which may be
attributable to her perseverance. The result is consistent with the existing
literature (Li, 2014), which reports no obvious diachronic changes in
syntactic features.

3.2. Shift of QE Towards or Away from Common People’s English


As explained above, five quantitative indicators independent of text size
were used to compare the diachronic evolution of QE with the BNC as
reference. The changes in QE were mapped in terms of these five indices
across the 66 years, with the BNC baselines added for detailed
comparison.
GAM (Hastie & Tibshirani, 1990) was applied to identify possible knots
(breakpoints), i.e. the years at which the relation between the year and the
quantitative indicators changes. Whether the relation is linear or
non-linear, GAM offers a flexible method for estimating the evolution of
QE along the timeline and identifying any possible thresholds. GAM is an
additive regression model of the form:
$$Y = \beta_0 + \sum_{j=1}^{p} g_j(X_j) + \varepsilon \qquad (18)$$

where Y is the outcome, β0 the intercept, g_j a smooth function of the predictor X_j
(Hastie & Tibshirani, 1990), which can be plotted to illustrate the marginal relation
between the predictor and the response, and ε the random error.
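To make the additive form of Equation (18) concrete, the sketch below estimates the smooth terms by backfitting. It is only an illustration under stated assumptions: the paper does not say which smoother or software was used, so a Gaussian kernel smoother with a fixed bandwidth stands in for the spline smoothers typical of GAM software, and the function names, the two predictors and the data are all hypothetical.

```python
import math


def kernel_smooth(x, r, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) estimate of E[r | x] at each observed x."""
    out = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
        out.append(sum(wi * ri for wi, ri in zip(w, r)) / sum(w))
    return out


def backfit(columns, y, bandwidth=0.1, n_iter=20):
    """Fit y = b0 + sum_j g_j(x_j) + error by backfitting.

    Returns the intercept b0 and one list of fitted values per smooth term.
    """
    n, p = len(y), len(columns)
    b0 = sum(y) / n
    g = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # Partial residual: remove the intercept and all other smooth terms.
            partial = [y[i] - b0 - sum(g[k][i] for k in range(p) if k != j) for i in range(n)]
            smooth = kernel_smooth(xj, partial, bandwidth)
            centre = sum(smooth) / n
            g[j] = [s - centre for s in smooth]  # keep each g_j centred on zero
    return b0, g


# Synthetic example with two predictors (NOT the paper's data).
n = 40
x1 = [i / (n - 1) for i in range(n)]
x2 = [((7 * i) % n) / (n - 1) for i in range(n)]
y = [2.0 + 4.0 * (a - 0.5) ** 2 + math.sin(2.0 * math.pi * b) for a, b in zip(x1, x2)]

b0, g = backfit([x1, x2], y)
fitted = [b0 + g[0][i] + g[1][i] for i in range(n)]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
tss = sum((yi - b0) ** 2 for yi in y)
print(f"share of variance left unexplained: {rss / tss:.3f}")
```

With ‘year’ as the single predictor, as in the present analysis, the sum in Equation (18) has one term, and plotting the fitted g against the year shows where its slope changes, which is how candidate breakpoints can be read off.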
GAM represents graphically the non-linear regression trend between the
independent variable ‘year’ and the dependent variables, the five indices of QE,
namely Verb distance (VD), Activity, RRmc, Lambda, and ATL. The piecewise
regression approach was then employed to validate the GAM-derived
breakpoints. Piecewise regression, also known as spline regression, can
be used to test whether the regression slopes vary across the different regions
defined by the thresholds (Marsh & Cormier, 2002), thus testing their
validity (Hu et al., 2017; Le et al., 2015; Setodji et al., 2013).
For each quantitative indicator, the results from the simple linear regression
model (assuming one constant slope) and the piecewise regression
model (assuming varying slopes as defined by the breakpoints) could be
compared statistically. Table 6 presents the adjusted R² (higher values indicating
better fit) of the two models for the five indices, showing marked improvements
in variance explained.
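The comparison between the two models can be sketched as follows. The example assumes a continuous linear spline with one hinge term max(0, t − knot) per breakpoint, which is one common parametrization of piecewise regression; the paper does not spell out its parametrization, and the function names, the breakpoint and the series below are hypothetical.

```python
import math


def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(design, y):
    """R^2 of the least-squares fit of y on a design matrix with an intercept column."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n, n_predictors):
    """Adjusted R^2, penalizing the number of predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def piecewise_design(t, knots):
    """Intercept, linear term, and one hinge term per knot; no knots gives M1."""
    return [[1.0, ti] + [max(0.0, ti - k) for k in knots] for ti in t]


# Synthetic V-shaped series with a slope change at t = 30 (NOT the paper's data).
t = [float(i) for i in range(67)]
y = [(-0.1 * ti if ti <= 30 else -3.0 + 0.15 * (ti - 30)) + 0.2 * math.sin(ti) for ti in t]
knots = [30.0]  # hypothetical breakpoint, standing in for a GAM-derived one

r2_linear = r_squared(piecewise_design(t, []), y)
r2_piecewise = r_squared(piecewise_design(t, knots), y)
adj_linear = adjusted_r2(r2_linear, len(y), 1)
adj_piecewise = adjusted_r2(r2_piecewise, len(y), 1 + len(knots))
print(f"M1 adjusted R^2 = {adj_linear:.3f}, M2 adjusted R^2 = {adj_piecewise:.3f}")
```

Adjusted R² is used rather than raw R² because the piecewise model has more parameters; an improvement that survives the penalty suggests that the breakpoints capture genuine changes of slope.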
For the five indices, adjusted R² indicates a much better fit for the
piecewise regression model than for the linear regression model,

Table 6. Results of Linear and Piecewise Regression Analyses for Predicting Indices of
QE.

                                β (SE)                                   Adj. R²
Index           Period          M1                  M2                   M1         M2
RRmc            1952–1958       −0.0969(0.000)      −0.129(0.009)        −0.005     0.484
                1959–1976                           −1.014(0.007)***
                1977–1981                           −1.128(0.010)**
                1982–2018                           1.551(0.007)***
ATL             1952–1958       0.170(0.001)        −0.242(0.178)        0.014      0.331
                1959–1968                           0.515(0.161)
                1969–1985                           0.626(0.147)
                1986–2008                           0.678(0.142)
                2009–2018                           −0.783(0.161)*
Lambda          1952–1963       0.352(0.000)**      0.054(0.077)         0.110      0.234
                1964–1991                           0.209(0.069)
                1992–2018                           0.411(0.070)
Verb distance   1952–1953       −0.0249(0.0038)     0.086(0.653)         −0.0148    0.418
                1954–1965                           −0.520(0.548)**
                1966–1975                           −0.406(0.584)
                1976–1981                           1.247(0.667)*
                1982–2018                           −1.996(0.487)***
Activity        1952–1953       0.180(0.0003)       −0.183(0.054)        0.0175     0.416
                1954–1970                           0.507(0.043)**
                1971–1979                           −1.632(0.048)***
                1980–2001                           0.914(0.042)*
                2002–2018                           0.069(0.043)
Note: M1 = Linear regression model; M2 = Piecewise regression model.
Adj. R² = Adjusted R².
* p < 0.05.
** p < 0.01.
*** p < 0.001.

empirically supporting the GAM-derived breakpoints. Thus, the evolution of
QE could be analysed in comparison with the BNC corpora by stages of
time. Besides, for a closer look at the varying distance between QE and
common people’s English, one-sample t-tests were conducted on the absolute
values of the difference between each indicator of QCM and its counterpart from
the BNC corpora, with a horizontal line (= 0) as the
test value.
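The distance test described above can be sketched in a few lines. The example computes the one-sample t statistic and its degrees of freedom for the absolute differences between a series of yearly indicator values and a fixed baseline; the function names and all numbers are hypothetical, and the p-value is left to a statistics package because the t distribution is not available in the Python standard library.

```python
import math


def distance_from_baseline(series, baseline):
    """Absolute difference between each yearly value and a corpus baseline."""
    return [abs(value - baseline) for value in series]


def one_sample_t(values, test_value=0.0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    t_stat = (mean - test_value) / math.sqrt(variance / n)
    return t_stat, n - 1


# Hypothetical indicator values for one time segment and a hypothetical baseline.
segment = [4.31, 4.28, 4.40, 4.35, 4.22, 4.37]
baseline = 4.10
distances = distance_from_baseline(segment, baseline)
t_stat, df = one_sample_t(distances, test_value=0.0)
print(f"mean distance = {sum(distances) / len(distances):.3f}, t = {t_stat:.2f}, df = {df}")
```

A larger t statistic for a given time segment corresponds to a greater statistical distance between QCM and the baseline in that segment.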
With all the above information integrated, Figures 5–9 depict the diachronic
changes of QE regarding the five quantitative indicators, providing clues
to the shift of QE towards or away from common people’s English. Notably,
the figures display the baselines of the BNC corpora and the time segments, and apply
two colours to the indices of QCM in different periods according to the
results of the piecewise regression analysis. Specifically, the red line stands for
‘going-up’ (positive β) and the blue line for ‘going-down’ (negative β), indicating
different moving directions. Besides, the t-test results are shown as the
area between two lines in different degrees of greyness in the figures.
In general, RRmc of QCM is higher than all the three baselines of BNC
corpora, closest to BNC-S and farthest away from BNC-W (see Figure 5).

Figure 5. QCM vs. BNC, BNC-W and BNC-S regarding RRmc.

RRmc presents a U-shaped development across the 66 years, rising after the
initial drop from 1952 to 1982. It is clear that RRmc approaches
the baseline for roughly the first 30 years, especially from 1977 to 1982, when RRmc
plummets to the baseline. Afterwards, RRmc starts to climb and

deviate from the baseline. This result shows that the queen uses significantly
more diverse words than common British people do, which contradicts the
mathematicians’ findings (Fry & Evans, 2016) that the
queen uses a limited number of distinct words and that a new speech could be
generated just by picking old words because Her Majesty’s diction seems
boring. Besides, QE is found to drift continuously
towards common people’s English in terms of vocabulary richness, especially
from 1952 to 1982, as indicated by the RRmc of QCM. After that, the queen
picks up a wider vocabulary, returning to Her Majesty’s original lexical
distinctiveness. The findings corroborate the ‘hypothesis of drift’ at the
lexical level. More interestingly, this drift falls between the 1950s and the
1980s, the very period in which the Queen’s phonetic shift towards the
mainstream accent took place (Harrington et al., 2000a, 2000b, 2005). As
Harrington (2006) puts it, since the 1950s there have been dramatic social
changes involving a collapsing class structure, especially between the early 1960s
and 1980s (Cannadine, 1998), which are likely to be related to the accent shift
of QE. We thus suggest that the lexical richness of QE may also be susceptible
to these social changes, thus moving away from the aristocratic form towards
a mainstream one.
In Figure 6, generally speaking, the Average Token Length (ATL) of QCM is
lower than the baselines of BNC and BNC-W, and higher than that of BNC-S.
Albeit with undulating movements, ATL of QCM basically stays steady between
the three baselines throughout the past 66 years, except for a noticeable move
towards BNC from 1952 to 1969. During that period, ATL of QCM makes its way up to
the BNC baseline, thus shortening its statistical distance from BNC.
ATL represents lexical complexity, another quantitative lexical feature in
addition to lexical richness. Figure 6 shows that the queen’s diction is far
more complicated than that of common people’s spoken English, but less
complicated than that of common people’s written English. Therefore, the lexical
complexity of QE is closest to that of BNC, which is a mixture of spoken and
written English, confirming the ‘dichotomous’ nature of the Queen’s speech
(Crystal & Davy, 1969; Kredátusová, 2009). Besides, the queen’s diction
became increasingly complicated, so that it almost reached the BNC baseline
between 1952 and 1969, which supports a drifting tendency towards common
people’s English at the lexical level. The distance between QE and common
people’s English from 1969 to 1985 is also shorter than in the period after
1985. This drift might be associated with the collapsing class structure
in the UK between the 1950s and 1980s. However, the queen has seemed to set
herself clearly apart from common people since 1986, when she celebrated
her 60th birthday and may have tried to show a unique image in the new chapter of
her life.
Figure 7 demonstrates that QCM Lambda, an indicator of frequency
structure, is higher than all the three baselines of BNC, among which

Figure 6. QCM vs. BNC, BNC-W and BNC-S regarding ATL.

BNC-W is the closest. Ever since 1952, QCM Lambda has maintained its
distinctiveness, staying statistically away from the baselines of BNC all along
the way.

Figure 7. QCM vs. BNC, BNC-W and BNC-S regarding Lambda.

Lambda expresses the global distribution of the rank-frequency sequence,
which is the structure emerging out of language usage. The higher the
Lambda the more synthetic the given language, and the lower the Lambda

Figure 8. QCM vs. BNC, BNC-W and BNC-S regarding Verb distances.

the more analytical the given language (Popescu et al., 2011). QE yields
a significantly higher Lambda than BNC does, suggesting a strong synthetism
of language usage by the queen for the past 66 years. No obvious drift of QE
in Lambda towards common people’s English is detected, which is consistent

Figure 9. QCM vs. BNC, BNC-W and BNC-S regarding Activity.

with the regression result of Dimension 2 discussed above. In other words,
QE itself has become increasingly synthetic over the past 66 years. Thus, it
is reasonable to conclude that QE remains distinct from common people’s English,
which features a clearly analytical form.
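For readers unfamiliar with the indicator, the sketch below computes Lambda for a token list. It assumes the definition commonly attributed to Popescu et al. (2011), namely the arc length L of the rank-frequency sequence normalized as L · log10(N) / N, where N is the text length in tokens; the paper's own formula is given in an earlier section and may differ in detail, and the function name and sample text are illustrative.

```python
import math
from collections import Counter


def lambda_indicator(tokens):
    """Lambda = L * log10(N) / N, with L the arc length of the rank-frequency sequence."""
    frequencies = sorted(Counter(tokens).values(), reverse=True)
    n_tokens = len(tokens)
    # Arc length: sum of Euclidean distances between neighbouring (rank, frequency) points.
    arc_length = sum(
        math.sqrt((f_here - f_next) ** 2 + 1.0)
        for f_here, f_next in zip(frequencies, frequencies[1:])
    )
    return arc_length * math.log10(n_tokens) / n_tokens


# Illustrative sentence only (NOT taken from the corpus).
sample = "the year has been a year of change and the change has been welcome".split()
print(f"Lambda = {lambda_indicator(sample):.3f}")
```

The log10(N) / N factor is what is meant to make values comparable across messages of different length, which matters here because the token counts of the messages vary considerably (see Appendix A).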

In terms of Verb Distances (VD), it is noteworthy that QCM is the closest
to BNC-W among the three corpora. VD of QCM is significantly longer
than that of BNC-S, but fluctuates around the baselines of BNC and BNC-W
across the 66 years. The increasing degree of greyness of the area between
BNC-W and QCM illustrates their growing statistical distance.
On the one hand, QE entails significantly longer clauses than
common people’s spoken English, reflecting an approach to speech development
that differs from that of common people. Besides, QE seems to move gradually
away from common people’s English and written English. On the other hand,
despite this deviation, QE on the whole is closer to common people’s English
between the 1950s and the 1980s than after the 1980s, which is consistent with
both the lexical shift analysed above and the phonetic shift in previous studies
(Harrington, 2006, 2007; Harrington et al., 2000a, 2000b, 2005). Both might be
related to the dramatic social changes during those three decades. In that period, the
forces of ‘deviating from’ and ‘drifting towards’ common people’s English
seemed to compete with one another. Notably, QE seemingly attempted to set
itself apart from common people’s English and finally became evidently distinctive
as of 1982. By that time, the queen was approaching her sixties and had entered
the fourth decade of her reign.
Similar to VD, Activity, or Busemann’s coefficient, of QCM is the farthest from
the baseline of BNC-S, and fluctuates around the baselines of BNC-W and BNC
(see Figure 9). However, Activity of QCM shifts towards the BNC baselines in the
1970s.
With word classes taken into consideration, Activity, considered a quasi-syntactic
indicator, reflects a special linguistic property of a text by characterizing its
A-V equilibrium (Kubát et al., 2014; H. Liu, 2017). On the one hand, given that
the ‘descriptive’ force gets the upper hand, QE may be less comprehensible
and less animated than common people’s spoken English. On the
other hand, the drift of QE in Activity towards common people’s English in
the 1970s is in line with the direction of change in QE’s lexical richness, lexical
complexity and VD. What deserves attention is the distinctive Activity of
QE between 1954 and 1971, possibly linked to the tumult and glory
of the postwar years. Inheriting the crown at a critical time as a young
monarch, the queen probably wanted to establish her identity by delivering
messages with unique linguistic features. Thus, even under the ‘drifting
force’ resulting from the collapsing social structure of that period, Activity of QE
was mainly driven by a ‘deviating force’ away from common people’s English.

4. Conclusions
The present study, based upon a corpus of Queen Elizabeth II’s 66 annual
Christmas messages and the BNC corpora, quantitatively looks into the diachronic
changes of QE from 1952 to 2018 in an attempt to explore the evolution of QE
per se and whether it has drifted towards common people’s English over the
past 66 years. The regression study of three dimensions extracted by PCA from
a set of quantitative indicators of QCM shows fluctuating lexical richness,
increasing lexical complexity and synthetism, as well as stable syntactic features
in QE. Our comparative analysis based on piecewise regression and statistical
results suggests that 1) QE drifted towards common people’s English in
lexical richness and lexical complexity between the 1950s and 1980s; 2) there
was an interplay between a ‘drifting force’ and a ‘deviating force’
towards or away from common people’s English between the 1950s and 1980s in the
syntactic features of QE; and 3) QE has been persistently distinctive in its synthetic
form for the 66 years. These phenomena might be associated with the collapsing
social structure between the 1950s and 1980s, the queen’s intention to
establish an individual identity in her early reign, and her ageing in recent years.
In conclusion, this quantitative investigation of the diachronic changes of QE
demonstrates for the first time that QE drifted lexically and syntactically
towards common people’s English between the 1950s and 1980s. The findings
coincide with the phonetic shift of QE (Harrington, 2006, 2007; Harrington et al.,
2000a, 2000b, 2005) and may serve as a new piece of evidence for the long-debated
‘hypothesis of drift’ of QE. We hope that the study has implications for
the quantitative exploration of language change and variation. Further studies of
other syntactic and semantic indicators are needed to make the conclusion robust
and to shed some light upon the cognitive or social factors underlying the drift.

Acknowledgments
The research is partly supported by the National Social Science Foundation of China
[Grant No. 17BYY007]. We sincerely thank anonymous reviewers for the insightful
comments and suggestions.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The research is partly supported by the National Social Science Foundation of China
[Grant No. 17BYY007].

ORCID
Xinlei Jiang http://orcid.org/0000-0001-9275-830X
Yue Jiang http://orcid.org/0000-0002-0310-2657
Cathy Ka Weng Hoi http://orcid.org/0000-0002-1428-5428

References
Adam, D. (2000). The Queen’s English dethroned. Nature News. https://www.nature.
com/news/1998/001221/full/news001221-9.html
Altmann, G. (1997). The art of quantitative linguistics. Journal of Quantitative
Linguistics, 4(1-3), 13–22. doi:10.1080/09296179708590074
Baker, P., Hardie, A., & McEnery, T. (2006). A glossary of corpus linguistics.
Edinburgh University Press.
Burnard, L. (2000). The British National Corpus users reference guide. http://www.
natcorp.ox.ac.uk/docs/userManual/
Busemann, A. (1925). Die Sprache der Jugend als Ausdruck der
Entwicklungsrhythmik. Fischer.
Cannadine, D. (1998). Class in Britain. Yale University Press.
Chambers, J. K., & Schilling, N. (2013). The handbook of language variation and
change. Wiley-Blackwell.
Chen, X., Li, W., & Wang, Y. (2012). 计量特征在语言风格比较及作家判定中的
应用——以韩寒《三重门》与郭敬明《梦里花落知多少》为例 [Application
of quantitative characteristics in comparison of language style and author
judgment—Triple Gates of Han Han and Never Flowers in Never Dreams of
Guo Jingming as examples]. 计算机工程与应用 [Computer Engineering and
Applications], 48(3). https://doi.org/10.3778/j..1002-8331.2012.03.040
Crystal, D., & Davy, D. (1969). Investigating English style. Longman.
Dai, Z., & Liu, H. (2019). Quantitative analysis of Queen Elizabeth II’s and American
presidents’ Christmas messages over 50 Years (1967–2018). Glottometrics, 45,
63–88. https://www.ram-verlag.eu/wp-content/uploads/2019/04/g45zeit-1.pdf.
DK. (2015). Queen Elizabeth II and the Royal Family. Penguin Random House.
Fan, F. (2012). A quantitative study on the lexical change of American English.
Journal of Quantitative Linguistics, 19(3), 171–180. https://doi.org/10.1080/
09296174.2012.685302
Fry, H., & Evans, T. O. (2016). The indisputable existence of Santa Claus. Transworld
Publishers.
Gong, T., Shuai, L., & Comrie, B. (2014). Evolutionary linguistics: Theory of language
in an interdisciplinary space. Language Sciences, 41, 243–253. doi:10.1016/j.
langsci.2013.05.001
Grant, L. E. (2005). Frequency of ‘core idioms’ in the British National Corpus (BNC).
International Journal of Corpus Linguistics, 10(4), 429–451. https://doi.org/10.1075/
ijcl.10.4.03gra
Halliday, M. A. K. (2004). An introduction to functional grammar. Hodder Arnold.
Harrington, J. (2006). An acoustic analysis of ‘happy-tensing’ in the Queen’s Christmas
broadcasts. Journal of Phonetics, 34(4), 439–457. https://doi.org/10.1016/j.wocn.2005.08.001
Harrington, J. (2007). Evidence for a relationship between synchronic variability and
diachronic change in the Queen’s annual Christmas broadcasts. Laboratory
Phonology, 9, 125–144. https://www.phonetik.uni-muenchen.de/~jmh/research/
papers/Harrington_proofs.pdf.
Harrington, J., Palethorpe, S., & Watson, C. (2000a). Does the Queen speak the
Queen’s English? Nature, 408(6815), 927–928. doi:10.1038/35050160
Harrington, J., Palethorpe, S., & Watson, C. (2000b). Monophthongal vowel changes
in Received Pronunciation: An acoustic analysis of the Queen’s Christmas broad-
casts. Journal of the International Phonetic Association, 30(1/2), 63–78.
doi:10.1017/S0025100300006666

Harrington, J., Palethorpe, S., & Watson, C. (2005). Deepening or lessening the
divide between diphthongs: An analysis of the Queen’s annual Christmas broad-
casts. In W. J. Hardcastle & J. M. Beck (Eds.), A figure of speech (pp. 227–262).
Lawrence Erlbaum Associates, Inc., Publishers.
Hastie, T., & Tibshirani, R. (1990). Generalized additive models. Chapman & Hall.
Hirsch, J. E. (2005). An indicator to quantify an individual’s research output.
Proceedings of the National Academy of Sciences of the USA, 102(46),
16569–16572. doi:10.1073/pnas.0507655102
Hoover, D. (1999). Language and style in the inheritors. University Press of America.
Hornby, A. S. (2015). Oxford advanced learner’s dictionary. Oxford University Press.
Hou, R., Huang, C., Do, H. S., & Liu, H. (2017). A study on correlation between Chinese
sentence and constituting clauses based on the Menzerath-Altmann law. Journal of
Quantitative Linguistics, 24(4), 350–366. https://doi.org/10.1080/09296174.2017.1314411
Hou, R., Yang, J., & Jiang, M. (2014). A study on Chinese quantitative stylistic features and
relation among different styles based on text clustering. Journal of Quantitative
Linguistics, 21(3), 246–280. https://doi.org/10.1080/09296174.2014.911508
Hsu, W. (2018). The most frequent BNC/COCA mid- and low-frequency word
families in English-medium traditional Chinese medicine (TCM) textbooks.
English for Specific Purposes, 51, 98–110. https://doi.org/10.1016/j.esp.2018.04.001
Hu, B., Fan, X., Wu, Y., & Yang, N. (2017). Are structural quality indicators associated with
preschool process quality in China? An exploration of threshold effects. Early Childhood
Research Quarterly, 40, 163–173. https://doi.org/10.1016/j.ecresq.2017.03.006
Kilgarriff, A. (1997). Putting frequencies in the dictionary. International Journal of
Lexicography, 10(2), 135–155. doi:10.1093/ijl/10.2.135
Kredátusová, M. (2009). Queen’s Christmas speeches 1952–2007: Discourse analysis.
Masaryk University.
Kubát, M., Matlach, V., & Čech, R. (2014). Quantitative index text analyzer. RAM-
Verlag.
Labov, W. (1994). Principles of linguistic change: Internal factors. Blackwell.
Labov, W. (2001). Principles of linguistic change: Social factors. Blackwell.
Labov, W. (2010). Principles of linguistic change: Cognitive and cultural factors.
Wiley-Blackwell.
Laks, B. (2013). Why is there variation rather than nothing? Language Sciences, 39,
31–53. https://doi.org/10.1016/j.langsci.2013.02.009
Le, V. N., Schaack, D. D., & Setodji, C. M. (2015). Identifying baseline and ceiling thresholds
within the qualistar early learning quality rating and improvement system. Early
Childhood Research Quarterly, 30, 215–226. doi:10.1016/j.ecresq.2014.03.003
Lee, D. Y. W. (2001). Defining core vocabulary and tracking its distribution across spoken
and written genres: Evidence of a gradience of variation from the British National Corpus.
Journal of English Linguistics, 29(3), 250–278. https://doi.org/10.1177/00754240122005369
Leech, G., Rayson, P., & Wilson, A. (2014). Word frequencies in written and spoken
English based on the British National Corpus. Routledge.
Leech, G., & Short, M. (2007). Style in fiction. Pearson Education Limited.
Leech, G., & Smith, N. (2000). The British National Corpus (Version 2) with improved
word-class tagging. http://ucrel.lancs.ac.uk/bnc2/bnc2postag_manual.htm
Li, X. (2014). A quantitative study of the grammatical features of the sixty Christmas
speeches broadcast by Queen Elizabeth II. Shanghai Normal University.
Liu, H. (2017). An introduction to quantitative linguistics. The Commercial Press.

Liu, Z. (2016). A diachronic study on British and Chinese cultural complexity with
Google Books Ngrams. Journal of Quantitative Linguistics, 23(4), 361–373. https://
doi.org/10.1080/09296174.2016.1226431
Luo, W., & Deng, Y. (2009). 基于BNC语料库的英语篇际词汇重复模式研究 [A
BNC-based study on the patterns of inter-textual lexical repetition]. 外语教学与
研究(外国语文双月刊) [Foreign Language Teaching and Research (bimonthly)],
41(3), 224–229. doi:CNKI:SUN:WJYY.0.2009-03-011
Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language
processing. MIT Press.
Marquez, M. F. (2007). Renewal of core English vocabulary: A study based on the BNC.
English Studies, 88(6), 699–723. https://doi.org/10.1080/00138380701706385
Marsh, L. C., & Cormier, D. (2002). Spline regression models. Sage.
McIntosh, R. P. (1967). An index of diversity and the relation of certain concepts to
diversity. Ecology, 48(3), 392–404. doi:10.2307/1932674
Melka, T. S., & Místecký, M. (2019). On stylometric features of H. Beam Piper’s
Omnilingual. Journal of Quantitative Linguistics, 1–40. https://doi.org/10.1080/
09296174.2018.1560698
Oakes, M. P., & Ji, M. (2012). Quantitative method in corpus-based translation
studies. John Benjamins Publishing Company.
Pan, X., Chen, X., & Liu, H. (2018). Harmony in diversity: The language codes in
English–Chinese poetry translation. Digital Scholarship in the Humanities, 33(1),
128–142. https://doi.org/10.1093/llc/fqx001
Popescu, I. I. (2007). Text ranking by the weight of highly frequent words. In
P. Grzybek (Ed.), Exact Methods in the Study of Language and Text (pp.
555–566). Mouton de Gruyter.
Popescu, I. I., & Altmann, G. (2006). Some aspects of word frequencies.
Glottometrics, 13, 23–46. https://www.ram-verlag.eu/wp-content/uploads/2018/
08/g13zeit.pdf.
Popescu, I. I., & Altmann, G. (2007a). Writer’s view of text generation. Glottometrics,
15, 71–81. https://www.ram-verlag.eu/wp-content/uploads/2018/08/g15zeit.pdf.
Popescu, I. I., & Altmann, G. (2007b). Writer’s view of text generation. Glottometrics,
15, 42–52. https://www.ram-verlag.eu/wp-content/uploads/2018/08/g15zeit.pdf.
Popescu, I. I., & Altmann, G. (2008). Hapax legomena and language typology. Journal of
Quantitative Linguistics, 15(4), 370–378. https://doi.org/10.1080/09296170802326699
Popescu, I. I., Čech, R., & Altmann, G. (2012). Some geometric properties of Slovak poetry.
Journal of Quantitative Linguistics, 19(2), 121–131. https://doi.org/10.1080/09296174.2012.
659000
Popescu, I. I., Čech, R., & Altmann, G. (2014). Descriptivity in Slovak Lyrics.
Glottotheory, 4(1), 92–104. doi:10.1524/glot.2013.0007
Popescu, I. I., Čech, R., & Altmann, G. (2011). The lambda-structure of texts. RAM-
Verlag.
Popescu, I. I., Mačutek, J., & Altmann, G. (2008). Word frequency and arc length.
Glottometrics, 17, 18–44. https://www.ram-verlag.eu/wp-content/uploads/2018/
08/g17zeit.pdf.
Popescu, I. I., Mačutek, J., & Altmann, G. (2009). Aspects of word frequencies. RAM-
Verlag.
Popescu, I. I., Mačutek, J., & Altmann, G. (2010). Word forms, style and typology.
Glottotheory, 3(1), 89–96. doi:10.1515/glot-2010-0006
Reichelt, S. (2017). Adapting the BNC for sociolinguistic research—a case study on
negative concord. 9th International Corpus Linguistics Conference.

Säily, T. (2011). Variation in morphological productivity in the BNC: Sociolinguistic
and methodological considerations. Corpus Linguistics and Linguistic Theory, 7(1),
119–141. https://doi.org/10.1515/CLLT.2011.006
Setodji, C. M., Le, V. N., & Schaack, D. (2013). Using generalized additive modeling to empirically
identify thresholds within the ITERS in relation to toddlers’ cognitive development.
Developmental Psychology, 49(4), 632–645. https://doi.org/10.1037/a0028738
Shannon, C. E. (1948). A mathematical theory of communication. Bell System
Technical Journal, 27(3), 379–423. doi:10.1002/j.1538-7305.1948.tb00917.x
Shannon, C. E. (1951). Prediction and entropy of printed English. Bell System
Technical Journal, 30(1), 50–64. doi:10.1002/j.1538-7305.1951.tb01366.x
Tuzzi, A., Popescu, I. I., & Altmann, G. (2010). Quantitative analysis of Italian texts.
RAM-Verlag.
Wang, L., & Wang, T. (2008). 中国英语学习者语用标记语习得研究——一项基于
SECCL和BNC的实证研究 [A corpus-based analysis on the acquisition of pragmatic
markers by Chinese learners of English]. 现代外语(季刊) [Modern Foreign
Languages (Quarterly)], 31(3), 291–300. http://xdwy.cbpt.cnki.net/WKD/
WebPublication/paperDigest.aspx?paperID=27fefb7f-2183-4c33-a261-
b0df56f50dbf
Wang, Y., & Liu, H. (2008). Is Trump always rambling like a fourth-grade student?
An analysis of stylistic features of Donald Trump’s political discourse during the
2016 election. Discourse & Society, 29(3), 299–323. https://doi.org/10.1177/
0957926517734659
Williams, M. (2012). The Queen’s English Society closes up shop after 40 years.
Global Lingo. https://global-lingo.com/the-queens-english-society-closes-up-shop
-after-40-years/
Xiao, R., & Tong, H. (2007). A corpus-based sociolinguistic study of amplifiers in
British English. Sociolinguistic Studies, 1(2), 241–273. https://doi.org/10.1558/sols.
v1i2.241
Xiao, W., & Sun, S. (2018). Dynamic lexical features of PhD theses across disciplines:
A text mining approach. Journal of Quantitative Linguistics, 1–20. https://doi.org/
10.1080/09296174.2018.1531618
Yule, G. U. (1944). The statistical study of literary vocabulary. Cambridge University
Press.
Yule, G. U. (2014). The statistical study of literary vocabulary. Cambridge University
Press.
Zhang, G., & Liu, H. (2019). A quantitative analysis of English variants based on
dependency treebanks. Glottometrics, 44, 16–33. https://www.ram-verlag.eu/wp-
content/uploads/2019/09/g44zeit.pdf.
Zhang, Y. (2014). A corpus based analysis of lexical richness of Beijing Mandarin
speakers: Variable identification and model construction. Language Sciences, 44,
60–69. https://doi.org/10.1016/j.langsci.2013.12.003
Zörnig, P., & Altmann, G. (2016). Activity in Italian presidential speeches. Glottometrics, 35,
38–48. https://www.ram-verlag.eu/wp-content/uploads/2018/08/g35zeit.pdf
Zörnig, P., Stachowski, K., Popescu, I. I., Mosavi, M., Mohanty, P., Kelih, E., Chen,
R., & Altmann, G. (2015). Descriptiveness, activity and nominality in formalized
text sequences. RAM-Verlag.
Liu, Y., & Xiao, T. (2019). A stylistic analysis for Gu Long’s Kung Fu novels. Journal of
Quantitative Linguistics, 26(4), 267–286. https://doi.org/10.1080/09296174.2018.1504411

Appendices
Appendix A. Queen’s Christmas Messages.
Year Speaker Tokens
1952 Queen Elizabeth II 690
1953 Queen Elizabeth II 900
1954 Queen Elizabeth II 641
1955 Queen Elizabeth II 760
1956 Queen Elizabeth II 873
1957 Queen Elizabeth II 833
1958 Queen Elizabeth II 780
1959 Queen Elizabeth II 127
1960 Queen Elizabeth II 511
1961 Queen Elizabeth II 514
1962 Queen Elizabeth II 582
1963 Queen Elizabeth II 206
1964 Queen Elizabeth II 644
1965 Queen Elizabeth II 586
1966 Queen Elizabeth II 478
1967 Queen Elizabeth II 955
1968 Queen Elizabeth II 509
1969 Queen Elizabeth II 264
1970 Queen Elizabeth II 623
1971 Queen Elizabeth II 341
1972 Queen Elizabeth II 682
1973 Queen Elizabeth II 491
1974 Queen Elizabeth II 628
1975 Queen Elizabeth II 572
1976 Queen Elizabeth II 629
1977 Queen Elizabeth II 432
1978 Queen Elizabeth II 1092
1979 Queen Elizabeth II 547
1980 Queen Elizabeth II 714
1981 Queen Elizabeth II 867
1982 Queen Elizabeth II 937
1983 Queen Elizabeth II 772
1984 Queen Elizabeth II 567
1985 Queen Elizabeth II 877
1986 Queen Elizabeth II 504
1987 Queen Elizabeth II 605
1988 Queen Elizabeth II 882
1989 Queen Elizabeth II 927
1990 Queen Elizabeth II 769
1991 Queen Elizabeth II 849
1992 Queen Elizabeth II 787
1993 Queen Elizabeth II 845
1994 Queen Elizabeth II 743
1995 Queen Elizabeth II 734
1996 Queen Elizabeth II 682
1997 Queen Elizabeth II 804
1998 Queen Elizabeth II 833
1999 Queen Elizabeth II 1089
2000 Queen Elizabeth II 612
2001 Queen Elizabeth II 665
2002 Queen Elizabeth II 579
2003 Queen Elizabeth II 578
2004 Queen Elizabeth II 582
2005 Queen Elizabeth II 549
2006 Queen Elizabeth II 594
2007 Queen Elizabeth II 593
2008 Queen Elizabeth II 681
2009 Queen Elizabeth II 520
2010 Queen Elizabeth II 624
2011 Queen Elizabeth II 742
2012 Queen Elizabeth II 644
2013 Queen Elizabeth II 650
2014 Queen Elizabeth II 670
2015 Queen Elizabeth II 691
2016 Queen Elizabeth II 614
2017 Queen Elizabeth II 723
2018 Queen Elizabeth II 570
Appendix B. Indices of 66 QCM.
Year TTR h-point Entropy R1 RRmc ATL Λ Activity WV VD A Gini HP
1952 0.4464 10.0000 7.4460 0.8014 0.9472 4.2188 1.3317 0.7748 2.0434 4.8190 10.8975 0.4749 0.3014
1953 0.4289 10.5000 7.6582 0.7913 0.9401 4.4589 1.3762 0.6981 1.8507 5.2182 12.5357 0.4932 0.2922
1954 0.5008 9.2500 7.4579 0.7813 0.9358 4.5039 1.5277 0.5874 1.8594 6.5542 12.4592 0.4458 0.3713
1955 0.4487 11.0000 7.4189 0.7704 0.9294 4.4329 1.4294 0.6993 1.8651 5.4528 10.8669 0.4911 0.3276
1956 0.4238 10.5000 7.6179 0.7997 0.9429 4.3391 1.3373 0.7250 1.8992 4.8348 12.0550 0.4957 0.2852
1957 0.4226 11.3333 7.5304 0.7830 0.9411 4.2677 1.3260 0.7079 1.9586 4.4480 10.6993 0.4978 0.2809
1958 0.4346 11.0000 7.4174 0.7750 0.9324 4.3282 1.3762 0.7516 1.8955 4.6417 10.7454 0.4985 0.3090
1959 0.6299 5.0000 6.0240 0.8780 0.9675 4.1732 1.3566 0.6970 2.7312 4.5909 7.6343 0.3007 0.4488
1960 0.4951 7.0000 7.2043 0.8092 0.9391 4.6614 1.4603 0.6379 1.8307 5.9452 13.4509 0.4423 0.3620
1961 0.5156 8.0000 7.2652 0.7977 0.9369 4.4241 1.5096 0.6475 1.8818 5.3077 12.3077 0.4280 0.3755
1962 0.4863 8.8000 7.3122 0.8036 0.9365 4.3471 1.4934 0.6603 1.8300 4.5588 11.7585 0.4523 0.3505
1963 0.6214 4.5000 6.5535 0.8550 0.9515 4.4515 1.5419 0.6981 1.8946 4.2222 12.3887 0.3271 0.4660
1964 0.4488 9.0000 7.3239 0.7927 0.9424 4.4425 1.3506 0.6687 1.9211 4.8241 11.5063 0.4782 0.3137
1965 0.4556 10.0000 7.2609 0.7918 0.9427 4.5068 1.3573 0.6853 1.9941 4.8763 9.7153 0.4647 0.3038
1966 0.4937 8.3333 7.1952 0.8216 0.9482 4.4686 1.4174 0.7422 1.9773 3.8617 10.6384 0.4314 0.3389
1967 0.4304 11.0000 7.6564 0.7691 0.9292 4.6670 1.4425 0.6037 1.7789 5.8571 12.6940 0.4972 0.2932
1968 0.5069 8.5000 7.2539 0.8136 0.9418 4.6837 1.4867 0.6788 1.8979 4.3478 11.3053 0.4341 0.3752
1969 0.5530 6.0000 6.6303 0.8220 0.9480 4.2424 1.4077 0.6897 2.0701 5.6667 10.1086 0.3856 0.4091
1970 0.4478 10.0000 7.3201 0.8042 0.9445 4.6565 1.3528 0.6786 1.9498 5.5851 10.0622 0.4713 0.3002
1971 0.4839 7.5000 6.7726 0.8391 0.9512 4.1437 1.3130 0.8961 2.0608 4.0147 8.7563 0.4242 0.3167
1972 0.4751 9.0000 7.5133 0.8028 0.9423 4.6877 1.4484 0.6949 1.8947 4.5000 12.7778 0.4554 0.3270
1973 0.4725 9.0000 7.1475 0.8157 0.9459 4.5214 1.3622 0.7250 2.0249 4.6279 9.6436 0.4440 0.3136
1974 0.4713 9.5000 7.3095 0.7884 0.9361 4.5111 1.4141 0.7655 1.9344 4.6455 11.2092 0.4711 0.3519
1975 0.4703 9.0000 7.3320 0.8191 0.9491 4.3269 1.3786 0.7469 1.9821 3.7250 10.9023 0.4528 0.3234
1976 0.4595 9.0000 7.3750 0.7989 0.9433 4.7281 1.4012 0.7500 1.8685 3.7710 11.5725 0.4621 0.3100
1977 0.5069 7.5000 7.0378 0.8128 0.9353 4.6111 1.4970 0.7667 1.8184 5.1765 11.2366 0.4292 0.3634
1978 0.3736 13.0000 7.5592 0.7541 0.9243 4.2408 1.3376 0.7933 1.7578 4.8136 10.5728 0.5356 0.2308
1979 0.4790 7.5000 7.0555 0.7534 0.9171 4.7386 1.4885 0.7059 1.7636 5.3976 12.9539 0.4681 0.3547
1980 0.4552 9.0000 7.3346 0.7584 0.9257 4.4454 1.4472 0.6279 1.7935 5.3925 12.7972 0.4865 0.3319
1981 0.4383 10.6667 7.5912 0.7830 0.9356 4.5006 1.4129 0.7056 1.8380 4.3968 12.2300 0.4923 0.3010
1982 0.4504 10.0000 7.5655 0.7396 0.9094 4.7460 1.5793 0.6343 1.7047 7.1429 14.5198 0.4924 0.3212
1983 0.4741 9.3333 7.6046 0.7831 0.9354 4.7720 1.5003 0.7158 1.8119 4.2462 13.6917 0.4611 0.3290
1984 0.4956 8.5000 7.3534 0.8062 0.9421 4.5838 1.4555 0.7402 1.9340 4.9247 12.0740 0.4406 0.3545
1985 0.4424 11.0000 7.6692 0.7896 0.9414 4.4846 1.4012 0.7133 1.8916 5.0099 12.0621 0.4868 0.3056
1986 0.4762 7.6667 7.1983 0.8262 0.9492 4.1726 1.3527 0.7436 2.0096 4.3372 11.6368 0.4460 0.3333
1987 0.4926 9.0000 7.4599 0.8141 0.9455 4.5190 1.4588 0.7123 1.9625 4.7282 11.9631 0.4387 0.3455
1988 0.4603 10.0000 7.6395 0.7710 0.9260 4.5249 1.5501 0.7209 1.7377 5.4674 13.9990 0.4832 0.3413
1989 0.4078 10.5000 7.6048 0.7930 0.9357 4.2913 1.3606 0.7616 1.7863 4.3246 12.2850 0.5023 0.2578
1990 0.4538 9.5000 7.4864 0.7830 0.9337 4.4551 1.4416 0.7305 1.8078 4.9118 12.8545 0.4839 0.3251
1991 0.4664 10.0000 7.6339 0.7750 0.9303 4.4664 1.5027 0.7152 1.8004 4.1786 13.6410 0.4762 0.3380
1992 0.4269 10.0000 7.5466 0.8094 0.9469 4.2592 1.3218 0.7538 1.9201 5.2062 11.6724 0.4859 0.2770
1993 0.4308 9.5000 7.5701 0.7990 0.9330 4.3290 1.4376 0.7400 1.7495 4.4636 13.2919 0.4900 0.2899
1994 0.4818 9.0000 7.5234 0.7611 0.9300 4.4468 1.5140 0.6875 1.8067 5.1633 13.9690 0.4621 0.3472
1995 0.4428 10.0000 7.4051 0.7834 0.9322 4.3651 1.3980 0.8028 1.8511 4.3274 11.4492 0.4854 0.3052
1996 0.4648 9.5000 7.4498 0.7934 0.9401 4.4384 1.4217 0.7023 1.8982 5.6154 11.8552 0.4654 0.3196
1997 0.4366 10.0000 7.5230 0.7774 0.9357 4.4030 1.4003 0.7090 1.8240 5.3723 12.1987 0.4865 0.2935
1998 0.4382 11.0000 7.5867 0.7989 0.9415 4.3589 1.3828 0.7192 1.9019 4.6827 11.4397 0.4912 0.3073
1999 0.3893 12.5000 7.6480 0.7742 0.9307 4.5372 1.3540 0.6985 1.7752 5.2979 11.3515 0.5296 0.2590
2000 0.4673 8.0000 7.3679 0.8007 0.9405 4.3922 1.4327 0.6824 1.8113 5.0100 12.9534 0.4537 0.3137
2001 0.4662 10.5000 7.4454 0.8017 0.9432 4.5293 1.4009 0.6842 2.0020 5.2718 10.5181 0.4626 0.3218
2002 0.4560 9.0000 7.2237 0.7850 0.9365 4.3092 1.3857 0.7008 1.8713 5.3295 10.7271 0.4626 0.3005
2003 0.4706 8.0000 7.2068 0.7872 0.9362 4.3339 1.3897 0.8240 1.9054 4.5490 12.3849 0.4703 0.3512
2004 0.4897 9.3333 7.4383 0.8188 0.9493 4.6048 1.4487 0.7039 1.9682 4.3774 11.1091 0.4352 0.3351
2005 0.4991 8.0000 7.3307 0.8033 0.9410 4.5027 1.4926 0.6767 1.8421 4.9775 12.6094 0.4368 0.3607
2006 0.4781 9.0000 7.3015 0.7854 0.9341 4.6380 1.4730 0.6818 1.8377 4.5962 11.5001 0.4554 0.3350
2007 0.4823 8.5000 7.2617 0.7776 0.9324 4.4621 1.4617 0.6667 1.8389 4.5810 12.2459 0.4620 0.3558
2008 0.4640 10.0000 7.4922 0.8076 0.9459 4.4772 1.4074 0.7133 1.9458 4.7264 11.2219 0.4636 0.3231
2009 0.5115 9.0000 7.2747 0.8010 0.9399 4.9288 1.4921 0.7054 1.9659 4.6667 10.9510 0.4330 0.3788
2010 0.5032 7.6667 7.4198 0.7827 0.9306 4.6522 1.5307 0.7014 1.8022 4.9800 14.7651 0.4395 0.3606
2011 0.4407 10.6667 7.4567 0.7883 0.9376 4.5337 1.3803 0.7029 1.8919 5.1354 10.7721 0.4807 0.2925
2012 0.4798 8.5000 7.4047 0.7968 0.9349 4.4130 1.4817 0.7050 1.8098 5.2165 13.0668 0.4576 0.3432
2013 0.5077 8.0000 7.5840 0.8215 0.9440 4.5738 1.5588 0.7429 1.7956 4.8252 14.7828 0.4351 0.3769
2014 0.5104 8.5000 7.5157 0.7808 0.9307 4.5015 1.5831 0.7717 1.7907 5.1856 14.3654 0.4400 0.3821
2015 0.4530 10.0000 7.4156 0.7945 0.9391 4.3097 1.3966 0.7647 1.8919 4.9029 11.1128 0.4761 0.3169
2016 0.5489 7.5000 7.7487 0.8504 0.9546 4.6140 1.6069 0.7244 1.9123 4.4911 16.1674 0.3940 0.4072
2017 0.4827 9.0000 7.6186 0.8126 0.9445 4.4993 1.4781 0.7744 1.8726 4.6667 13.6388 0.4546 0.3485
2018 0.5421 8.0000 7.5175 0.8053 0.9421 4.4649 1.6097 0.6891 1.8477 5.7284 14.1051 0.4084 0.4070
VD = Verb Distance; WV = Writer’s view; A = Adjusted Modulus; HP = Hapax Percentage; Λ = Lambda
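
As a rough illustration of how three of the simpler indices in Appendix B could be derived from a tokenized message, the sketch below assumes the conventional definitions: TTR as types divided by tokens, Hapax Percentage as the number of words occurring once divided by tokens, and Entropy as Shannon entropy in bits over word frequencies. The function name `lexical_indices` and the sample sentence are hypothetical, and the tokenization and software actually used for the appendix may differ.

```python
import math
from collections import Counter


def lexical_indices(tokens):
    """Return TTR, hapax percentage and Shannon entropy (bits) for a token list.

    Assumed definitions (not confirmed against the tool used for Appendix B):
      TTR     = number of types / number of tokens
      HP      = number of types occurring exactly once / number of tokens
      Entropy = -sum(p * log2(p)) over relative type frequencies p
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("token list must not be empty")
    freq = Counter(tokens)
    types = len(freq)
    hapax = sum(1 for count in freq.values() if count == 1)
    entropy = -sum((count / n) * math.log2(count / n) for count in freq.values())
    return {"TTR": types / n, "HP": hapax / n, "Entropy": entropy}


# Hypothetical toy input; a real message would first need tokenizing.
sample = "the cat sat on the mat and the dog sat too".split()
indices = lexical_indices(sample)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```

These definitions are at least consistent with the 1959 row: 127 tokens with a TTR of 0.6299 implies 80 types, and an HP of 0.4488 implies 57 hapax legomena.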

Appendix C. Indices of BNC, BNC-W and BNC-S.


Corpora VD Activity RRmc Lambda ATL
BNC 4.5140 0.7101 0.9082 0.5415 4.6439
BNC-W 4.6678 0.6934 0.9065 0.5631 4.7327
BNC-S 3.5000 0.8437 0.9096 0.3208 3.8806
VD = Verb Distance.
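
To show how the reference values in Appendix C can be read against a single message, the sketch below takes the 2018 row of Appendix B and reports, index by index, whether it lies nearer to BNC-W or BNC-S in absolute terms. This is only an illustrative reading aid, not the statistical procedure used in the article; the function name `closer_reference` is hypothetical, and raw gaps are not standardized across indices.

```python
# Values copied from Appendix C (BNC-W, BNC-S) and the 2018 row of Appendix B.
BNC_W = {"VD": 4.6678, "Activity": 0.6934, "RRmc": 0.9065, "Lambda": 0.5631, "ATL": 4.7327}
BNC_S = {"VD": 3.5000, "Activity": 0.8437, "RRmc": 0.9096, "Lambda": 0.3208, "ATL": 3.8806}
QCM_2018 = {"VD": 5.7284, "Activity": 0.6891, "RRmc": 0.9421, "Lambda": 1.6097, "ATL": 4.4649}


def closer_reference(message, written, spoken):
    """For each index, name the reference corpus with the smaller absolute gap.

    Ties are assigned to BNC-S; this is an arbitrary choice for the sketch.
    """
    result = {}
    for name, value in message.items():
        gap_written = abs(value - written[name])
        gap_spoken = abs(value - spoken[name])
        result[name] = "BNC-W" if gap_written < gap_spoken else "BNC-S"
    return result


closest = closer_reference(QCM_2018, BNC_W, BNC_S)
for name, label in closest.items():
    print(f"{name}: closer to {label}")
```

A per-index comparison like this ignores variation across years, so it says nothing about trends over time.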

Appendix D. Rotated Component Matrix.


Component
1 2 3
TTR .986 .109 −.046
h-Point −.918 −.084 −.024
Activity −.111 −.142 .817
Entropy −.725 .516 .084
R1 .747 −.305 .438
ATL .067 .575 −.321
Lambda .406 .849 −.236
Writer’s view .558 −.724 .114
VD −.111 .135 −.848
Adjusted Modulus −.031 .925 −.040
Hapax Percentage .924 .261 −.114
Gini −.990 .056 −.042
RRmc .580 −.472 .450
VD = Verb Distance.
