Corpus Linguistics and Critical A
Debbie Orpin
University of Wolverhampton
. Introduction
For any scholar wishing to undertake a study of the relationship between lan-
guage and ideology, Critical Discourse Analysis (CDA) can provide a useful
framework (e.g. Fairclough 1989, 1992, 1995; van Dijk 1997; Wodak 1996).
One of the strengths of CDA is that by marrying a Hallidayan approach to
linguistic analysis (an approach that sees language as firmly rooted in its socio-
linguistic context) with theories relating to the mediation of ideology and its
relation to power structures in society, the researcher can make insightful state-
ments about the socio-political implications of the instances of language use.
Although proponents of CDA advocate taking a multidisciplinary ap-
proach to the study of language and ideology, CDA itself is situated firmly
within the field of Applied Linguistics. It is therefore disturbing to see CDA
criticised precisely for weaknesses in its linguistic analytical methodology.
Sharrock and Anderson (1981) and Widdowson (1995a, 1995b, 1996) voice
concerns about academic rigour, implying that the data is analysed in such a
way as to bear out the analyst’s preconceptions. Criticisms of CDA methods
have also come from within the field of critical language study, most notably
from Fowler (1996: 8), who draws attention to methodological weaknesses in-
herent in its qualitative approach to language study, stating that, although a
range of text types have been studied, ‘they tend to be fragmentary [and]
exemplificatory’.
A critique of CDA which could be seen as offering an important contribu-
tion to the development of a more robust methodology is provided by Stubbs
(1997). Although he makes a series of criticisms, he also puts forward a number
of proposals for strengthening CDA. Among his criticisms, Stubbs (1997: 107)
points out that few CDA studies compare the features they find in texts with
norms in the language. This is crucial if reliable generalisations are to be made
concerning the effects of different linguistic choices. He also takes issue with the
fact that it is often hard to argue that data sampled in CDA texts is represen-
tative, since little data is analysed and selection is normally random. Among
his proposals, Stubbs (ibid.: 107, 111) emphasises the need to compare fea-
tures of texts with language norms, and suggests using a corpus for this pur-
pose. He also stresses the necessity of using a large body of data, so that reliable
generalisations can be made about typical language use.
data. An attendant danger in using a large corpus is that the researcher may
feel swamped by the huge amount of data s/he is faced with. It is necessary
therefore to exploit whatever corpus tools are available to the best effect in
order to render the task more manageable.
Relatively few studies to date have used computer corpora to examine
language and ideology. Of these, most have looked at grammatical or lexi-
cal choice, concepts of key importance in CDA. Pronoun use is examined in
Stubbs’ (1992) study of sexism and language. Stubbs and Gerbig (1993) con-
sider transitivity choices in their study of the encoding of causation and agency
in a comparison of geography textbooks (see also Stubbs 1996), as do Galasin-
ski and Marley (1998) in their comparison of representations of the foreign
in the British and Polish press, and Jeffries (2003) in her article on the re-
porting of the 1995 Yorkshire drought. Examples of studies considering lexi-
cal choice are Caldas-Coulthard’s (1993) article on representations of women
in the news, Krishnamurthy’s (1996) study of the words ethnic, racial and
tribal, Stubbs’s (1996) work on corpus analysis and ideologically significant
language use, Hardt-Mautner’s (1995) analysis of representations of the EU in
the British press, Alexander’s (1999) work on business texts concerning ecolog-
ical issues, Bayley’s (1999) study of British parliamentary debates on European
integration, and Fairclough’s (2000) analysis of New Labour rhetoric.
Methods that are common to all of the above studies are: the compari-
son of frequencies, and the analysis of the syntagmatic environment of key
words. The basic software tool used to highlight typical collocational and syn-
tactic patterns is the concordancer, although some researchers (e.g. Louw 1993;
Krishnamurthy 1996; Stubbs 1996) make extensive use of collocational soft-
ware tools to automate the process of identifying the most significant collocates
of a word. Lists of significant collocates gathered in this way provide a seman-
tic profile of a word, and thus enable the researcher to gain insight into the
semantic, connotative and prosodic meanings of a word. This idea goes back
to Leech’s (1974) notion of collocative meaning (i.e. words have a tendency to
take on the meanings of their habitual collocates), and Sinclair’s (1991) idea of
semantic prosody (i.e. the connotative meanings of words can be coloured by
the collocates they attract, e.g. set in collocates with negative words such as rot,
decay etc.).
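To make the idea of a concordancer concrete, the following is a minimal keyword-in-context (KWIC) sketch. It is an illustration only, not the software used in any of the studies cited: the function name `kwic`, the `width` parameter and the character-based context window are assumptions made for this example, and the two sample sentences are concordance lines for graft quoted later in this article.

```python
import re

def kwic(text, node, width=30):
    """Return one keyword-in-context line per occurrence of `node` in `text`."""
    lines = []
    pattern = r"\b" + re.escape(node) + r"\b"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Take up to `width` characters of context on each side of the node.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that the node words line up vertically.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

sample = ("her wealth was obtained by graft and corruption, he said. "
          "South Korean graft scandals taint the entire system.")

for line in kwic(sample, "graft"):
    print(line)
```

Aligning the node word in a central column is what allows recurrent patterns to the left and right of it to be picked out by eye.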
Krishnamurthy (1996) adds a diachronic aspect to his study by comparing
frequency data for ethnic, racial and tribal taken from the pre-1985 Birming-
ham Collection of English Texts (18 million words) with data from the post-1985
Bank of English (167 million words).
The starting point for this study was a smaller piece of research I carried out in
1995 into the words sleaze and corruption. The stimulus for that research origi-
nated in the observations that: (a) sleaze shared some areas of semantic overlap
with corruption (i.e. denoting the abuse of a position of power for personal or
financial advancement), (b) sleaze was also used to refer to sexual misconduct,
(c) sleaze seemed to be the generally preferred choice, rather than corruption,
when referring to events in public life in Britain and the US (for example it
was used when talking about certain British politicians accused of accepting
bribes). For my data, I consulted the Bank of English corpus, which at that time
contained 167 million words of text. I found that use of the word sleaze was
restricted to the media data in the corpus (i.e. data from British and American
newspapers, journals, and broadcast news), and that it was more frequent in
the American data. The observation that the word sleaze was the preferred me-
dia choice over corruption when referring to misconduct in British and US pub-
lic life was borne out: of the 215 citations of sleaze, all but two instances referred
to British and US contexts. However, similar financial or political malpractice
in Southern and Eastern Europe, Africa, Asia and South America was typi-
cally referred to as corruption. Furthermore, on examining the concordances
in greater detail, it seemed that corruption had a greater negative connotation
than sleaze. Finding in my study that the words sleaze and corruption covered
some of the same semantic area, but were connotationally different, and were
used in different geographical contexts, raised questions as to what the ideolog-
ical implications may be, and whether other lexical choices made by the media
when talking about corruption in public life showed similar geographical re-
strictions. I decided therefore to carry out a more detailed study, extending
the set of nouns under examination, and using CDA theory to interpret the
implications of the results.
. Method
The data consulted was again drawn from the Bank of English corpus. But by
the time of this later study, the corpus had been updated and stood at 323 mil-
lion running words (it has since been updated again). All of the data dated from
between 1990 and 1996. The corpus was divided into 17 sub-corpora, each of
which could be accessed separately if so desired, and each containing data from
a different source (e.g. British spoken data, British books, American books,
British magazines, British journals, British radio news broadcasts, American
radio news, and various British, American and Australian newspapers). Four
of the sub-corpora represented data from British newspapers. These were the
Guardian, the Independent, the Times and the (now defunct) tabloid Today.
These four sub-corpora together contained over 800 texts. Owing to time con-
straints, I decided to limit the scope of the detailed study to an examination
of the lexical choices made in these four British newspaper sub-corpora. How-
ever, the general semantic profiles of the words were constructed using data
from the entire 323 million word Bank of English corpus.
In extending the research, the aim was to assemble a set of nouns which were
synonyms, near-synonyms, or hyponyms of corruption, i.e. a set of nouns
that represented choices on the paradigmatic axis. To do this, two thesauri
were consulted, and a further list of nouns was added by looking through a
computer-generated list of the most significant collocates of corruption in the
corpus. The selected nouns also needed to occur frequently in the Guardian, In-
dependent, Times and Today sub-corpora. To qualify definitively for inclusion
in the set of nouns to be examined, each noun had to have at least 15 citations
in the four British newspaper sub-corpora. The final set of 8 nouns assembled
Having assembled the set of lexical items, the next task was to gather infor-
mation on their overall frequencies in the Bank of English corpus. In order to
discover whether the frequency of use of these items had changed over time, I
also compared the Bank of English frequencies with frequencies from the pre-
1985, 18 million word Birmingham Collection of English Texts. Since the 323
million word Bank of English is approximately 18 times the size of the Birm-
ingham Collection of English Texts, a reasonable comparison could be made by
simply multiplying the frequencies obtained in the earlier corpus by 18.
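The scaling step can be sketched as follows. The corpus sizes are those given above; the function name `expected_later_frequency` and the raw count of 50 are invented for illustration and are not figures from the study.

```python
EARLY_CORPUS_WORDS = 18_000_000    # Birmingham Collection of English Texts
LATER_CORPUS_WORDS = 323_000_000   # Bank of English at the time of this study

def expected_later_frequency(early_count, factor=18):
    """Scale a raw count from the earlier corpus up to the size of the later one."""
    return early_count * factor

# The exact size ratio, for comparison with the rounded factor of 18.
exact_ratio = LATER_CORPUS_WORDS / EARLY_CORPUS_WORDS
print(round(exact_ratio, 2))          # 17.94

# A hypothetical item with 50 occurrences in the earlier corpus.
print(expected_later_frequency(50))   # 900
```

Rounding the ratio up to 18 overstates the expected frequency by well under one per cent, which is negligible beside the changes reported below.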
To gain an idea of the language variety, mode, genre, or discourse com-
munity in which each of the lexical items is typically used, their distribution
across the 17 sub-corpora of the Bank of English was examined. The distribu-
tion of a word is generally calculated as the average number of times it occurs
per million words of text in a given sub-corpus. Unfortunately, by the time of
this study, although frequency data from the Birmingham Collection of English
Texts was still available, concordances were not. It was therefore impossible to
eliminate from the pre-1985 frequency data instances of the major senses of
graft (i.e. those relating to surgical procedures and hard work), and count only
the instances for the “corruption” sense. The diachronic comparison of fre-
quency data for graft is therefore unreliable. However, the data relating to the
distribution of graft in the Bank of English data remains valid, as it is based
solely on occurrences of the “corruption” sense of graft.
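The per-million normalisation used for the distribution figures can be sketched as below. This is a minimal illustration: the sub-corpus names, sizes and counts are hypothetical and do not come from the Bank of English.

```python
def per_million(count, subcorpus_words):
    """Average number of occurrences per million words of a sub-corpus."""
    if subcorpus_words <= 0:
        raise ValueError("sub-corpus size must be positive")
    return count * 1_000_000 / subcorpus_words

# Hypothetical (raw count, sub-corpus size in words) pairs for one lexical item.
subcorpora = {
    "newspaper_a": (120, 24_000_000),
    "books_b": (30, 40_000_000),
}

rates = {name: per_million(count, size)
         for name, (count, size) in subcorpora.items()}
print(rates)   # {'newspaper_a': 5.0, 'books_b': 0.75}
```

Normalising in this way allows sub-corpora of very different sizes to be compared directly, which raw counts do not.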
For each lexical item, the concordances from the Guardian, Independent, Times
and Today sub-corpora were manually scanned, to get an initial impression of
the typical contexts in which they were used. A list of the top 50 collocates of
each item was then obtained automatically using a collocation program draw-
ing data from the entire Bank of English corpus. This program takes the col-
locates from a span of four words either side of the node (or key word), cal-
culates the collocational significance (using a statistical measure called t-score)
of each collocate, and outputs a list of collocates in order of statistical signifi-
cance. Since graft is polysemous, all concordances relating to senses other than
the “corruption” sense were eliminated before the collocation program was
run. Finally, all the concordances from the four British newspaper sub-corpora
were manually scanned again in order to verify the connotational impressions
and geographical references obtained from the initial manual scanning and the
collocate lists.
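The collocation step can be sketched roughly as follows. The span of four words either side of the node and the cut-off of 50 collocates are taken from the description above. The t-score formula, (observed - expected) / sqrt(observed), is a commonly used approximation and is an assumption here: the exact calculation performed by the Bank of English program is not documented in this article. The function name and the toy token list are likewise invented for illustration.

```python
import math
from collections import Counter

def tscore_collocates(tokens, node, span=4, top=50):
    """Rank the words found within `span` tokens of `node` by an approximate t-score."""
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            # Other occurrences of the node itself are not counted as collocates.
            observed.update(w for w in window if w != node)
    scores = {}
    for word, obs in observed.items():
        # Co-occurrences expected by chance, given the two word frequencies,
        # the window size (2 * span) and the corpus size.
        expected = freq[node] * freq[word] * 2 * span / n
        scores[word] = (obs - expected) / math.sqrt(obs)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]

# Toy corpus: a recurrent phrase embedded in unrelated filler text.
tokens = ("bribery and corruption " * 3
          + "the weather was mild and the sea was calm today " * 10).split()

for word, score in tscore_collocates(tokens, "bribery"):
    print(f"{word:12s}{score:6.2f}")
```

In this toy corpus corruption is ranked first, while the, which is frequent everywhere, receives a negative score. This is the behaviour one wants from a significance measure as opposed to a raw co-occurrence count.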
. Results
. Frequency
In the post-1985 Bank of English corpus, corruption is by far the most frequent
of the 8 lexical items, followed by sleaze, bribery, graft, malpractice(s), impropri-
ety/ies, nepotism, and cronyism. In the pre-1985 Birmingham Collection of En-
glish Texts, the frequency order was roughly similar: corruption, bribery, graft,
nepotism, impropriety, malpractice, and cronyism. Sleaze is remarkable in that it
was completely absent in the earlier corpus, but has 1,152 occurrences in the
later one (and is the second most frequent member of the set).
Table 1 shows the change in frequency of each of the lexical items between
the pre-1985 period and the post-1985 period. The table shows the raw frequencies
of the items in the earlier corpus; a calculation of what their expected
frequencies would be in the later, larger corpus (if usage remained stable); and
the actual frequencies in the later corpus. The final column shows the rela-
tionship between the actual frequency and the expected frequency in the later
corpus, expressed as a percentage.
The data shows that all but one of the items (nepotism) underwent a
greater-than-expected increase in frequency between 1985 and 1996. The fig-
ures are not the same for all the items, though: bribery shows only a slight in-
crease in actual versus expected frequency (9.06%), while corruption and graft
have increased by just over 50% (the true picture for graft may, however, be
masked by the frequencies of references to surgery or hard work); impropri-
ety/ies shows an increase of just over 100%, and cronyism and malpractice(s)
have risen by over 300%. As mentioned earlier, sleaze was absent in the earlier
corpus, but is the second most frequent item in the later one.
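The comparison of actual with expected frequency can be sketched as follows, on the assumption that the percentages quoted above express (actual - expected) / expected. The function name and the counts 1,500 and 1,000 are invented for illustration; the figure of 1,152 is the later-corpus frequency of sleaze reported above.

```python
def percent_difference(actual, expected):
    """Return the change from expected to actual as a percentage.

    Returns None when the expected frequency is zero, since the
    percentage is then undefined.
    """
    if expected == 0:
        return None
    return (actual - expected) / expected * 100

# A hypothetical item expected to occur 1,000 times but found 1,500 times.
print(percent_difference(1500, 1000))   # 50.0

# Sleaze: absent from the earlier corpus, so its expected frequency is zero.
print(percent_difference(1152, 0))      # None
```

The second case shows why no percentage can be given for sleaze: a word with no occurrences in the earlier corpus has no baseline against which its increase could be measured.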
. Distribution
The information about distribution comes solely from the post-1985 Bank of
English corpus. Owing to constraints of space, I will not show the details of the
distribution of the 8 items across all the 17 sub-corpora. However, I can report
that all of the items were used most frequently in the media sub-corpora. This
indicates that the activities denoted by these items were of current concern in
the public domain between 1990 and 1996. Graft proved to be far more fre-
quent in the American books sub-corpus than the British books sub-corpus.
Half the citations for cronyism (the least frequent word of the set) were found
in the Economist and Australian news sub-corpora. Interestingly, sleaze (which
in my earlier study, based on the 167 million word Bank of English corpus had
been far more frequent in the American media data than in the British data)
was now found to be more frequent in the British media data. This suggests ei-
ther that concern about the subject had fallen in the US while it rose in Britain,
or that a different term was now being used in the US.
In the sub-corpora that I decided to examine in more detail, i.e. the
British newspaper sub-corpora (Guardian, Independent, Times and Today), the
raw frequencies are highest for corruption, sleaze, and bribery, and lowest for
graft and cronyism (N.B. the figure for graft refers only to citations for the
bribery/corruption sense) (see Table 2).
graft
. . . an attempt by the New York city council to create an independent
agency to monitor police graft
. . . her wealth was obtained by graft and corruption, he said.
. . . graft and illegal campaign contributions have been the lifeblood of
Italy’s political system
. . . Milan’s anti-graft magistrates
. . . South Korean graft scandals taint the entire system
. . . the wholesale graft uncovered in Italy
impropriety/ies
. . . she was worried about alleged improprieties in the [White House]
travel office.
. . . Mr Aitken and Mr Howard were cleared of any impropriety
. . . General of Oflot, also denied any impropriety in accepting free flights.
. . . allegations of financial impropriety were made against him
The report finds no evidence of impropriety in the conduct of the Matrix
Churchill prosecution.
. . . deny any hint of sexual impropriety.
. . . rumours of sexual impropriety, strongly denied,
. . . nobody is suggesting impropriety.
malpractice(s)
. . . from office for alleged electoral malpractice
Company management has strenuously denied accusations in the press of
financial malpractice.
. . . 7m awarded to Trevor in a medical malpractice suit
Obstetric accident claims account for the largest individual awards in
medical malpractice suits.
. . . the government inquiry cleared the council of malpractice.
. . . a combination of police malpractice and judicial complacency
Blair has been beset by reports of malpractice in Lambeth, Birmingham,
Hackney
. . . fraud and serious malpractices in Whitehall. . .
nepotism
Corruption, incompetence and nepotism are the hallmarks of the new-
look NHS.
. . . Monklands councillors are accused of nepotism and granting sectarian
favours.
. . . Labour town halls are centres of nepotism and malpractice
The concordances indicate that, at the time represented by the data (1990 to
1996), bribery, corruption, graft and sleaze were the objects of allegations and
scandals. Bribery and corruption are mentioned in connection with various
countries (e.g. Britain, Pakistan and Italy), while graft appears to be strongly
connected with Italy. Sleaze is particularly associated with British politics:
the Tories are the targets of accusations of sleaze, while Labour are the ac-
cusers. There is also reference to a fundraising Sleaze Ball; and the pop singer,
Madonna, is termed the Queen of Sleaze. Nepotism also appears to be asso-
ciated with British politics (in particular Labour town councils) and the Na-
tional Health Service (NHS). There were only 16 citations for cronyism in the
four sub-corpora under investigation, and they mostly referred to American
politics; whereas citations for impropriety and malpractice seem to refer largely
to British contexts. We see that impropriety can be of a financial or sexual na-
ture, or the precise nature may be unspecified. Similarly, malpractice can be
financial, medical or unspecified.
.. Domains
The collocates of a word can give an indication of which areas of life, people
and places the word is associated with. All the items in the set are linked with
politics, public office and officialdom (see Table 5).
Bribery is further linked with the fields of business and sport, collocating
with Branson (the businessman), betting, cricket, Salim Malik (a Pakistani crick-
eter) and match-fixing; nepotism and sleaze also show connections with sport,
collocating with football.
Malpractice, on the other hand, is connected more with financial, legal,
and medical institutions or practitioners, as is evidenced by collocates such as
awards, BCCI (Bank of Credit and Commerce International), claims, costs,
doctors, insurance, lawsuits, lawyers, legal, medical, physicians, premiums, suit.
.. Connotations
The activities with which a word is associated can be highlighted by its collo-
cates.
set. Malpractice additionally collocates with negligence, reflecting the fact that
malpractice is often used in medical contexts.
dance line, containing the words dishonesty, favours and ruthlessness is repeated
four times.
can be seen to have the least negative connotations of all the words in the set.
The semantic area that they share with corruption is that denoting bad profes-
sional practice, and neither impropriety nor sleaze collocate with words denot-
ing crimes, nor do they have collocates that indicate a negative speaker attitude.
Furthermore, the semantic profile of impropriety proved to be made up in part
of words which were non-specific in the actions they denoted.
1. US 28%
2. UK 16.5%
3. France 7.6%
4. Germany 5.8%
5. Japan 4.5%
6. Italy 4.1%
(Galasinski & Marley 1998: 569)
My data represents national as well as foreign pages. That would account for
the UK being mentioned more often than any other country. What is signifi-
cant, however, is that Italy ranks second, above the US, with Pakistan fourth,
above France. Furthermore, in my data Italy accounts for 11.5% of the cita-
tions of bribery, 11.5% of the citations of corruption and 30.8% of the citations
of graft. These figures are particularly striking given that Galasinski and
Marley (1998) found that only 4.1% of coverage in the foreign pages of the
British press is devoted to Italy. Similarly, the 17.8% of the citations of
bribery that refer to Pakistan is notable, as is the fact that a number of
Third World countries, China, South Korea, India and Malaysia, rank more highly
than Galasinski and Marley’s (ibid.) data would lead one to expect. As
predicted, the majority of the citations of sleaze refer to British contexts,
although a small proportion

Percentage of citations of each lexical item referring to each location
(locations listed in order of frequency of references)

Location        bribery  corruption  cronyism  graft
 1. U.K.           28.7        23.8       6.3    5.8
 2. Italy          11.5        11.5       0.0   30.8
 3. U.S.            2.6         4.0      50.0   13.5
 4. Pakistan       17.8        22.3       0.0    0.0
 5. France          3.7         6.0       0.0    5.8
 6. China           2.4         4.8       0.0    7.8
 7. S. Korea        3.1         3.3       0.0    5.8
 8. Germany         2.4         1.8       0.0    1.9
 9. India           1.6         2.3       0.0    0.0
10. Malaysia        4.5         0.5       0.0    0.0
11. Belgium         2.4         1.3       0.0    0.0
12. Spain           0.3         1.8       0.0    0.0
13. Japan           1.0         0.8       0.0    0.0
    Others          5.6        24.8      43.7   32.1

Location          impropriety  malpractice  nepotism  sleaze
 1. U.K.                 76.8         72.4      53.7    79.0
 2. Italy                 1.5          1.5       1.6     0.9
 3. U.S.                  8.1          4.5       1.6     3.9
 4. Pakistan              1.0          0.8       0.0     0.0
 5. France                0.5           .0       0.8     1.6
 6. China                 0.0          1.5       0.8     0.0
 7. S. Korea              0.0          0.8       1.6     0.0
 8. Germany               0.0          0.0       0.0     0.2
 9. India                 0.0          0.0       0.8     0.0
10. Malaysia              0.0          0.8       0.0     0.0
11. Belgium               0.0          0.0       0.0     0.0
12. Spain                 0.0          1.5       0.0     1.2
13. Japan                 0.5          0.8       0.0     0.0
    Others                3.5          4.4      11.3     1.4
. Discussion
Compared with the earlier study I made into corruption and sleaze (using the
167 million word corpus), where all but two instances of sleaze referred to
British or US contexts and almost all of the citations of corruption referred
to Southern European or Third World countries, the data in this study did not
show quite such a marked split. The more clearly negatively connotative bribery
and corruption (and to a lesser extent graft) are seen to be chosen to describe ac-
tions in Britain as well as sleaze. This could be evidence of a shifting awareness
in Britain, an awareness that corruption does not only happen abroad. Indeed
at the time a public inquiry, the Nolan inquiry, had been set up to investigate
standards in public life. Examples from the British news sub-corpora illustrate
this awareness:
. . . bribery of an MP should be a criminal offence
When such a system operates overseas, Tory MPs call it corruption.
Nolan cannot be used as a carpet under which the endemic corruption in
our political system is swept.
. . . until recently people did not believe MPs were involved in graft
The fact that most citations for nepotism were applied to British contexts might
be further evidence of this shifting awareness. To cite another corpus example:
There was a time when nepotism and fleecing the public purse were asso-
ciated with Third World countries, while our government and civil service
were held up as models of rectitude. Alas, now we have fallen.
The decision to choose to use the word sleaze or impropriety or bribery, corrup-
tion or graft in a given context can thus be seen to reflect an ideological stance.
Where no shift in attitude among the British press was apparent is in
its tendency to choose words with greater negative associations to talk about
events abroad. Sleaze and its near-synonym impropriety are very infrequent
choices when Italy is written about, and are not chosen at all to talk about
Pakistan, China, South Korea, India or Malaysia. Even the word malpractice,
which was seen to have a greater negative semantic profile than sleaze and im-
propriety but a lesser one than bribery, corruption and graft, was found not to
be the preferred choice for events in these countries.
This might well have the effect of reinforcing existing stereotypes. As Fair-
clough (1995: 12) makes clear, the media are instrumental in reproducing ide-
ology precisely by representing different groups in certain ways. Above all,
. . . if particular lexical and grammatical choices are regularly made, and if peo-
ple and things are repeatedly talked about in certain ways, then it is plausible
that this will affect how they are thought about. (Stubbs 1996: 92)
The data drawn on for this study dates back to the first half of the 1990s. Lan-
guage use changes over time and it is likely that, if one were to conduct a sim-
ilar study based on more contemporary data, one might find further changes
. Conclusion
Acknowledgements
References