Applying Corpus Linguistics To Pedagogy: A Critical Evaluation

Applying corpus linguistics to pedagogy
A critical evaluation*
Lynne Flowerdew
Hong Kong University of Science and Technology
This article reviews and discusses four somewhat contentious issues in the ap-
plication of corpus linguistics to pedagogy, ESP in particular. Corpus linguistic
techniques have been criticized on the grounds that they encourage a more
bottom-up rather than top-down processing of text in which concordance lines
are examined atomistically. One criticism levelled against corpus data is that a
corpus presents language out of its original context. For this reason, some corpus
linguists have underscored the importance of ‘pedagogic mediation’ to contex-
tualize the data for the students’ own writing environment. Concerns relating to
the inductive approach associated with corpus-based pedagogy have also been
raised as this approach may not always be the most appropriate one. A final con-
sideration relates to the issue of whether a corpus is always the most appropriate
resource to use among the wealth of other resources available.
Keywords: top-down, decontextualisation, pedagogic processing, inductive,

data-driven learning
1. Introduction
Corpus linguistics is usually associated with a phraseological approach to analysis,

which takes a syntagmatic, as opposed to a purely paradigmatic, view of language.
Corpus analysis, in fact, gives both a paradigmatic and syntagmatic view of lan-
guage as concordance output can either be “read” vertically, i.e. paradigmatically, in
line with the slot and filler notion espoused by substitution tables, or horizontally,
i.e. syntagmatically, from a phraseological perspective. Following Sinclair (1999,
2004a) the lexical item has primacy, with its core meaning and semantic prosody
as obligatory categories, and collocation, colligation and semantic preference con-
sidered as optional categories. An interweaving of some or all of these categories
gives what Sinclair refers to as an ‘extended unit of meaning’, although in later
work Sinclair (2004b: 280) extends this concept to the ‘maximal approach’ which:
International Journal of Corpus Linguistics 14:3 (2009), 393–417. doi 10.1075/ijcl.14.3.05flo

issn 1384–6655 / e-issn 1569–9811 © John Benjamins Publishing Company
394 Lynne Flowerdew
“… would be to extend the dimensions of a unit of meaning until all the relevant
patterning was included — all the patterning that was instigated by the presence of
the central word. …[W]e should extend the unit until the ambiguity disappears”.
In this phraseological approach recurring patterns in concordance output have
revealed how language can follow certain tendencies according to Sinclair’s notion
of ‘an extended unit of meaning’ rather than being bound by hard-and-fast rules.
By way of example, Danielsson (2007: 18) presents a methodology using raw
frequency data as a step towards the identification of meaningful units, for which
as she points out “there are no accepted answers to simple questions such as ‘What
exactly constitutes a multi-word unit?’ or ‘Where does a multi-word unit begin
and end?’” Taking the word jam as the node, Danielsson identified its most fre-
quent collocates in the BNC, with traffic the top collocate. Concordance lines for
the node and its collocate were then generated, with a span of 4+4 words either
side of the node. This investigation showed a to be the most frequent collocate in
the lines including both traffic and jam, e.g., stuck in a traffic jam with your pulse;
man in a traffic jam who curses (ibid.: 19). A further calculation was made to find
out the most frequent collocate in the lines for traffic and jam, with the second
collocate a generated. This showed in to be the next most frequent collocate, e.g.,
stuck in a traffic jam, you might reflect (ibid.: 20). This procedure was repeated until
no other collocates were found to occur above the cut-off point of 5 (an arbitrary
point that, Danielsson admits may have to be revised). The search ended with the
unit stuck in a traffic jam: “From here, no other collocates occur with sufficient fre-
quency to reach the cut-off point, and we may claim to have achieved the maximal
unit [my italics] based on the distribution in this corpus” (p. 20). Danielsson then
considers the paradigmatic axis testing each word to see if the unit allows for any
alternatives, finding that the corpus offers sitting, waiting and caught as alterna-
tives to stuck. It is to be noted that these alternatives reflect the syntagmatic nature
of semantic prosody (the frequent co-occurrence of a lexical item with items ex-
pressing a positive or negative evaluation) and semantic preference (the frequent
co-occurrence of an item with those from a particular semantic set): “In this par-
ticular context, the words seem to be related to offer a set of verbs that create a
feeling associated with the annoying event of being held up in traffic” (p. 20).
Some researchers (cf. Tognini-Bonelli 2001) see this phraseological approach
associated with corpus linguistics as a catalyst in redefining aspects of linguistic
theory. This is regarded as the corpus-driven approach (as opposed to the corpus-
based one), in which the data is approached without any pre-conceived notions
in relation to how it should be analysed. Other corpus linguists take a less ex-
treme approach; for example, McEnery et al. (2006: 6), while considering corpus
linguistics as “a new philosophical approach to linguistic enquiry” with its own
theoretical status, do not view it as a discipline in its own right with its own theory,
Applying corpus linguistics to pedagogy 395
but rather as a methodology. The same position can be applied to data-driven

learning (DDL); it is generally agreed there is no underlying theory as such, but
rather it rests on a methodology which can uncover facts about language hitherto
unexplored.
Notwithstanding the advantages of this approach for DDL, during the last few
years some accounts in the literature have adopted a more critical stance, drawing
attention to potential drawbacks of using corpora in DDL. This paper reviews the
following key issues in the debate on applying corpus linguistics to pedagogy.
– Corpus linguistic techniques encourage a more bottom-up rather than top-
down processing of text in which truncated concordance lines are examined
atomistically.
– Corpus data are decontextualised and, for this reason, may not be directly
transferable to students’ own context of writing.
– Corpus-based learning is usually associated with an inductive approach to
learning, in which rules, or indeed patterns, are derived from multiple ex-
amples, rather than a rule-based deductive approach.1 This approach might
not be the most appropriate choice for some students.
– There are different types of corpora (general, specialized, learner) and differ-
ent types of online resources (dictionaries, grammars). Students may have dif-
ficulty in selecting the most appropriate corpus and resource for a particular
query.
The issues outlined above are not in fact discrete issues but inter-related, as the fol-
lowing discussion shows. They are examined specifically with reference to corpora
of written text.
2. Corpus linguistic techniques encourage a bottom-up processing of text
Corpus linguistic techniques have been criticized for encouraging a more bottom-
up rather than top-down processing of text in which truncated concordance lines
are examined in a somewhat atomistic fashion without recourse to the overall
discourse (Swales 2002, 2004). Like Swales, Kaltenböck & Mehlmauer-Larcher
(2005: 71) have expressed similar sentiments: “There are, however, certain parts
of a text that even a concordancer cannot reach. These are aspects of the macro-
structure of a text, such as textual moves, i.e. a unit of text that expresses a specific
communicative function”. However, in the last couple of years corpus linguistics
research has paid much more attention to these two different modes of text pro-
cessing (Flowerdew 2003, 2005, forthcoming a), with Biber et al. (2007a) explain-
ing the concept behind these two different, yet complementary, approaches thus:
396 Lynne Flowerdew
In the ‘top-down’ approach, the functional components of a genre are determined

first and then all texts in a corpus are analysed in terms of these components. In
contrast, textual components emerge from the corpus analysis in the ‘bottom-up’
approach, and the discourse organization of individual texts is then analysed in
terms of linguistically-defined textual categories.
(Biber et al. 2007a: 11)
These two different starting points can be illustrated by reference to the studies
of Kanoksilapatham (2007) and Jones (2007). Kanoksilapatham (2007) in her
corpus-based examination of rhetorical moves in biochemistry research articles
commences with a top-down analysis by first developing an analytical discourse-
based framework through identifying the move types that can occur in each sec-
tion of biochemistry research articles, before embarking on the corpus analysis. A
corpus analysis, using Biber’s multi-dimensional analysis, was subsequently car-
ried out to determine the linguistic characteristics of different rhetorical moves.
Jones’ (2007) research, on the other hand, begins with a corpus analysis to identify
the linguistic characteristics of vocabulary-based discourse units (VBDUs), using
software which specifically highlights new words not found in the preceding ad-
jacent stretch of discourse. For example, in the methods sections of biochemistry
articles, new linguistic items such as was stored, was extracted, denoting “Proce-
dural description of past actions”, were found.
As Biber et al. (2007b) point out, the different starting points of a top-down
functional analysis vs. a bottom-up linguistic analysis will yield differences: “Fif-
teen different move types were identified in the analysis of biochemistry research
articles…. In contrast, only 6 different discourse types were identified in the VBDU
study” (p. 249). However, at the same time “the inherent structure of a genre would
be reflected in analyses undertaken from both perspectives” (ibid.: 249), as there
are areas of overlap in that the VBDU type of “Procedural description of past ac-
tions” can be mapped onto one of the moves in the Methods section “Describing
experimental procedures”.
It is generally acknowledged that exploitation of corpus linguistics findings
takes time to percolate through to pedagogic applications (Braun 2005; Franken-
berg-Garcia 2006). Interestingly, it is in the area of English for Specific Purposes
where corpora are now taking on an increasingly mainstream role (Belcher 2006;
Flowerdew, forthcoming b) with the compilation of small “localised” corpora often
compiled by the class tutor, or sometimes the students (Flowerdew 2004). Indeed,
this is evidenced by the studies outlined below which report a more discourse-
based pedagogic application of corpora, combining top-down and bottom-up ap-
proaches to text analysis.
2.1 Moving from top-down to bottom-up processing
Charles (2007), mediates between top-down and bottom-up processing of aca-

demic thesis writing in order to introduce her students to the rhetorical function
of “defending your work against criticism”, a two-part rhetorical pattern, “in which
the writer first concedes the possibility of criticism and then moves to neutralize
its potentially negative effect” (p. 289). Charles achieves this by first having her stu-
dents engage in initial macro discourse-based tasks as consciousness-raising ac-
tivities of the discourse pattern. In this initial stage, students are introduced to the
rhetorical function, its purpose in the text and the different ways in which it can be
realised. Students discuss the insights that they have gained in their group discus-
sion in a whole-class feedback session. Charles then moves from this more top-
down to bottom-up analysis by having students perform corpus searches which
focus on specific lexico-grammatical structures within a discourse-based frame-
work. For instance, students concordance on salient (in the sense of noteworthy)
items with the aim of formulating a generalization about the use and positioning
of while in constructing a concession. Students are asked to examine the lexico-
grammar for constructing a concession, noting that while co-occurs with acknowl-
edge, as in Table 1 below, or that it may also appear in the context of appear/seem/
may, e.g. While this may seem contradictory to the conclusions drawn above….
Table 1. Anticipated criticisms and writers’ defence (from Charles 2007: 294)
Extract Anticipated criticism
A. Although the results of the experiments are not conclusive,
B. There is nothing essential in these categories, and they may not appear tenable to
other scholars.
C. While I acknowledge that in some cases the distinction between institutions and
groups may seem rather arbitrary,
D. Unfortunately, specimen preparation is especially laborious for the completed device
structure, which meant that
A more top-down approach, similar to that adopted by Charles, is also found in

Weber’s (2001) materials aimed at law students. First, Weber’s students were in-
ducted into the genre of legal essays by reading through whole essays taken from
the University of London LLB Examinations written by native speakers, and iden-
tifying some of the prototypical rhetorical features, e.g. identifying and/or delimit-
ing the legal principle involved in the case. They were then asked to identify any
lexical expressions which seemed to correlate with the genre features. This was
followed up by consulting the corpus of the legal essays to verify and pinpoint
regularities in lexico-grammatical expressions.
398 Lynne Flowerdew
Another corpus linguistics practitioner who commences from a top-down

perspective is Noguchi (2004). In fact, Noguchi had her science and engineering
majors build their own mini-corpora of research journal articles first before clas-
sifying the genre features and then moving on to examining prototypical lexico-
grammatical features.
It has been noted that Swales (2002: 163) has contrasted the “fragmented” world
of corpus linguistics with its tendency to adopt a somewhat bottom-up, atomistic
approach to text with the more “integrated” world of ESP material design with its
focus on top-down analysis of macro-level features. However, Weber’s tasks, and
those by Charles and Noguchi, seem to be achieving a “symbiosis” between these
two approaches, as called for by Partington (1998: 145).
2.2 Moving from bottom-up to top-down processing
Hyland’s (2007, 2008) work on genre-specific phraseological routines commenc-

es from a bottom-up perspective. Hyland (2007) tabulates the most frequent 50
4-word bundles across four disciplines (biology, electrical engineering, applied
linguistics, business studies), noting the great extent to which these are specific
to particular disciplines. These bundles are then classified into three broad foci of
research, text and participants, as outlined below (Hyland 2007: 13–14):
Research-oriented − help writers to structure their activities and experiences of
the real world, e.g.:
Procedure (the use of the, the operation of the)
Quantification (the magnitude of the, the surface of the)
Text-oriented − concerned with the organization of the text and its meaning as a
message or argument, e.g.:
Structuring signals — text-reflexive markers which organize stretches of discourse
(in the present study, in the next section)
Framing signals − situate arguments by specifying limiting conditions (in the case
of, with respect to the)
Participant-oriented − these are focused on the writer or reader of the text, e.g.:
Engagement features − address readers directly (it should be noted that, it can be
seen)
Hyland (2004: 220) notes the high productivity of the bundle the * of, which he ar-
gues justifies its inclusion in courses assisting students to write effective academic
papers in the sciences. For example, two frequent bundles in biology, the presence
of and the splicing of “would seem to offer students valuable forms for expressing
meanings relating to existence and to research processes in their writing” (ibid.).
Pedagogic applications with a bottom-up starting point moving to more top-

down processing have also been noted by Flowerdew (2006). Besides lexical bun-
dles, another type of phraseological routine is collocations. Usually, collocations
are considered as word combinations, one of the most common being adjective +
noun, which is a pairing of particular difficulty for advanced learners of English
(Nesselhauf 2003, 2004). However, such collocations can also be involved in more
top-down processing of text. For example, Flowerdew (2006) notes that in a mod-
ule on business letter writing, students were not sure which adjective from a set
of seemingly semantically synonymous adjectives was the “right” one to choose in
the following sentence.
Thank you for your kind / sincere / cordial invitation to the alumni dinner.
A search on these different combinations in a Business Letters Corpus revealed the

following patternings, shown in Figures 1 and 2 below.2
hoping, in fact, that you will accept our cordial invitation to be our guest for the length of
May we extend to you a cordial invitation to call in at White’s and make the
Please accept our cordial invitation to visit and become acquainted with
have the pleasure in extending to you our cordial invitation to visit our organization at a data
and I shall be pleased to extend to you my cordial invitation to visit our Tokyo office at your
Figure 1. Selected concordance lines for cordial + invitation
I very much appreciate your kind invitation to join the University Club and I know
Thank you very much again for your kind invitation and I hope your conference will be a
I am therefore very happy to accept your kind invitation and look forward to attending a great
as if I shall be unable to accept your kind invitation this time because of a most important
Thanking you once more for your kind invitation to address the audience, I remain
Figure 2. Selected concordance lines for kind + invitation
In order to determine the most appropriate collocation for this context, students
were required to look beyond the immediate collocation to an ‘extended unit of
meaning’, which takes in the subject + verb and direct and/or indirect object of the
sentence. In so doing, students were able to work out that cordial + invitation was
used for offering an invitation (May we extend to you a cordial invitation) whereas
kind + invitation was used for accepting an invitation or thanking the host (thank
you very much for your kind invitation), or as one of my students expressed it:
cordial is used from you to me, and kind from me to you. As Aston (personal com-
munication) has pointed out, it is important that students do not just consult the
corpus in a phrasebook type fashion but have something more substantial. Having
students “read” the corpus paradigmatically to find alternatives to the verbs extend,
400 Lynne Flowerdew
accept, thank, and their respective phraseologies would be a way of counteracting

this narrow “reading” of the corpus. Stubbs’ (1996: 36) oft-quoted principle “There
is no boundary between lexis and grammar: lexis and grammar are interdepen-
dent” is also of relevance here. Lexis and grammar have been shown to manifest
interdependency; for example, the subject of the intransitive verb set in very often
refers to “unpleasant states of affairs” such as bad weather (Sinclair 1991: 74). In a
similar fashion, collocations and functions can also be viewed as interdependent,
albeit at a more discourse-based level, as evidenced by the above analysis of the
collocations of “invitation” with kind and cordial.
Another example of more top-down processing leading on from a bottom-up
search is as follows. Flowerdew (2008b) notes that in a course on report writing
one student query related to whether the active or passive voice was used in the
following sentence:
This project focuses / is focused on the incidence of mosquitoes on campus.
A search on focus was conducted in an institutionally-compiled 7 million-word

corpus of reports, which gave the results shown in Figure 3 below.
Pattern Left sort Right Sort Frequency Sort

NOUN + VERB + PREP: e.g. “study focuses on” Show results 292
VERB + VERB + PREP: e.g. “has focused on” Show results 231
TO + VERB + PREP: e.g. “to focus on” Show results 95
ADV + VERB + PREP: e.g. “not focused on” Show results 94
PRON + VERB + PREP: e.g. “we focus on” Show results 57
CONJ + VERB + PREP: e.g. “that focus on” Show results 56
DET + VERB + PREP: e.g. “which focus on” Show results 38
VERB + VERB + ADV e.g. “has focused almost” Show results 34
NOUN + VERB + ADV e.g. “efforts focused primarily” Show results 33
TO +VERB + ADV e.g. “to focus more” Show results 31
Figure 3. Search for focus (all word forms) (Flowerdew 2008)
Besides the fact that the students were able to glean the different meanings be-
tween the active and passive forms of focus by examining the verb in a wider con-
text, accessed via “Show results” (column three of the Table), I also found that this
search encouraged a more top-down processing of text. Students’ scrutiny of the
concordance output prompted one student to ask: Why are there so many occur-
rences of focus in the present perfect? This kind of comment which I have termed
a ‘triggered query’ because it is activated by something the student has alighted on
in the corpus data, unprompted by the teacher (Flowerdew 2008b), echoes Swain’s
(1998) concept of ‘noticing’. Swain (1998: 66) remarks that there are several levels
of ‘noticing’, one of which is that: “Learners may simply notice a form in the target
language due to the frequency or salience of the features themselves”. An examina-
tion of the wider context of the present perfect forms of focus revealed that this
tense was used when previous research was introduced, to set up a critical evalu-
ation of this work signalled by however. This discourse-based function of however
is therefore being used as a key signalling item in Swales’ (1990) CARS (create a
research space) model, opening up a gap for the author’s own research e.g. Much
of this cross-cultural work to date, however, has focussed on East Asian versus An-
glo comparisons, with little attention given to the issue of cross-cultural differences
within the East Asian region.
This type of browsing is thus in the spirit of Bernardini’s philosophy as the
‘learner as traveler’ (Bernardini 2004). Although the type of serendipitous learning
advocated by Bernardini (2000, 2002) has been mildly criticized as ‘incidentalist’
(Swales 2002), an example such as the one above illustrates that this ad hoc brows-
ing can encourage students to process corpus data in a much more top-down way.
In fact, both Granger (1999) and Hahn (2000) emphasise that the teaching of tens-
es should be approached from a discourse-based perspective and that a corpus is
an ideal medium for achieving this.
Another account of searches extending from bottom-up to top-down process-
ing is reported in Lee & Swales (2006). Their innovative corpus-informed EAP
course, entitled “Exploring your own discourse world”, required students to com-
pile their own corpora after working with specialized corpora and conduct more
genre-based enquiries. For example, using the BNCweb, students were sensitized
to the different discourse environments in which for instance and for example are
found.3
… for instance is used a lot more frequently in the social sciences and humanities
(where it often introduces casual, non-essential exemplifications of points, mainly
for emphasis or color), whereas in the natural sciences for example is clearly fa-
vored (being used to illuminate and clarify a difficult or complex point through
the exemplification).
(Lee & Swales 2006: 67)
The pedagogic applications reviewed above testify to the fact that traditional class-
room corpus-based explorations which tended to centre on a ‘vertical reading’
have now been complemented by a more discourse-based approach which requires
‘horizontal reading’ for the analysis of linguistic patternings in relation to their
communicative and cultural embedding (Braun 2005), and one could also add
here in relation to the practices of different academic disciplines (see Flowerdew,
in press, for further examples of corpus-based discourse approaches to writing
402 Lynne Flowerdew
instruction). In fact, Swales has now modified his position and acknowledges this
more top-down orientation, as reported by Lee (2008).
It can be seen that utilizing a more top-down approach to processing cor-
pus data provides more co-text, and hence more contextual information on the
corpora under investigation by shedding light on different practices of different
academic disciplines, as revealed by differences in lexico-grammatical patterning.
However, whether the starting point should be with a bottom-up or top-down ap-
proach is not an easy question to answer and very much depends on the nature of
the query and composition of the corpus. Starting with the moves (which could
be coded in the corpus) may be appropriate for those genres which have clearly
defined move structures, such as law cases with four obligatory moves: facts/stat-
ing history of the case; presenting argument; deriving ratio decidendi; pronounc-
ing judgment (cf. Bhatia et al. 2004), but difficult to implement for those genres
which are mixed or which display embedded moves (Flowerdew 2004). Biber et
al. (2007b: 241) compare these two different approaches, noting that which one is
adopted depends on the primary basis of the analysis:
Functional analysis is primary in top-down approaches; functional distinctions
are determined on a qualitative basis, to determine the set of relevant discourse
types and to identify specific discourse units within texts. In contrast, linguistic
analysis is primary in bottom-up approaches; a wide range of linguistic distribu-
tional patterns are analysed quantitatively, again being used to determine the set
of relevant discourse types and to identify specific discourse units within texts.
(Biber et al. 2007b: 241)
3. Corpus data are decontextualised and may not be directly transferable
Corpus data have been viewed as decontextualised such that the findings may not
be directly transferable, lock stock and barrel, to pedagogy. This issue is discussed
below with reference to pedagogic applications in the field of ESP.
3.1 The issue of contextualisation in corpus data
Widdowson’s (2004) arguments on the decontextualised nature of corpus data are

well-rehearsed in the literature (see Flowerdew 2008a; Braun 2005; Kaltenböck &
Mehlmauer-Larcher 2005; McEnery et al. 2006), but it is worth reviewing them
again briefly. Both Aston (1995) and Widdowson (1998, 2002) have drawn atten-
tion to the decontextualised nature of corpus data, with Widdowson commenting
that corpus data are but a sample of language, as opposed to an example of authen-
tic language, because it is divorced from the communicative context in which it was
created: “the text travels but the context does not travel with it” (keynote lecture,
29 July 2002).
Whether Widdowson is correct or not would seem very much to depend on
what is being transferred. Charles (2007: 295) disagrees with Widdowson on the
issue of decontextualisation and maintains that one of the advantages of the type
of corpus work described in Section 2.1 above is that “… it allows students to gain
a greater sense of contextualization than is possible to achieve through the use of
paper-based materials”. While it is undoubtedly true that more top-down corpus
enquiries, by their very nature, provide more contextualization, the question of
the practices of different academic and professional disciplines needs to be taken
into account, as uncovered by the corpus-based enquiries of for instance and for
example in Lee & Swales (2006), which show just how finely nuanced differences
can be (also see Hyland 2000, 2002 for research studies in this area).
3.2 ‘Pedagogic processing’ of corpus data
Widdowson maintains that it may not be expedient to transfer corpus data directly
to pedagogic materials on account of the cultural or contextual inappropriacy of
the corpus data (see Cook 1998; Widdowson 1991, also cited in Seidlhofer 2003,
for a discussion on the issue of prescription vs. description, regarding the trans-
fer of corpus data to pedagogy). Widdowson therefore advocates adopting some
kind of ‘pedagogic processing’, as do other corpus linguists such as Braun (2007)
and McCarthy (2001) in order to transform samples of language into pedagog-
ically-accessible examples. This aspect of pedagogic mediation of corpus data is
discussed from the perspective of the “what” and the “how” below.
3.2.1 The “what” of pedagogic processing

Section 3.1 has shown that variation across disciplines needs to be considered in
the transfer of corpus data to pedagogy. Another aspect that needs to be con-
sidered concerns pragmatic appropriacy. Flowerdew (2008a) advises caution on
exploiting a corpus of reports, in which consultancy companies are advising ex-
ternal clients, for student report writing which requires them to write internally
to university authorities. The student writing is similar to the corpus of reports in
respect of the rhetorical Problem-Solution pattern. However, it would not be reg-
isterially appropriate for students to transfer the pattern: grammatical metaphor
noun (indicating a solution to a problem) + will + verb (signalling mitigation of
a problem) (e.g. Implementation of barriers will reduce noise) to their own report
writing in view of the different contextual features. Students would need to modify
the ‘frame’ (see Biber et al. 2004 and Stubbs 2004 for further examples of frames)
derived from the corpus of reports by supplying mitigation devices to attenuate the
404 Lynne Flowerdew
phrase to make it socio-culturally appropriate for writing to university authorities.

Thus, they would need to expand the original frame with the addition of a prefac-
ing phrase such as “we would like to suggest that…” and replace will reduce by the
more rhetorically appropriate would reduce. Corpus consultation has therefore to
be conducted with great care and it is not surprising that Widdowson (1998) sees
the need for some kind of ‘mediating process’ whereby students authenticate the
corpus data to suit the socio-cultural and linguistic parameters of their own writ-
ing in light of considerations relating to differences across disciplines and prag-
matic appropriacy.
3.2.2 The “how” of pedagogic processing

Having established in the previous sub-section that some type of pedagogic pro-
cessing may be necessary with some types of data, there still remains the question
of how this can be achieved.
In order to integrate the type of pedagogic processing Widdowson is referring
to so as to enable students to authenticate the corpus data for their own contex-
tual writing environment, Flowerdew (2008b) has adopted student peer response
activities, which draw on Vygotskian socio-cultural theories of co-constructing
knowledge through collaborative dialogue and negotiation (see O’Sullivan 2007
who gives a very insightful exposition on the role of cognitive and social construc-
tivist theories to foster corpus consultation literacy). In these peer-to-peer interac-
tion groups, weaker students were intentionally grouped with more proficient ones
to foster productive dialogue through ‘assisted performance’, thus drawing on an-
other aspect of socio-cultural theory. In this scaffolding-type of activity, more pro-
ficient students were able to offer their insights and interpretations on the corpus
data, thus assisting the weaker students to gradually develop more independence.
The author reports some success with this approach of incorporating group dis-
cussion activities revolving around the corpus data as a form of pedagogic media-
tion, resulting in consciousness-raising of register awareness, not only for the task
in hand, but also what might be appropriate phraseologies for other contexts. Peer
discussion also raised issues of what could be transferred from corpus data, i.e. the
use of nominalisations such as implementation, which led to further discussion as
to whether the gerund, implementing, would also be acceptable, and what would
not be appropriate for the context, i.e. the frame “It is recommended that…”, which
students mentioned sounded too authoritative. Students were therefore encour-
aged to engage in “collaborative metatalk” (Swain 1998: 68) to “use language to
reflect on language use” (ibid.). Gavioli & Aston (2001: 242) also advocate spoken
interaction among students in corpus consultation as “different learners will often
notice different things in concordances, and draw different conclusions”. Sugges-
tions for other types of pedagogic mediation of corpora have been given by Braun
(2005) for inclusion of video activities, by Milton (2006) for didactic written hints
built into the software, and by Vannestål & Lindquist (2007) for peer teaching.
Pedagogic mediation of corpora could well be assisted through the incorpora-
tion of contextual information in written texts to aid the transfer of corpus data to
pedagogy. Following Burnard (2004), Krishnamurthy & Kosem (2007) advocate
encoding the corpus with metadata to aid subsequent analyses. Although vari-
ous speech corpora such as the Michigan Corpus of Spoken Academic English,
MICASE, have been marked up with metadata categories such as the gender, age
range, academic position / role of the interlocutors, these are lacking in corpora
of writing.4 Corpora of business writing are especially context-sensitive and could
benefit from the inclusion of such metadata.
However, it should be noted that sometimes the co-textual environment can
provide clues to contextual information. In the business letters written by stu-
dents, the structure and use of appreciate* was found to be particularly problem-
atic across a wide range of students, with learners confusing the active and passive
forms, e.g. *I would be much appreciated if …, and omission of the object in the
active, e.g. *I would appreciate if…. The Business Letters Corpus referred to ear-
lier proved invaluable for alerting students to the correct structure. What students
were unsure of, though, was in which situations the active and passive forms were
most appropriate. Here, frequency counts and the co-text in the environment of
appreciate* provided valuable clues. The frame …appreciate it if … occurred 105
times, whereas there were only 9 instances of the frame It…appreciated if…, thus
suggesting some kind of marked use. In fact, scrutiny of the co-textual environ-
ment, i.e. the ‘extended unit of meaning’ revealed that the passive frame would
be used when the power relations between the addresser / addressee were quite
distinct and when a big favour was being asked. This example thus demonstrates
that corpora may not be completely devoid of context, which can sometimes, in
part, be recovered from the co-textual environment.
4. Corpus-based pedagogy is usually associated with an inductive

approach which may not be appropriate for all students
Both Gavioli (2005) and Meunier (2002) have noted the drawbacks of an inductive
approach, in which students extrapolate the rules, or patterning, from examples:
Despite their advantages, DDL activities have some drawbacks…. The various
learning strategies (deductive vs. inductive) that students adopt can lead to prob-
lems. Some students hate working inductively and teachers should aim at a com-
bined approach (see Hahn 2000 for a combined approach).
(Meunier 2002: 135)
406 Lynne Flowerdew
In common with Meunier (ibid.), I also believe that an inductive approach may
not appeal to students on account of their different cognitive styles (Flowerdew
2008b). Field-dependent students who thrive in cooperative, interactive settings
and who would seem to enjoy discussion centering on extrapolation of rules from
examples may benefit from this type of pedagogy. However, field-independent
learners who are known to prefer instruction emphasizing rules may not take to
the inductive approach inherent in corpus-based pedagogy. It is interesting to note
that Vannestål & Lindquist (2007: 343) state that some of the students in their
inductive corpus-based grammar course commented that “…they preferred the
more traditional way of reading about grammatical rules in the book and did not
feel that they learned anything by doing corpus exercises”.
Another reason as to whether an inductive or deductive approach is adopted
would very much seem to depend on the nature of a particular enquiry. If the
enquiry is based on a grammar rule (for example, the difference between for and
since in time expressions; see Tribble & Jones 1990), then the differences are quite
clear-cut. However, if the enquiry focuses on an aspect of phraseology, students
may find it difficult to extrapolate the tendencies associated with patterns in lan-
guage (Hunston & Francis 2000), as they may be confronted with conflicting ex-
amples which do not follow a particular pattern in all cases.
One area that posed difficulty for my students was that of ergativity. As noted
by Celce-Murcia (2002), overpassivisation of ergative verbs is an aspect that poses
particular problems for advanced learners:
With the verbs ‘increase’ and ‘decrease’ [the ergative] tends [my italics] to be used
when the inanimate subject is objectively or subjectively measurable (rather than
an animate agent/dynamic instrument object — both of which favor active voice
— or a patient subject — for the passive voice.)
(Celce-Murcia 2002: 146)
Students found it difficult to work out from a close reading of concordance lines
the correct choice of verb in the following sentence because of the probabilistic
nature of language when viewed syntagmatically:
With a very crowded schedule, students’ level of motivation was decreased / has
decreased.
Vannestål & Lindquist (2007) have commented on the difficulty students have in
interpreting corpus data and this aspect seems to be a particularly thorny issue
when phraseology comes into play. It would seem then that it is in order to supply
prompts or hints to enable students to work out the tendencies of phraseological
patterns. For example, in the case of the use of the ergative students could be given
a prompting question such as: “Do you notice any difference in the subjects for
was decreased and has decreased”?
In tackling corpus-based enquiries, Carter & McCarthy (1995) have formu-
lated the ‘3 Is’ strategy:
Illustration: looking at data
Interaction: discussion and sharing observations and opinions
Induction: (making one’s own rule for a particular feature)
However, based on the difficulties my students have encountered with induc-
ing phraseological tendencies, I would like to elaborate on the above model, by
proposing a ‘4 Is’ formulation, adding ‘Intervention’ as an optional stage between
Interaction and Induction. This would allow the inclusion of hints such as the
one mentioned above. Although in the literature on language teaching deductive
and inductive approaches are usually seen as polarities, the above discussion has
shown that clues and prompts can be used to mediate the inductive ↔ deductive
continuum. For this reason, the following dynamic paradigm for corpus investiga-
tions is proposed, which allows for finer-tuning of corpus queries.
Inductive Phraseology
(probabilities)
(Clues)
Deductive Grammar rules

Figure 4. Dynamic paradigm for corpus investigations
Implementing a more delicate approach to corpus queries would help to reduce

some of the difficulties associated with interpretation for students, especially when
they are engaged in working out phraseological tendencies. As pointed out by
Gardner (2007) it is this combinatorial nature of lexis and grammar which poses
problems:
…it is likely that only the most advanced language learners can take advantage of
the intricate semantic relationships between words that are revealed through con-
cordancing. Certainly, such an approach to language training presupposes that
learners will know most of the words (cotext) that surround a key word or phrase
in context (KWIC), and that they can connect their meanings — an assumption
that seems unreasonable for many groups of language learners (children, begin-
ning L2 learners, learners with low literacy skills etc.)
(Gardner 2007: 255)
408 Lynne Flowerdew
Corpora are useful for phraseological enquiries (cf. Granger & Meunier 2008,
Meunier & Granger 2008) as the language which falls between lexis and gram-
mar is often not easily retrievable from grammars or dictionaries. However, some
intervention in the form of clues or hints may be needed to enable students to con-
nect meanings. Conversely, while hard-and-fast grammar rules may be easier for
students to glean from corpora, a corpus, or indeed a particular sub-corpus, may
not be the best, or most efficient, resource for consultation. This issue is the focus
of the following section.
5. Which corpus and which online resource?
Chambers (2005) and Chambers & O’Sullivan (2004) have underscored the impor-
tance for students of having the ability to select appropriate electronic resources:
The concept of literacy now includes not only the knowledge and skills which are
traditionally associated with that concept, but also the ability to select, evaluate
and use the electronic tools and resources appropriate for the activity which is
being undertaken.
(Chambers & O’Sullivan 2004: 158)
In this respect, Davies (2004) reports on a program on student use of three main
corpora for examining syntactic variation in Spanish, noting that sometimes the
students’ intention was to use a corpus that was not the most appropriate for the
research question they had formulated.
In my own class of report writing referred to earlier in the article, students
wanted to know which of the verb collocations below was the most appropriate
for survey.
We plan to do / carry out / conduct a survey on the use of computers.
Students considered the 7-million word sub-corpus of reports to be ideal for

searching the noun survey and expected that it would show correct verb + noun
collocations. Although the corpus data displayed useful verbs to collocate with the
noun survey, these were not easy to discern. There was a lot of ‘noise’ as students
were required to read through quite a number of concordance lines to identify
appropriate verb + noun collocations for their context of writing, as evidenced by
the results shown in Figure 5.
This problematic example above then gave me the opportunity to remind stu-
dents of another program, JustTheWord.5 The screenshot below shows this to be
a more appropriate online tool to use, with the cluster feature of particular use as
the collocations are grouped semantically. In Figure 6 below, a glance at Cluster 1
Words Left sort Right Sort Show PoS? Frequency Sorted

Response rate to a survey from See contexts 3
And hcfa distributed a survey to See contexts 2
Response rate to a survey of See context 2
Response rates to a survey form See contexts 2
Thinking about conducting a survey to See contexts 2
$150,000 to undertake a survey and See contexts 1
1998 report on a survey by See contexts 1
2 we sent a survey to See contexts 1
Acquisition venter/footnote33/sent a survey on See contexts 1
Addition to mailing a survey of See contexts 1
And employment funded a survey of See contexts 1
And francis used a survey to See contexts 1
Figure 5. Search for a survey
V obj N*
cluster 1
carry out survey
conduct survey 132

take in survey 157
cluster 2
23
mention in survey
9
quote survey
9
cluster 3
complete survey 44
do survey 84
cluster 4
publish survey
30
10
report in survey
unclustered
46
base on survey
11
come in survey 24
commission survey 12
design survey
0 50 100 150 200

Figure 6. Search for survey in JustTheWord collocations program
410 Lynne Flowerdew
confirmed students’ initial intuitions, but some were surprised to find that the
verb do in Cluster 3 was acceptable. An examination of the concordance lines for
this collocation revealed, though, that it was mainly used in an informal setting in
speaking, as in the following: I mean, I haven’t done a detailed survey on anything.
One misconception held by students was that the Business Letters Corpus
would be useful for consulting for any aspect of their letter writing. The utility of
this corpus for answering business-related language queries such as the structure
and use of phrases with appreciate has been illustrated earlier in this article. For
other problematic areas, though, such as topic-comment (e.g. For the training pro-
gram, it will start on…), it would have been more appropriate to consult a local
reference grammar targeting common errors of Hong Kong students.
It is noteworthy that which resource (corpus, grammar, dictionary etc.) is the
most appropriate for a particular query has not been explored much to date. Ken-
nedy (2008) notes that a corpus might not be the most efficient way for students
to discover the differences in use between tall, high, upright and vertical when the
differences are made explicit in good dictionaries, but such insightful observations
are few and far between in the literature. This is an important area that Bernardini
(2002, 2004) has flagged for future development.
Here are two sets of typical examples, one from published journal articles and one from stu-
dent dissertations. What do you notice about the use of it seems in the two sets of examples?
Can you suggest why they are different?
Published articles Student dissertations
• It seems clear that as insider holding • It seems that different studies have shown
proportions increase, capitalization ratios different results.
decrease.
• It seems likely that the eighties and nineties • It seems that the practice of employing lo-
will be known as decades of large scale cal staff by multinationals is increasing.
disaggregation.
• It seems quite probable that consumers • It seems that some individual training
would not recognize such relatively small courses are below their full capacity.
degrees of difference.
Now look at the following examples of it seems that from published journal articles. How is it
used differently from student dissertations?
• It seems that consumers are more likely to use price tactic and switch stores only when
certain brands and product categories are promoted.
• It seems that the issue of privatization could become an object of a national referendum.
Figure 7. Concordance task for it seems in published articles and student dissertations
(from Hewings 2002)
Neither should it be forgotten that corpora of learner writing are another valu-
able resource in corpus-based pedagogy (see Pravec 2002 for a review), either to
inform materials (cf. Granger 2004; Gilquin et al. 2007; Mukherjee 2006) or for
exploitation by the learners themselves (Hewings & Hewings 2002; Mukherjee
& Rohrbach 2006; Seidlhofer 2000). For example, Mukherjee & Rohrbach (ibid.)
propose individualising the corpus analysis in order to compare variation in in-
dividual learners’ output. Having learners build corpora of their own writing to
compare with a reference corpus would thus increase the relevance of corpus-
based pedagogy by individualising it. The corpus-based materials of Hewings &
Hewings (2002) and Hewings (2002) on the use of metadiscoursal anticipatory
it in professional business writing, i.e. published journal articles from the field of
Business Studies, also incorporate the findings from learner corpora (MBA disser-
tations written by non-native speakers). Asking students to compare and discuss
the differences of it seems… in concordance lines selected from the two corpora, as
shown in Figure 7 overleaf, would serve to alert students to particularly problem-
atic areas for post-graduate writers, which students might not appreciate if they
were just exposed to working with expert or professional corpora.
6. Conclusion
This article has reviewed four inter-related issues concerning the application of
corpus linguistics to pedagogy and ESP in particular. It can be seen that very re-
cent pedagogic endeavours have adopted a much more discourse-based, top-down
approach to analysis (or worked from a bottom-up to a more top-down analysis),
a development that was advocated by Flowerdew (1998) over a decade ago. It has
also been illustrated that corpus pedagogy has progressed beyond looking at trun-
cated concordance lines, and is now encompassing Sinclair’s ‘units of meaning’,
outlined in the introduction of this article.
However, the issue of contextualization still remains problematic and it is en-
visaged that in future more attention will be paid to the mark-up of written text
with contextual features, as is the norm for spoken corpora nowadays. It has been
shown, though, that corpora are not completely devoid of context, and that the
co-textual environment may provide useful contextual clues. Although there are a
few accounts in the literature regarding the ‘pedagogic mediation’ of corpus data,
these are few and far between, indicating this is an area for further discussion and
expansion. Finally, it has been suggested that more attention needs to be paid to
the types of enquiry corpora are best suited for. The increasing availability of other
online resources, such as grammars, thesauri, dictionaries etc., will make it easier
for students to toggle between a multitude of online resources to decide which is
412 Lynne Flowerdew
the most relevant and useful look-up tool. Learner corpora, it is argued, are also
of value here. However, the above can only be accomplished with strategy train-
ing, not only of students but also of teachers, as called for by Frankenberg-Garcia
(2006). There is therefore still much to debate and develop in the application of
corpus linguistics to pedagogy, a field first founded with the pioneering work of
Tim Johns (1991a, 1991b) in the early nineties.
Notes
* This is a revised and extended version of a paper given at the 8th Teaching and Language
Corpora Conference, Lisbon, Portugal on 6th July 2008, and also an invited lecture given at the
Hong Kong Association for Applied Linguistics on 5th March 2007.
I wish to thank the two anonymous reviewers for their very helpful and constructive com-
ments on an earlier draft of this paper. Any shortcomings, naturally, remain my own.
1. I use ‘corpus-based’ in this article to refer to any hands-on pedagogic applications of corpora.
See Tognini-Bonelli (2001) for a discussion on her definitions of ‘corpus-based’ vs. ‘corpus-
driven’. See also Lee (2008) for additional details on ‘corpus-informed’ and ‘corpus-supported’
linguistics.
2. The BLC is a freely available corpus at http://ysomeya.hp.infoseek.co.jp/ (accessed January

2009). It comprises one million words of business letters.
3. The BNCweb is a user-friendly interface for the 100-million word BNC. See http://homepage.
mac.com/bncweb/manual/bncwebmanmain.htm (accessed December 2008) for more details,
and also Hoffmann et al. (2008).
4. Information on MICASE can be found at http://quod.lib.umich.edu/m/micase/ (accessed

July 2008).
5. JustTheWord is an online collocations program which interfaces with the 100-million-word
BNC.
References
Aston, G. 1995. “Corpora in language pedagogy: Matching theory and practice”. In G. Cook &
B. Seidlhofer (Eds.), Principle and Practice in Applied Linguistics. Oxford: Oxford University
Press, 257–270.
Belcher, D. 2006. “English for Specific Purposes: Teaching to perceived needs and imagined
futures in worlds of work, study and everyday life”. TESOL Quarterly, 40 (1), 133–156.
Bernardini, S. 2000. “Systematising serendipity: Proposals for concordancing large corpora with
language learners”. In L. Burnard & T. McEnery (Eds.), Rethinking Language Pedagogy from
a Corpus Perspective. Frankfurt: Peter Lang, 225–234.
Bernardini, S. 2002. “Exploring new directions for discovery learning”. In B. Kettemann & G.
Marco (Eds.), Teaching and Learning by Doing Corpus Analysis. Amsterdam: Rodopi, 165–
182.
Bernardini, S. 2004. “Corpora in the classroom: An overview and some reflections on future
developments”. In J. McH. Sinclair (Ed.), How to Use Corpora in Language Teaching. Am-
sterdam/Philadelphia: John Benjamins, 15–36.
Bhatia, V., Langton, N. & Lung, J. 2004. “Legal discourse: Opportunities and threats for corpus
linguistics”. In U. Connor & T. Upton (Eds.), Discourse in the Professions: Perspectives from
Corpus Linguistics. Amsterdam/Philadelphia: John Benjamins, 203–231.
Biber, D., Conrad, S. & Cortes, V. 2004. “‘If you look at…’: Lexical bundles in university teaching
and textbooks”. Applied Linguistics, 25 (3), 371–405.
Biber, D., Connor, U. & Upton, T. (Eds.) 2007a. Discourse on the Move: Using Corpus Analysis to
Describe Discourse Structure. Amsterdam/Philadelphia: John Benjamins.
Biber, D., Connor, U. & Upton, T. 2007b. “Conclusion: Comparing the analytical approaches”.
In D. Biber, U. Connor & T. Upton (Eds.), Discourse on the Move: Using Corpus Analysis to
Describe Discourse Structure. Amsterdam/Philadelphia: John Benjamins, 239–259.
Braun, S. 2005. “From pedagogically relevant corpora to authentic language learning contents”.
ReCALL, 17 (1), 47–64.
Braun, S. 2007. “Integrating corpus work into secondary education: From data-driven learning
to needs-driven corpora”. ReCALL, 19 (3), 307–328.
Burnard, L. 2004: online. “Metadata for corpus work”. Available at: http://users.ox.ac.uk/~lou/
wip/metadata.html (accessed January 2009).
Carter, R. & McCarthy, M. 1995. “Grammar and the spoken language”. Applied Linguistics, 16
(2), 141–158.
Celce-Murcia, M. 2002. “On the use of selected grammatical features in academic writing”. In
M. Schleppegrell & C. Colombi (Eds.), Developing Advanced Literacy in First and Second
Languages. Mahwah, N.J.: Lawrence Erlbaum, 143–157.
Chambers, A. 2005. “Integrating corpus consultation in language studies”. Language Learning
and Technology, 9 (2), 111–125.
Chambers, A. & O’Sullivan, I. 2004. “Corpus consultation and advanced learners’ writing skills
in French”. ReCALL, 16 (1), 158–172.
Charles, M. 2007. “Reconciling top-down and bottom-up approaches to graduate writing: Us-
ing a corpus to teach rhetorical functions”. Journal of English for Academic Purposes, 6 (4),
289–302.
Cook, G. 1998. “The uses of reality: A reply to Ronald Carter”. ELT Journal, 52 (1), 57–63.
Danielsson, P. 2007. “What constitutes a unit of analysis in language”? Linguistik online 31,
2/2007, 17–24.
Davies, M. 2004. “Student use of large, annotated corpora to analyse syntactic variation”. In G.
Aston, S. Bernardini & D. Stewart (Eds.), Corpora and Language Learners. Amsterdam/
Philadelphia: John Benjamins, 257–269.
Flowerdew, L. 1998. “Corpus linguistic techniques applied to textlinguistics”. System, 26 (4),
541–552.
Flowerdew, L. 2003. “A combined corpus and systemic-functional analysis of the Problem-So-
lution pattern in a student and professional corpus of technical writing”. TESOL Quarterly,
37 (3), 489–511.
414 Lynne Flowerdew
Flowerdew, L. 2004. “The argument for using specialised corpora to understand academic and
professional language”. In U. Connor & T. Upton (Eds.), Discourse in the Professions: Per-
spectives from Corpus Linguistics. Amsterdam/Philadelphia: John Benjamins, 11–33.
Flowerdew, L. 2005. “An integration of corpus-based and genre-based approaches to text analy-
sis in EAP/ESP: Countering criticisms against corpus-based methodologies”. English for
Specific Purposes, 24 (3), 321–332.
Flowerdew, L. 2006. “Texts, tools and contexts in corpus applications for writing”. Paper pre-
sented in invited academic session “Current Trends in Corpus Linguistics Research”. 40th
Annual TESOL Convention, Tampa, Florida, 16th March.
Flowerdew, L. 2008a. Corpus-based Analyses of the Problem-Solution Pattern: A Phraseological
Analysis. Amsterdam/Philadelphia: John Benjamins.
Flowerdew, L. 2008b. “Corpus linguistics for academic literacies mediated through discussion
activities”. In D. Belcher & A. Hirvela (Eds.), The Oral-Literate Connection: Perspectives on
L2 Speaking, Writing and Other Media Interactions. Ann Arbor, MI: University of Michigan
Press, 268–287.
Flowerdew, L. In press. “Using corpora for writing instruction”. In M. McCarthy & A. O’Keeffe
(Eds.), The Routledge Handbook of Corpus Linguistics. London: Routledge.
Flowerdew, L. Forthcoming a. “Corpus-based discourse analysis”. In J. P. Gee & M. Hanford
(Eds.), The Routledge Handbook of Discourse Analysis. London: Routledge.
Flowerdew, L. Forthcoming b. “ESP and corpus studies”. In D. Belcher, A. Johns & B. Paltridge
(Eds.), New Directions for ESP Research. Ann Arbor, MI: University of Michigan Press.
Frankenberg-Garcia, A. 2006. “Raising teachers’ awareness to corpora”. Plenary paper given at
the 7th Conference on Teaching and Language Corpora, Paris, 1–4 July.
Gardner, D. 2007. “Validating the construct of Word in applied corpus-based vocabulary re-
search: A critical survey”. Applied Linguistics, 28 (2), 241–265.
Gavioli, L. 2005. Exploring Corpora for ESP Learning. Amsterdam/Philadelphia: John Benja-
mins.
Gavioli, L. & Aston, G. 2001. “Enriching reality: Language corpora in language pedagogy”. ELT
Journal, 55 (3), 238–246.
Gilquin, G., Granger, S. & Paquot, M. 2007. “Learner corpora: The missing link in EAP peda-
gogy”. Journal of English for Academic Purposes, 6 (4), 319–335.
Granger, S. 1999. “Use of tenses by advanced EFL learners: Evidence from an error-tagged com-
puter corpus”. In S. Hasselgard & S. Oksefjell (Eds.), Out of Corpora: Studies in Honour of
Stig Johansson. Amsterdam: Rodopi, 191–202.
Granger, S. 2004. “Practical applications of learner corpora”. In B. Lewandowska-Tomaszczyk
(Ed.), Practical Applications in Language and Computers. Bern: Peter Lang, 1–10.
Granger, S. & Meunier, F. (Eds.) 2008. Phraseology: An Interdisciplinary Perspective. Amsterdam/
Philadelphia: John Benjamins.
Hahn, A. 2000. “Grammar at its best: The development of a rule- and corpus-based grammar of
English tenses”. In L. Burnard & T. McEnery (Eds.), Rethinking Language Pedagogy from a
Corpus Perspective. Hamburg: Peter Lang, 193–206.
Hewings, M. 2002. “Using computer-based corpora in teaching”. Paper presented at the 36th
TESOL Conference, Utah, March 2002.
Hewings, M. & Hewings, A. 2002. “‘It is interesting to note that…’: A comparative study of antic-
ipatory ‘it’ in student and published writing”. English for Specific Purposes, 21 (4), 367–383.
Hoffmann, S., Evert, S., Smith, N., Lee, D. & Berglund Prytz, Y. 2008. Corpus Linguistics with
BNCweb −A Practical Guide. Bern: Peter Lang.
Hunston, S. & Francis, G. 2000. Pattern Grammar: A Corpus-driven Approach to the Lexical
Grammar of English. Amsterdam/Philadelphia: John Benjamins.
Hyland, K. 2000. Disciplinary Discourses: Social Interactions in Academic Writing. London:
Longman.
Hyland, K. 2002. “Specificity revisited: How far should we go?”. English for Specific Purposes, 21
(4), 385–395.
Hyland, K. 2004. Genre and Second Language Writing. Ann Arbor: University of Michigan Press.
Hyland, K. 2007. “As can be seen: Lexical bundles and disciplinary variation”. English for Specific
Purposes, 27 (1), 4–21.
Hyland, K. 2008. “Academic clusters: Text patterning in published and postgraduate writing”.
International Journal of Applied Linguistics, 18 (1), 41–62.
Johns, T. 1991a. “From printout to handout: Grammar and vocabulary teaching in the context of
data-driven learning”. In T. Odlin (Ed.), Perspectives on Pedagogical Grammar. Cambridge:
Cambridge University Press, 293–313.
Johns, T. 1991b. “Should you be persuaded: Two examples of data-driven learning”. English Lan-
guage Research Journal, 4. Department of English: University of Birmingham, 1–16.
Jones, J. 2007. “Vocabulary-based discourse units in biology research articles”. In D. Biber, U.
Connor & T. Upton (Eds.), Discourse on the Move: Using Corpus Analysis to Describe Dis-
course Structure. Amsterdam/Philadelphia: John Benjamins, 175–212.
Kaltenböck, G. & Mehlmauer-Larcher, B. 2005. “Computer corpora and the language classroom:
On the potential and limitations of computer corpora in language teaching”. ReCALL, 17
(1), 65–84.
Kanoksilapatham, B. 2007. “Rhetorical moves in biochemistry research articles”. In D. Biber, U.
Connor & T. Upton (Eds.), Discourse on the Move: Using Corpus Analysis to Describe Dis-
course Structure. Amsterdam/Philadelphia: John Benjamins, 73–119.
Kennedy, G. 2008. “Phraseology and language pedagogy”. In F. Meunier & S. Granger (Eds.),
Phraseology in Foreign Language Learning and Teaching. Amsterdam/Philadelphia: John
Benjamins, 21–41.
Krishnamurthy, R. & Kosem, I. 2007. “Issues in creating a corpus for EAP pedagogy and re-
search”. Journal of English for Academic Purposes, 6 (4), 356–373.
Lee, D. 2008. “Corpora and discourse analysis: New ways of doing old things”. In V. K. Bhatia, J.
Flowerdew & R. Jones (Eds.), Advances in Discourse Studies. London: Routledge, 86–99.
Lee, D. &. Swales J. M. 2006. “A corpus-based EAP course for NNS doctoral students: Moving
from available specialized corpora to self compiled corpora”. English for Specific Purposes,
25 (1), 56–75.
McCarthy, M. 2001. Issues in Applied Linguistics. Cambridge: Cambridge University Press.
McEnery, T., Xiao, R. & Tono, Y. 2006. Corpus-based Language Studies. London: Routledge.
Meunier, F. 2002. “The pedagogic value of native and learner corpora in EFL grammar teach-
ing”. In S. Granger, J. Hung & S. Petch-Tyson (Eds.), Computer Learner Corpora, Second
Language Acquisition and Foreign Language Teaching. Amsterdam/Philadelphia: John Ben-
jamins, 119–141.
Meunier, F. & Granger, S. (Eds.) 2008. Phraseology in Foreign Language Learning and Teaching.
Amsterdam/Philadelphia: John Benjamins.
416 Lynne Flowerdew
Milton, J. 2006. “Resource-rich web-based feedback: Helping learners become independent

writers”. In K. Hyland & F. Hyland (Eds.), Feedback in Second Language Writing. Cam-
bridge: Cambridge University Press, 123–139.
Mukherjee, J. 2006. “Corpus linguistics and language pedagogy: The state of the art − and be-
yond”. In S. Braun, K. Kohn & J. Mukherjee (Eds.), Corpus Technology and Language Peda-
gogy. Frankfurt am Main: Peter Lang, 5–24.
Mukherjee, J. & Rohrbach, J.-M. 2006. “Rethinking applied corpus linguistics from a language-
pedagogical perspective: New departures in learner corpus research”. In B. Kettemann & G.
Marko (Eds.), Planning and Gluing Corpora: Inside the Applied Corpus Linguist’s Workshop.
Frankfurt am Main: Peter Lang, 205–231.
Nesselhauf, N. 2003. “The use of collocations by advanced learners of English and some implica-
tions for teaching”. Applied Linguistics, 24 (2), 223–242.
Nesselhauf, N. 2004. Collocations in a Learner Corpus. Amsterdam/Philadelphia: John Benja-
mins.
Noguchi, J. 2004. “A genre analysis and mini-corpora approach to support professional writing
by non-native speakers”. English Corpus Studies, 11, 101–110.
O’Sullivan, I. 2007. “Enhancing a process-oriented approach to literacy and language learning:
The role of corpus consultation literacy”. ReCALL, 19 (3), 269–286.
Partington, A. 1998. Patterns and Meanings. Amsterdam/Philadelphia: John Benjamins.
Pravec, N. 2002. “Survey of learner corpora”. ICAME Journal, 26 (1), 8–14.
Seidlhofer, B. 2000. “Operationalising intertextuality: Using learner corpora for learning”. In L.
Burnard & T. McEnery (Eds.), Rethinking Language Pedagogy from a Corpus Perspective.
Bern: Peter Lang, 207–223.
Seidlhofer, B. (Ed.) 2003. Controversies in Applied Linguistics (Section 2: Corpus Linguistics and
Language Teaching). Oxford: Oxford University Press.
Sinclair, J. McH. 1991. Corpus Concordance Collocation. Oxford: Oxford University Press.
Sinclair, J. McH. 1999. “The lexical item”. In E. Weigand (Ed.), Contrastive Lexical Semantics.
Amsterdam/Philadelphia: John Benjamins, 1–24.
Sinclair, J. McH. 2004a. “The search for units of meaning”. In J. McH. Sinclair (edited with R.
Carter), Trust the Text. London: Routledge, 24–48.
Sinclair, J. McH. 2004b. “New evidence, new priorities, new attitudes”. In J. McH. Sinclair (Ed.),
How to Use Corpora in Language Teaching. Amsterdam/Philadelphia: John Benjamins,
271–299.
Stubbs, M. 1996. Text and Corpus Analysis. Oxford: Blackwell.
Stubbs, M. 2004. “On very frequent phrases in English: Distributions, functions and structures”.
Plenary address given at ICAME 25, Verona, Italy, 19–23 May.
Swain, M. 1998. “Focus on form through conscious reflection”. In C. Doughty & J. Williams
(Eds.), Focus on Form in Classroom Second Language Acquisition. Cambridge: Cambridge
University Press, 64–81.
Swales, J. M. 1990. Genre Analysis: English in Academic and Research Settings. Cambridge: Cam-
bridge University Press.
Swales, J. M. 2002. “Integrated and fragmented worlds: EAP materials and corpus linguistics”. In
J. Flowerdew (Ed.), Academic Discourse. Harlow, UK: Longman, 150–64.
Swales, J. M. 2004. Research Genres. Cambridge: Cambridge University Press.
Tognini-Bonelli, E. 2001. Corpus Linguistics at Work. Amsterdam/Philadelphia: John Benja-

mins.
Tribble, C. & Jones, G. 1990. Concordances in the Classroom. Harlow, UK: Longman.
Vannestål, M. & Lindquist, H. 2007. “Learning English grammar with a corpus: Experimenting
with concordancing in a university grammar course”. ReCALL, 19 (3), 329–350.
Weber, J.-J. 2001. “A concordance- and genre-informed approach to ESP essay writing”. ELT
Journal, 55 (1), 14–20.
Widdowson, H. G. 1991. “The description and prescription of language”. In J. Alatis (Ed.),
Georgetown University Round Table in Language and Linguistics. Washington, DC: George-
town University.
Widdowson, H. G. 1998. “Context, community and authentic language”. TESOL Quarterly, 32
(4), 705–716.
Widdowson, H. G. 2002. “Corpora and language teaching tomorrow”. Keynote lecture delivered
at the Fifth Teaching and Language Corpora Conference, Bertinoro, Italy, 29 July.
Widdowson, H. G. 2004. Text, Context, Pretext. London: Blackwell.
Author’s address
Lynne Flowerdew
Hong Kong University of Science and Technology
Language Centre
Clear Water Bay Road
Kowloon
Hong Kong, SAR
lclynne@ust.hk

Applying Corpus Linguistics To Pedagogy: A Critical Evaluation

Uploaded by

Copyright:

Available Formats

You might also like

Applying Corpus Linguistics To Pedagogy: A Critical Evaluation

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Applying Corpus Linguistics To Pedagogy: A Critical Evaluation

Uploaded by

Copyright:

Available Formats

Applying corpus linguistics to pedagogy

Keywords: top-down, decontextualisation, pedagogic processing, inductive,

Corpus linguistics is usually associated with a phraseological approach to analysis,

International Journal of Corpus Linguistics 14:3 (2009), 393–417. doi 10.1075/ijcl.14.3.05flo

but rather as a methodology. The same position can be applied to data-driven

2. Corpus linguistic techniques encourage a bottom-up processing of text

In the ‘top-down’ approach, the functional components of a genre are determined

2.1 Moving from top-down to bottom-up processing

Charles (2007), mediates between top-down and bottom-up processing of aca-

A more top-down approach, similar to that adopted by Charles, is also found in

Another corpus linguistics practitioner who commences from a top-down

2.2 Moving from bottom-up to top-down processing

Hyland’s (2007, 2008) work on genre-specific phraseological routines commenc-

Pedagogic applications with a bottom-up starting point moving to more top-

A search on these different combinations in a Business Letters Corpus revealed the

accept, thank, and their respective phraseologies would be a way of counteracting

A search on focus was conducted in an institutionally-compiled 7 million-word

Pattern Left sort Right Sort Frequency Sort

3. Corpus data are decontextualised and may not be directly transferable

3.1 The issue of contextualisation in corpus data

Widdowson’s (2004) arguments on the decontextualised nature of corpus data are

3.2 ‘Pedagogic processing’ of corpus data

3.2.1 The “what” of pedagogic processing

phrase to make it socio-culturally appropriate for writing to university authorities.

3.2.2 The “how” of pedagogic processing

4. Corpus-based pedagogy is usually associated with an inductive

Deductive Grammar rules

Implementing a more delicate approach to corpus queries would help to reduce

5. Which corpus and which online resource?

Students considered the 7-million word sub-corpus of reports to be ideal for

Words Left sort Right Sort Show PoS? Frequency Sorted

carry out survey

conduct survey 132

0 50 100 150 200

2. The BLC is a freely available corpus at http://ysomeya.hp.infoseek.co.jp/ (accessed January

4. Information on MICASE can be found at http://quod.lib.umich.edu/m/micase/ (accessed

Milton, J. 2006. “Resource-rich web-based feedback: Helping learners become independent

Tognini-Bonelli, E. 2001. Corpus Linguistics at Work. Amsterdam/Philadelphia: John Benja-

You might also like