
ReCALL 26(2): 243–259.

2014 © European Association for Computer Assisted Language Learning


doi:10.1017/S0958344014000056
First published online 14 February 2014

The use of general and specialized corpora as reference sources for academic English writing:
A case study
JI-YEON CHANG
Education Research Institute, Seoul National University, Korea
(email: jchang200@gmail.com)

Abstract

Corpora have been suggested as valuable sources for teaching English for academic purposes
(EAP). Since previous studies have mainly focused on corpus use in classroom settings, more
research is needed to reveal how students react to using corpora on their own and what should be
provided to help them become autonomous corpus users, considering that their ultimate goal is
to be independent scholars and writers. In the present study, conducted in an engineering lab at
a Korean university over 22 weeks, data on students’ experiences and evaluations of consulting
general and specialized corpora for academic writing were collected and analyzed. The findings
show that, while both corpora served the participants well as reference sources, the specialized
corpus was particularly valued for its direct help in academic writing because, as non-native
English-speaking graduate engineering students, the participants wanted to follow the writing
conventions of their discourse community. The participants also showed disparate attitudes toward
the time taken for corpus consultation due to differences in factors such as academic experience,
search purposes, and writing tasks. The article concludes with several suggestions for better corpus
use with EAP students regarding the compilation of a corpus, corpus training, corpus competence,
and academic writing.

Keywords: corpus use, L2 writing, academic writing

1. Introduction

The number of publications in international journals plays a critical role in employment and
promotion at Korean universities, and this gives a great incentive to graduate science and
engineering students to write in English (Cho, 2009). However, writing research articles in
English is a great burden to students in Korea as EFL learners (Cho, 2009; Shin, 2010).
Furthermore, non-native English-speaking (NNES) scholars are known to continue to suffer
from language-related problems such as grammar and vocabulary (Hanauer & Englander,
2011; Uzuner, 2008).
In order to increase students’ lexicogrammatical knowledge, data-driven learning or DDL
(Johns, 1991, 1994) has been employed in previous research (Boulton, 2009, 2010; Chan &
Liou, 2005; Cobb, 1997; Sun & Wang, 2003; Estling Vannestål & Lindquist, 2007), with
some studies (Anthony, 2006; Charles, 2007; L. Flowerdew, 2001; Lee & Swales, 2006;
244 J. -Y. Chang

Thurston & Candlin, 1998; Tribble, 2002) also suggesting the possibility of using corpora
for improving English for academic purposes. Since these studies mainly focus on the
potential of corpora in classroom settings, further research is necessary to examine what
should be provided to support students’ independent corpus use for academic writing,
taking account of various individual and contextual factors.
In this study, two different corpora were introduced to ten graduate engineering students:
a general corpus equipped with advanced search functions, and a specialized corpus of
academic papers related to the participants’ fields. The main purpose of this study is to
examine the evaluations of the respective corpora as reference sources from the perspectives
of NNES graduate engineering students in an EFL setting. To this end, primary data were
collected via weekly interviews and final surveys and were analyzed according to a
grounded theory approach (Corbin & Strauss, 2008). Based on the findings, the study
identifies what is needed to facilitate appropriate corpus consultation for academic writing
and makes suggestions for future corpus use.

2. Corpora as reference sources for L2 writing

There has been some research on corpus use for L2 writing. While C. Yoon (2011) divides it
into two groups according to the roles of the corpora used, i.e., research tools and reference
tools, there are further differences among studies on corpora as reference tools. One
stream mainly deals with corpus consultation for error correction in writing (Chambers &
O’Sullivan, 2004; Gaskell & Cobb, 2004; O’Sullivan & Chambers, 2006; Todd, 2001).
In these studies, students consulted corpora to correct errors in their writing which had been
underlined or highlighted by their language teachers. To differentiate this method
from DDL, Gaskell and Cobb (2004) call it ‘feedback-driven learning’. To examine its
effectiveness, some of these studies focus on specific language items such as students’
grammatical errors (Gaskell & Cobb, 2004) or lexical errors (Todd, 2001). On the other
hand, in two other studies (Chambers & O’Sullivan, 2004; O’Sullivan & Chambers, 2006),
various errors were marked in students’ French writing, and error corrections by corpus
consultation were classified and compared, including grammatical errors, lexical errors, and
capitalization errors. These studies show that, even though students can correct all types of
errors, corpus consultation is particularly helpful in lessening the interference of their first
language. However, one student in the first study (Chambers & O’Sullivan, 2004) said that
“the errors need to be highlighted first. I don’t believe I could significantly improve an
unmarked piece” (op. cit.: 169). It is thus questionable whether students can benefit from
corpus consultation when they use corpora without the intervention of their instructors.
In another stream of research (Kennedy & Miceli, 2001, 2010; Park, 2012; H. Yoon,
2008; H. Yoon & Hirvela, 2004), students do autonomously use corpora as reference
sources for writing without any feedback from their instructors. Kennedy and Miceli (2001)
examined their students’ behaviors according to four steps in corpus consultation, including
formulating the question, devising a search strategy, observing the examples found and
selecting relevant ones, and drawing conclusions. Based on their findings, they drew up a
list of guidelines for each step, and these served as a basis for improving their future ‘corpus
apprenticeship’. In a follow-up study, Kennedy and Miceli (2010) provided longer and
more refined training where they introduced two corpus use strategies, i.e., pattern-hunting
and pattern-defining. They found that the participants developed their own consultation
Use of general and specialized corpora as reference sources 245

styles and had to master a variety of functions for each reference. With cutting-edge
technologies such as screen recordings and digital query logs (Park & Kinginger, 2010),
Park (2012) analyzed individual learner-corpus interactions (LCI) from the Vygotskian
microgenetic perspective. Rather than using well-known corpora, Park chose Google’s
custom search function and identified three steps in corpus consultation: needs analysis,
querying, and evaluation of the search results and decision making. Among all the
interactions, 47% brought about inappropriate or no changes in students’ writing, which Park
attributed particularly to difficulties with the third step.
While the aforementioned studies were conducted with only a few students, H. Yoon and
Hirvela (2004) investigated intermediate and advanced students’ opinions of corpus use via
questionnaires and interviews. The students’ evaluation was that using corpora was helpful
in increasing their general and academic writing skills as well as boosting their confidence,
even though it was time-consuming. The intermediate students showed more favorable
attitudes toward corpus use; however, as they were given more activities and more time for
corpus training, it would be a hasty conclusion to say that corpus use is more helpful to
intermediate than advanced students. H. Yoon (2008) also conducted a longitudinal case
study on changes in the perceptions of six graduate students with different majors and found
that they took more responsibility for their own writing by consulting the corpus provided,
thereby becoming independent writers. She also illustrated a variety of factors to be
considered, including academic disciplines, prior writing experience, English proficiency,
and familiarity with the corpus. In both studies, the students consulted only the Collins
COBUILD Corpus, and thus these studies are limited to the use of a general corpus for
academic writing.
One of the common features in most of these studies is that they were generally conducted
in classroom settings where students consulted corpora under their instructors’ guidance,
performing their in-class writing tasks within a certain time limit. Therefore, they do not
offer a concrete illustration of students’ experience of using corpora on their own. What is
more, in these writing classrooms, students’ majors were often so diverse (Charles, 2011)
that disciplinary or contextual factors were not properly taken into account, even though
such factors can affect corpus users’ perceptions and behaviors, as in Hafner and Candlin’s
(2007) study on corpus use by law students. In addition, most of the previous studies mainly drew upon
only one type of corpus, often a general corpus. As some suggestions have been made for
using a specialized corpus for academic purposes (Anthony, 2006; Cargill & Adams, 2005;
Charles, 2007; L. Flowerdew, 2001; Lee & Swales, 2006; Thurston & Candlin, 1998;
Tribble, 2002), it is imperative to explore the effectiveness of specialized corpora combined
with general ones particularly in academic settings. Finally, as computer and internet
technologies have developed so quickly, students can now access advanced corpora
anytime and anywhere. Therefore, to reflect these recent trends, it is also necessary to
conduct research on the current use of diverse corpora in more natural settings.
To address these research issues, a case study was designed and conducted on the
autonomous use of general and specialized corpora for academic writing by NNES graduate
students with the same major, i.e., computer engineering. The research questions that
guided this study are as follows:

1. What are the benefits and drawbacks of a general corpus as a reference source for
academic English writing?
2. What are the benefits and drawbacks of a specialized corpus as a reference source for
academic English writing?
3. How do the participants evaluate corpora as reference sources for academic English
writing?

3. Methodology

3.1. Setting and procedure

The present study was conducted in a computer engineering lab at a Korean university.
The researcher carried out this study, serving as an English writing instructor for the lab.
At the first meeting, the researcher provided a corpus workshop, explaining how to
consult the Corpus of Contemporary American English (COCA) and a specialized corpus
with the freeware concordancer AntConc, using several activities.1 Over 22 weeks, the
students submitted what they wrote in English to the researcher, who had an interview and a
one-to-one feedback session with individual students every week. During these sessions, the
researcher provided not only direct feedback but also data-driven feedback (Chang, 2013)
by consulting the corpora together with the participants to find correct or better expressions
and to develop their corpus consultation skills. After each session was over, the participants
consulted the corpora on their own to keep feedback logs based on the feedback provided,
which were collected one week later. In this study, search logs in which the participants
recorded their own searches were used to complement primary data. The researcher and the
participants communicated in their first language, Korean. During the research period, the
participants wrote journal articles, conference papers, revision summaries, abstracts for
Korean conference papers, project proposals, emails, and weekly report summaries in
English. The research was terminated 22 weeks later, as no new findings were noted.
While COCA was chosen as the general corpus, a specialized corpus, named Michelangelo,
was compiled manually. Its first version contained 178 papers compiled from lists
of bibliographical references provided by those students with publication experience. After
student D5 (see Table 1) complained that Michelangelo did not include papers relevant to
his study, he was requested to submit an additional list of references, and a second version
was released a month later with 23 papers added. Michelangelo mainly consists of journal
articles and conference papers in the participants’ fields, and thus features the academic
written English language of computer science and engineering. This corpus contains both
NES and NNES writers’ publications in line with the view that students need to be exposed
to academic English as a lingua franca (J. Flowerdew, 2008; Mauranen, 2003; Wood, 2001).
At the time of the research, COCA comprised 400 million words, and Michelangelo
1,352,033 words without tagging or parsing.

3.2. Participants

The participants who volunteered for the study, five master’s and five doctoral students,
were all Koreans who had learned English as a foreign language. As shown in Table 1, for
participant identification, each student was assigned a code ranging from D1 to M5, where
the letters indicate the degree conferred upon graduation, and the numbers indicate the order
of entrance to the lab. Except for M2 and M4, the other students had no experience of
studying abroad. M2 and M4 had stayed in English-speaking countries for eight and ten
months, respectively.

Table 1 Overview of the participants’ information

ID  Gender  Program enrolled  Age  English proficiency  PE1  PE2  PK1  PK2
D1  Female  Doctoral          30   Advanced             3    0    2    0
D2  Male    Doctoral          29   High-intermediate    1    1    2    0
D3  Male    Doctoral          32   High-intermediate    4    1    3    1
D4  Male    Doctoral          28   High-intermediate    4    0    5    0
D5  Male    Doctoral          26   Advanced             0    0    2    0
M1  Male    Master’s          26   Mid-intermediate     0    0    0    0
M2  Male    Master’s          29   Advanced             0    0    0    0
M3  Male    Master’s          24   Near-Native          0    0    0    0
M4  Male    Master’s          26   Advanced             0    0    0    0
M5  Male    Master’s          23   Advanced             0    0    0    0

English proficiency = English proficiency measured by TEPS scores; PE1/PE2 = the number of
publications in international journals as primary/secondary authors; PK1/PK2 = the number of
publications in Korean journals as primary/secondary authors.

1 For further information on workshop materials, see Chang (2011: 268-281).

The students were asked to submit their recent English test scores as an indicator of their
level of proficiency in English. M3 submitted a score of 106 on the internet-based TOEFL
test, which is equivalent to Level 1 in the TEPS grading system used by all other
participants. Their scores ranged from Level 3+ to Level 1, i.e., from mid-intermediate to
near-native level, though it should be noted that TEPS does not test speaking or writing
skills. The doctoral students had all published articles in both Korean and international
journals except for D5, who had published two articles only in Korean journals. The
master’s students had not published articles in either language. Prior to the start of the study,
none of the participants had heard about a corpus. With the exception of M5, all participants
had used Google and/or Google Scholar as a language reference for English writing, though
D3 was the only one who had used its advanced search functions such as quotation marks
and wild cards.
While ten participants can be considered rather many for a single case study, it is possible
to include more than one case in a case study for comprehensive research (Duff, 2008).
Given that a case can be an individual, an institution, or even a country (Chapelle & Duff,
2003), it is more appropriate to consider that the case in this study refers to an engineering
lab consisting of ten individual members to uncover “the multiple realities” (Stake, 1995:
12), i.e., disparate views and patterns of corpus use for academic English writing.

3.3. Data collection and analysis

For this research project, a variety of data were collected including the recordings of weekly
interviews and feedback sessions, first and second surveys, search logs and feedback logs,
and the students’ written productions, along with the researcher’s fieldnotes, memos, and
email correspondence with the students. This paper mainly draws upon the transcripts of
the weekly interviews and the students’ written responses to the second survey as primary
data. Before each feedback session, the researcher had a semi-structured interview with
individual students who had submitted their writing. The researcher asked them to describe
what they had looked up in the corpora and other language references, how they felt
about them, and what difficulties they had experienced in using the corpora (see Appendix 1
for a complete list of interview questions). Each interview lasted for approximately three to
thirty minutes depending on the length of the participants’ answers and the researcher’s
follow-up questions.
An initial survey was conducted at the beginning of the project in order to obtain
demographic information and details of academic needs; a second questionnaire was
distributed at the end in order to collect the participants’ final evaluations. In the second survey,
the students wrote down for what purposes they had used the corpora and in what ways they
had been helpful or not. They also freely left several comments and suggestions for better
corpus use (see Appendix 2 for the second survey questions).
Along with the written productions, the students also submitted their search logs and
feedback logs. Since neither screen recordings (Park, 2012; Park & Kinginger, 2010) nor
computer-generated logs (Gaskell & Cobb, 2004; Pérez-Paredes, Sánchez-Tornel &
Alcaraz Calero, 2012; Park, 2012; Park & Kinginger, 2010) were allowed in this lab for
security reasons, manual logs were the only means to obtain valuable data showing the
processes involved in the participants’ corpus consultation. The participants recorded their
own searches in a search log and filled in a feedback log based on the feedback sessions
(Chang, 2013). During the weekly interviews, the participants’ writing and search logs were
compared and utilized as a prompt to explain how searches were done.
The data collected were analyzed qualitatively using grounded theory techniques (Corbin
& Strauss, 2008). The data analysis went through three stages including open coding, axial
coding, and selective coding (Corbin & Strauss, 2008; Daengbuppha, Hemmington &
Wilkes, 2006) with the aid of MAXQDA (Version 10). The participants’ written productions
and search logs as well as the researcher’s fieldnotes and memos were compared with
the primary data for triangulation (Lincoln & Guba, 1985; Stake, 1995). The analysis
and interpretation of the data were examined by the participants themselves as well as
the researcher’s colleagues to ensure the trustworthiness of the research (Lincoln &
Guba, 1985).

4. Findings

This section presents the main benefits and drawbacks of using COCA as a general corpus
and Michelangelo as a specialized corpus, as well as of corpus use in general.

4.1. Benefits of using COCA as a general corpus

∙ The general corpus is helpful in finding collocations, synonyms and exact expressions.
COCA is a highly interactive corpus that supports part-of-speech and genre-based searches,
thereby enabling its users to find exactly what they want. D3 and others (D1, D4, D5,
M2, M3 and M4) appreciated this benefit.

I feel more comfortable with COCA than Google Scholar. I used COCA and it
was more accurate. I could specify parts of speech. In the case of Google, I had
to examine its results after I did a search, but COCA just showed relevant results.
(D3-Interview-0325)
In particular, D1, D4 and D5 mentioned that they had benefited from COCA in finding
synonyms which enriched their otherwise ‘monotonous’ writing. This was possible because
it is not only grammatically but also semantically tagged.

My difficulty was writing boring paragraphs, using the same expressions over and
over again. Since COCA provides different expressions with similar meaning, it
helps me a lot in overcoming this difficulty. (D5-Survey-0622)
∙ The general corpus is a credible reference.
D3 and D4 reported that the general corpus was a credible writing reference, but their
reasons were different. While D4’s opinion derived from COCA’s genre-specific frequency
information, D3 highly valued its inclusion of more native-speaker English writing.

A thesaurus dictionary shows more synonyms than COCA, but it only lists words
without further information. It doesn’t give frequency information, and I can’t
determine which one to choose because I don’t have a native speaker’s intuition.
But COCA shows frequency information and examples. (D4-Interview-0430)

Google Scholar is beginning to lose my trust. I thought its results were correct, but
I realized there are also people like me on Google Scholar who made mistakes in
their publications. We cannot guarantee everything on Google Scholar is correct
because I also made mistakes and my papers are available online. If I had not gotten
any satisfying results with COCA, I would have searched Google. But there was
nothing that drove me to search Google. (D3-Interview-0603)
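
D4’s point about frequency information can be illustrated with a minimal Python sketch. It does not reproduce the COCA interface or its query syntax; it only counts exact word forms (no lemmatization or part-of-speech tagging) in a tiny invented sample. The function names `tokenize` and `candidate_frequencies` and the sample sentences are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (hyphenated words kept whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def candidate_frequencies(texts, candidates):
    """Count how often each candidate word form occurs across the texts."""
    wanted = {c.lower() for c in candidates}
    counts = Counter()
    for text in texts:
        for token in tokenize(text):
            if token in wanted:
                counts[token] += 1
    # Report every candidate, including those with zero hits.
    return {c: counts[c.lower()] for c in candidates}


# Hypothetical mini-corpus; a real corpus would be read from files.
sample_texts = [
    "We propose a novel scheme. The proposed scheme reduces latency.",
    "This paper presents a new method and we propose an extension.",
]
freqs = candidate_frequencies(sample_texts, ["propose", "suggest", "present"])
```

Because only exact forms are matched, "proposed" and "presents" are not counted under "propose" and "present"; a tagged corpus such as COCA avoids this limitation.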
∙ The general corpus shows the English language of various genres.
D2 and D5 mentioned that one of the strong points of COCA was that it included different
English genres. However, they did not greatly appreciate this benefit, at least for their writing.

I can search English used in various fields, but it seems hard to search for English in
the field that I need. (D2-Survey-0622)

4.2. Drawbacks of using COCA as a general corpus

∙ The general corpus does not show the English language of the participants’ fields
COCA includes a number of academic journals and magazines related to engineering
such as Mechanical Engineering and IBM Journal of Research & Development. Nevertheless,
all of the participants, except for M1 and M2, mentioned that it did not include
enough texts related to their academic fields.

I couldn’t get search results with COCA. It showed only some of my searches like
the words that are used a lot, such as “cpu.” It showed only those simple words.
(M3-Interview-0211)

Furthermore, as both EFL learners and graduate students, they wondered whether they
could use in their own writing what they had found through COCA.

Although some phrases I’ve found with COCA look okay in terms of grammar or
expressions, I’m not so sure I can use them in my field. (D5-Survey-0622)
∙ The general corpus is not easy to use.
Except for M3 and M5, none of the participants felt that COCA was easy to use. Although
they admitted that its highly sophisticated search functions enabled them to obtain the very
expressions they wanted, they complained that it had taken some time for them to familiarize
themselves with those functions. However, M2 suggested that this problem could be
overcome easily once he had consulted it a lot.

I sometimes forget how to use COCA, but that’s because I haven’t used it a lot. If I
use it a lot, it won’t be a problem. (M2-Survey-0622)

4.3. Benefits of using Michelangelo as a specialized corpus

∙ The specialized corpus shows the English language the users want to acquire.
Except for M1, who did not use any corpora at all during the research period for personal
reasons, the other participants highly valued this benefit. Coxhead and Nation (2001)
classified English vocabulary into four categories: high frequency words, academic vocabulary,
technical vocabulary, and low frequency words. According to the participants,
Michelangelo was beneficial across all categories.

In fact, the word that I type in is not the focus. I usually look for how that word is
used in my field and how frequently it is used, so dictionaries are not helpful…
What is difficult is mainly those terms like “FTL” or “garbage collection,” which
we do not use every day. Michelangelo shows how uncommon words such as those
are used in my field, and that is its biggest advantage. (D2-Interview-0506)

I feel the usage of a word is very different depending on the context. Although a
word is frequently used, if it is not used in my field, I feel that I don’t need to use it.
That’s why I looked up words in Michelangelo. (M3-Interview-0401)
∙ The specialized corpus has a ‘summary effect’.
A summary effect is an in vivo code (Corbin & Strauss, 2008: 65), which is borrowed
from D2’s description. D2 and others (D3, D4 and M3) commented that it helped them
focus on target expressions.

I consulted Google Scholar by using double quotation marks. Unlike Michelangelo’s
line-by-line search results, its search results were mixed together with
journal information. And hundreds of pages came up for my search word in Google.
I think Michelangelo has a summary effect. (D2-Interview-0429)
In particular, as the specialized corpus shows target expressions aligned and highlighted
in the center, D2 added that the specialized corpus was fast (see Section 4.6).
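
The aligned, line-by-line display described here is commonly called a keyword-in-context (KWIC) concordance. The following minimal Python sketch shows the idea only; it is not AntConc’s implementation, and the function name `kwic`, the `width` parameter, and the sample sentences are hypothetical.

```python
import re


def kwic(text, keyword, width=30):
    """Return concordance lines with the keyword aligned in a fixed column."""
    # Collapse line breaks and repeated spaces so contexts read as running text.
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-justify the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


# Hypothetical sample text, not taken from the Michelangelo corpus.
sample = ("Garbage collection is triggered when free blocks run low. "
          "The FTL performs garbage collection in the background.")
concordance = kwic(sample, "garbage collection", width=20)
for line in concordance:
    print(line)
```

Each hit occupies one line with the search term in the same column, which is the layout D2 contrasted with web search results mixed together with journal information.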

4.4. Drawbacks of using Michelangelo as a specialized corpus

∙ The specialized corpus does not have any drawbacks.
In the final survey, D2 and M2, who said in the interviews that the specialized corpus was
fast, left the comment that they did not find any drawbacks to the specialized corpus.
∙ The specialized corpus needs another computer program.
D5 felt discomfort in using the concordancer for Michelangelo.

I tend to consult Michelangelo least often. I only use it for the feedback log.
(Why?)
It’s bothersome to implement another program. Other references can be opened in
one browser. (D5-Interview-0408)
In fact, D5’s discomfort was not caused by the specialized corpus per se; rather, it was
related to the use of an offline concordance program for an offline corpus. Some participants
(D3, M3 and M4) also stated that they were dissatisfied with the lack of advanced search
functions in Michelangelo, which stemmed from their comparison of the search possibilities
on an untagged corpus and on COCA. In this regard, D3 made the suggestion of compiling a
COCA-like specialized corpus.

I think it’s urgent to compile a COCA-like corpus of research papers specialized in
each research field. For example, in the case of the CS field, it would be appropriate
to compile a COCA-like corpus of CS articles published in ACM or IEEE journals.
(D3-Survey-0622)
∙ The specialized corpus does not show enough results.
Half of the participants (D1, D4, D5, M3 and M5) indicated that Michelangelo needed
more papers to be a helpful reference source in terms of both quantity and quality. They
pointed out that Michelangelo sometimes failed to show target expressions because of a lack
of relevant papers.

Because I didn’t contribute a lot to the compilation of the corpus, I sometimes found
no results for the target expressions I wanted. (D5-Survey-0622)
In fact, this drawback was due in part to the initial false assumption that all of the
lab students belonged to the same research field. It was later realized that the students
were largely divided into two research fields, mobile systems and storage systems,
and that Michelangelo included more papers related to the latter, which was D2’s
research field.

When Michelangelo was compiled, I gave you really a lot of references. I gave you
the references I had collected, so it was helpful to me, I guess. Other students did
not give you enough references, so it was not as helpful to them as it was to me.
(D2-Interview-0318)
∙ The specialized corpus is a less credible reference.
D3 and M5 considered Michelangelo to be a less credible reference, but for different
reasons. While the lack of search results led M5 to lose his trust in Michelangelo,2 D3, who
judged COCA to be credible, suggested that Michelangelo should contain more NES texts
to improve its credibility.

I thought “search time” was only used in Korea, so I searched for it in
Michelangelo, and there were a lot of results. But many of them came from
one paper. If that paper had been written by foreigners, then the results would
have been credible, but it was written by Chinese people, so I felt something
was strange because the expression was mainly used in that paper. So I searched
COCA, and it showed some results, so I came to know that that expression
is usable.
(You mean only credible papers should be included in the corpus?)
If possible, it should mainly consist of native English speakers’ papers. (D3-
Interview-0506)
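
D3’s concern that many hits came from a single paper amounts to asking how evenly an expression is spread across the files of a corpus. The sketch below is one hedged way to check this in Python; it is not a feature of AntConc or COCA described in the study, and the function names, file names, and texts are all invented for illustration.

```python
from collections import Counter


def hits_per_document(documents, phrase):
    """Count case-insensitive occurrences of a phrase in each document."""
    needle = phrase.lower()
    counts = Counter()
    for name, text in documents.items():
        n = text.lower().count(needle)
        if n:
            counts[name] = n
    return counts


def top_source_share(counts):
    """Return the share of all hits contributed by the single largest source."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return max(counts.values()) / total


# Hypothetical documents standing in for corpus files.
docs = {
    "paper_a.txt": "The search time grows. Search time is reduced. We measure search time.",
    "paper_b.txt": "Lookup latency dominates; search time is minor.",
    "paper_c.txt": "No relevant phrase here.",
}
counts = hits_per_document(docs, "search time")
share = top_source_share(counts)
```

A high share suggests that an expression may reflect one paper’s habits rather than a convention of the field, which is the situation that led D3 to cross-check the expression in COCA.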
∙ The specialized corpus can induce its users to commit plagiarism.
D2 and D4 indicated the possibility that they could unintentionally copy others’ expressions
by consulting Michelangelo. Unlike COCA, Michelangelo consisted of academic papers
directly related to the participants’ research areas. As a result, it might encourage users to
borrow phrases or entire sentences rather than to look for phraseological patterns, which
could lead to plagiarism.

When I looked up target words in Michelangelo, I sometimes found sentences I
could refer to. I could get expressions similar to what I intended to use. It could be
plagiarism but was useful. (D4-Survey-0622)
In fact, at the beginning of the research, D4 avoided using corpora because he feared that
he might commit plagiarism.

If I start to consult a corpus, I can get clearer sentences, but, if I were to use them, they
would look similar to the originals, and then they would be copies. (D4-Interview-0205)
His initial fear of using corpora seems to stem from his previous writing habit of copying
expressions from other papers, i.e., using related papers as reference sources for his writing
(J. Flowerdew & Li, 2007).

Especially when I wrote difficult sentences, I searched for similar papers to mine to
find sentences similar to what I had in mind and used them in my paper. For
example, if a sentence occurred in my mind, but was difficult for me to express or
for other people to understand, and if those sentences I wanted to write were clearly
expressed in other papers, I just used them but came to follow the logic of the
original papers. (D4-Interview-0205)
The participants’ tendency to borrow expressions originated from their awareness that
they should follow the writing conventions of their discipline. Because of this obligation,
combined with their lack of confidence in English, they hesitated to create their own
sentences.

2 See Section 4.6 for further information on the characteristics of M5.

The creativity of expressions is somewhat lacking in this field. As I started to use
the corpus, I came to use formulaic expressions more often. This is not copying, but
I tend to use the same expressions as the originals because of the feeling of security.
(D2-Interview-0513)

4.5. Overall benefits of corpus use

∙ Corpora help with English writing.
Some participants (D2, D4 and M3) explicitly said that they had gained confidence in
what they had written and had produced better writing with the aid of corpora. D3, D4 and
M3 mentioned that corpora were better than Google in that the web showed too many search
results in an apparently random order.

4.6. Overall drawbacks of corpus use

∙ Corpora are mere language reference sources.
M3 and M4 indicated that corpora were not helpful in constructing sentence structures.

When I have to write something in English, it’s difficult to construct sentence
structures. You know, there are several expressions with the same meaning. It takes
lots of time to think about how to express my ideas.
(Didn’t the corpora help you with that?)
It’s not that helpful in the structural aspect. (M4-Interview-0401)
As regards his writing difficulties in constructing English sentences, M3 hoped that he could
access context-aware corpora in the future which display possible sentences for given words.

I’m dreaming that a future corpus recognizes contexts in writing. For example, if
I just type in “program,” then it shows “program variation.” Rather than preposi-
tions or verbs for each word, it shows what words can be included in a sentence that
I’m going to write. After all, what I want to do is to construct a sentence, but I don’t
have much writing experience, so what I’m doing now is just listing words.
Combining words is not easy, so it might be good if a corpus suggests possible
sentence structures someday. (M3-Interview-0528)
∙ Corpora are time-consuming to use.
In line with previous studies (Chambers, 2005; Chambers & O’Sullivan, 2004; Charles,
2011; Kennedy & Miceli, 2001; Liu & Jiang, 2009; H. Yoon & Hirvela, 2004), the participants
also complained that consulting corpora was time-consuming, but again their reasons were
different. It took M5 some time because he had to read English. D2, on the other hand, spent
time because he consulted Michelangelo frequently; even so, he and M2 both said that the corpus was fast.

This is really time-consuming. It really takes lots of time.
(Is it worth using it?)
Yes. Fast. Probably that’s because I searched for very easy parts, but it helped me a
lot. I’m usually confused about those easy parts. I usually search for those con-
fusing parts, and that’s why it’s effective. (D2-Interview-0312)
These disparate attitudes can be explained by the differences between M5 and D2. M5
was a first-year graduate student serving as a teaching assistant and not engaged in any
research project, while D2 was a doctoral student with publication experience. During the
research period, D2 wrote a conference paper as well as weekly reports on his experiments,
and what he needed was to check some confusing expressions which he already knew, while
M5 mainly wrote weekly reports describing what he did as a teaching assistant.
∙ Corpora require a certain level of English proficiency.
The participants’ independent corpus use revealed that it required considerable English
proficiency, including not only reading skills but also grammatical knowledge, to perform
relevant searches and interpret them appropriately. For example, M5 explained his search
process as follows:

I looked up that verb-preposition combination “compare to” but couldn’t get
results. So I came to know that the correct expression is “compare with.” I guess
that is why it doesn’t appear in COCA. (M5-Interview-0225)
It is evident that M5 could not find any search results for “compare to” because the verb
“compare” requires an object, which normally separates the verb from the preposition. However,
he simply concluded that “compare with” was the correct expression without further searches.
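The difference between an adjacent search and a search that allows an intervening object can be sketched as follows. This is only an illustration in Python: the sentences are invented, the function name count_gapped and its parameters are hypothetical, and the regular expression is not the query syntax of COCA or of any other corpus interface.

```python
import re
from collections import Counter

# Invented example sentences (an assumption for illustration, not corpus data).
SENTENCES = [
    "We compare the proposed method with a baseline.",
    "The results were compared to those of earlier work.",
    "They compared our output with the reference signal.",
    "Table 2 compares the measured values to the simulated values.",
]

def count_gapped(sentences, verb_stem, particles, max_gap=4):
    """Count verb + particle pairs, allowing up to max_gap words in between."""
    pattern = re.compile(
        rf"\b{verb_stem}\w*\b((?:\s+\w+){{0,{max_gap}}}?)\s+({'|'.join(particles)})\b",
        re.IGNORECASE,
    )
    counts = Counter()
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            # group(2) is the particle that closed the pattern ("to" or "with")
            counts[match.group(2).lower()] += 1
    return counts

# Gapped search: the object may stand between the verb and the preposition.
counts = count_gapped(SENTENCES, "compar", ["to", "with"])
# Adjacent search: the preposition must follow the verb immediately.
adjacent = count_gapped(SENTENCES, "compar", ["to", "with"], max_gap=0)

print(sorted(counts.items()))
print(sorted(adjacent.items()))
```

In this invented sample the adjacent search finds a single hit, whereas the gapped search finds all four sentences, two with each preposition. A learner who runs only the adjacent search could therefore draw the same premature conclusion as M5.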
Another similar problem was detected in D4’s use of Michelangelo. Without sufficient
knowledge of article usage, D4 merely judged which articles to choose based on their
frequencies in the corpus.

What was confusing was the article of the noun “temperature,” and I found both
“a” and “the” in Michelangelo. So I judged which one was appropriate by com-
paring their frequencies. Because there were more definite articles, I chose “the.”
(D4-Interview-0319)
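D4’s frequency comparison can be sketched in a few lines of Python. The sentences below are invented, and the function name article_counts is hypothetical; the sketch does not reproduce Michelangelo or its interface. It returns the matching lines alongside the counts, on the assumption that a writer would need to read the contexts rather than rely on the totals alone.

```python
import re
from collections import Counter

# Invented example sentences (an assumption for illustration, not corpus data).
SENTENCES = [
    "The temperature of the chamber was held constant.",
    "The sample was heated to a temperature of 300 K.",
    "We measured the temperature at each node.",
    "Resistance increases as the temperature rises.",
    "A temperature above the threshold damages the film.",
]

def article_counts(sentences, noun):
    """Count articles directly preceding noun and keep the matching lines."""
    pattern = re.compile(rf"\b(a|an|the)\s+{re.escape(noun)}\b", re.IGNORECASE)
    counts = Counter()
    lines = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            article = match.group(1).lower()
            counts[article] += 1
            lines.append((article, sentence))
    return counts, lines

counts, lines = article_counts(SENTENCES, "temperature")

for article, total in counts.most_common():
    print(article, total)
for article, sentence in lines:
    print(f"[{article}] {sentence}")
```

Here “the” outnumbers “a” by three to two, yet both articles are appropriate in their own contexts, so the totals alone cannot settle which article a particular sentence needs.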
Participants D2, D4, D5, M4 and M5 did not consult corpora for some of their errors because
they were unaware that these were errors. They added that they would have consulted corpora
had they known that the expressions were wrong.

5. Discussion and concluding remarks

This study has examined the usefulness of general and specialized corpora as reference
sources for academic English writing through a case study. Based on its findings, several
suggestions can be made for EAP students’ active and appropriate corpus use. First of all,
this study shows that the main requirement for independent corpus use is to make sure that
learners have access to a corpus that meets their needs. Although the general corpus
received positive evaluations, the participants highly valued the specialized corpus for its
direct relevance to their academic fields. Their attitudes can be attributed to their tendency to
adhere strictly to the writing conventions of their discipline as NNES graduate engineering
students. Furthermore, given that D1 and D5 lost interest in Michelangelo because it did not
contain papers directly related to their research area, a more thorough and detailed needs
analysis should precede the compilation of a specialized corpus. As suggested by Charles
(2012), one of the most effective ways to ensure a more efficient use of corpora would be to
encourage students to compile their own specialized corpora. Additionally, in support of the
new norms of academic English (cf. J. Flowerdew, 2008; Mauranen, 2003), Michelangelo
included a number of papers written by NNES researchers, which some participants felt
harmed the credibility of the corpus.
The findings also point to the need for a certain degree of language proficiency for
effective corpus consultation. During independent corpus use, some participants did not
even attempt to look up their errors in a corpus because they believed that they were correct,
or they lacked necessary grammatical knowledge for appropriate corpus search and inter-
pretation. Although Park (2012) considered that learner-corpus interactions can lead to
microgenetic development, the problem is that unnoticed errors are unlikely to initiate the
interaction at all. The need for language proficiency might appear to contradict Boulton’s
(2009, 2010) research, which suggested that students with low proficiency could benefit
from DDL without prior training. However, while Boulton’s paper-based classroom
materials were adjusted to the students’ level in advance by the teacher, the participants in
this study had to handle raw corpus data, taking charge of the entire process of corpus
consultation. These contextual and material differences may account for the difficulties in
independent corpus use without the intervention of an instructor.
As regards the grey area of unnoticed errors and inappropriate corpus searches, a com-
bination of two methods can be recommended to compensate for students’ limited language
proficiency. The first, as mentioned in earlier studies (Chambers & O’Sullivan, 2004;
Gaskell & Cobb, 2004; Liu & Jiang, 2009; O’Sullivan & Chambers, 2006; Todd, 2001), is
the underlining of errors by human proofreaders to invite students’ corpus consultation.
The alternative is to enhance students’ corpus consultation skills through a “corpus
apprenticeship” (Kennedy & Miceli, 2001, 2010), and these skills should be distinguished
from mere familiarity with new tools. For example, Charles (2011) divided corpus
competence into developmental stages of corpus awareness, corpus literacy, and corpus
proficiency; and corpus consultation is known to require and foster mental and cognitive
abilities (O’Sullivan, 2007).
Previous studies (Chambers, 2005; Gaskell & Cobb, 2004; Kennedy & Miceli, 2001,
2010) have emphasized the need for more extensive training for successful corpus
use. Despite the initial workshop and continuous corpus-driven feedback, most of the
participants in this study were still not familiar with the advanced search functions of
COCA, even though they considered them to be beneficial. Therefore, a variety of more
student-centered activities, including group discussions for better corpus consultation
strategies, may increase students’ corpus competence. For example, D4 offered a con-
structive suggestion as follows:

It would be a good idea to ask more questions to students. You can ask, for example,
“For this language problem, which one is better between COCA and Michelangelo?” or
“If you decide to consult COCA, how would you consult it?” Then even for a while,
students would think about it and, if they have different solutions, they can have a
discussion. Or it would be helpful to show parts of other students’ writings and ask how
to consult corpora. (D4-Survey-0622)

Finally, the findings suggest that, if corpus users write academic papers, they should
be provided with explicit guidelines to reassure them about plagiarism. Unlike COCA,
Michelangelo consisted of research articles compiled from the participants’ lists of biblio-
graphical references. As a result, despite being “a valuable resource for finding alternative
ways of saying things, or novel variations of specific lexico-grammatical patterns” (Lee &
Swales, 2006: 68), some of the participants expressed concern about the possibility of
unintentional plagiarism or “patchwriting” (Howard, 1995; Pecorari, 2003) in corpus use.
However, this is not exclusive to corpus use: undergraduate students can copy phrases from
textbooks (Currie, 1998), and graduate students borrow expressions from journal articles
(J. Flowerdew & Li, 2007). Therefore, rather than abandoning corpora altogether, EAP
instructors should offer clear instructions and follow-up guidance on plagiarism as part of
corpus training for NNES graduate students.
Although the web can be considered a corpus (Geluso, 2011; Park, 2012; Sha, 2010; Shei,
2008a, 2008b; Wu, Franken & Witten, 2009), this study introduced principled corpora and
presented their own advantages such as the accuracy and credibility of search results, the
summary effect, and an appropriate number of examples. It is true that, unlike corpora, Google
does not require long-term training for technical knowledge (Geluso, 2011); nor does it suffer
from a paucity of search results (Sha, 2010; Shei, 2008a, 2008b; Wu et al., 2009). Pérez-Paredes
et al. (2012) found that those who used both the BNC and the web performed better and that the
students did not properly exploit corpus search functions despite their prior experience and
guidelines. Nevertheless, the present study shows that there are still some benefits which only
corpora can provide, complementing Google and other language references.
Since this study is limited to one engineering lab, its findings may not be generalizable to
a larger population or other disciplines. Furthermore, since it dealt with perceptions of
corpora by the participants as a group, it does not show in detail how individual factors, such
as to what extent they consulted each corpus, interacted with their opinions. Further research
is thus necessary to consider how individual learner factors operate in corpus use. For future
research, it would also be interesting to investigate how corpus competence can be developed
to improve L2 proficiency or to compensate for the lack of language skills in L2
writing. It would be worthwhile as well to redefine the role of corpora vis-à-vis Google in
language learning and teaching.

Acknowledgements

This article is based on my doctoral dissertation. I wish to thank my advisor, Dr. Sun-Young
Oh, and committee members, Drs. Oryang Kwon, Jin-Wan Kim, Byungmin Lee, and Jin-
Hwa Lee. I also wish to thank the editors and two anonymous reviewers for their valuable
comments. Finally, my gratitude extends to the participants and their lab professor.

References
Anthony, L. (2006) Developing a freeware, multiplatform corpus analysis toolkit for the technical
writing classroom tutorial. IEEE Transactions on Professional Communication, 49(3): 275–286.
Boulton, A. (2009) Testing the limits of data-driven learning: Language proficiency and training.
ReCALL, 21(1): 37–51.
Boulton, A. (2010) Data-driven learning: Taking the computer out of the equation. Language
Learning, 60(3): 534–572.
Cargill, M. and Adams, R. (2005) Learning discipline-specific research English for a world stage: A
self-access concordancing tool? Higher education in a changing world: Proceedings of the 28th
HERDSA annual conference. Sydney: Higher Education Research and Development Society of
Australasia, 86–92.
Chambers, A. (2005) Integrating corpus consultation in language studies. Language Learning &
Technology, 9(2): 111–126.
Chambers, A. and O’Sullivan, Í. (2004) Corpus consultation and advanced learners’ writing skills
in French. ReCALL, 16(1): 158–172.
Chan, T.-P. and Liou, H.-C. (2005) Effects of web-based concordancing instruction on EFL students’
learning of verb-noun collocations. Computer Assisted Language Learning, 18(3): 231–251.
Chang, J.-Y. (2011) The use of general and specialized corpora as reference tools for academic and
technical English writing: A case study of Korean graduate students of engineering. Unpublished
doctoral dissertation. Seoul National University.
Chang, J.-Y. (2013) Korean graduate engineering students’ evaluations of feedback from a specialized
corpus on academic English writing. Korean Journal of Applied Linguistics, 29(1): 245–271.
Chapelle, C. A. and Duff, P. A. (2003) Some guidelines for conducting quantitative and qualitative
research in TESOL. TESOL Quarterly, 37(1): 157–178.
Charles, M. (2007) Reconciling top-down and bottom-up approaches to graduate writing: Using a
corpus to teach rhetorical functions. Journal of English for Academic Purposes, 6(4): 289–302.
Charles, M. (2011) Using hands-on concordancing to teach rhetorical functions: Evaluation and
implications for EAP. In: Frankenberg-Garcia, A., Flowerdew, L. and Aston, G. (eds.), New trends
in corpora and language learning. London: Continuum, 26–43.
Charles, M. (2012) ‘Proper vocabulary and juicy collocations’: EAP students evaluate do-it-yourself
corpus-building. English for Specific Purposes, 31(2): 93–102.
Cho, D. W. (2009) Science journal paper writing in an EFL context: The case of Korea. English for
Specific Purposes, 28(4): 230–239.
Cobb, T. (1997) Is there any measurable learning from hands-on concordancing? System, 25(3):
301–315.
Corbin, J. and Strauss, A. (2008) Basics of qualitative research: Techniques and procedures for
developing grounded theory. Los Angeles, CA: Sage Publications.
Coxhead, A. and Nation, P. (2001) The specialised vocabulary of English for academic purposes. In:
Flowerdew, J. and Peacock, M. (eds.) Research perspectives on English for academic purposes.
Cambridge: Cambridge University Press, 252–267.
Currie, P. (1998) Staying out of trouble: Apparent plagiarism and academic survival. Journal of
Second Language Writing, 7(1): 1–18.
Daengbuppha, J., Hemmington, N. and Wilkes, K. (2006) Using grounded theory to model visitor
experiences at heritage sites: Methodological and practical issues. Qualitative Market Research: An
International Journal, 9(4): 367–388.
Duff, P. A. (2008) Case study research in applied linguistics. New York: Lawrence Erlbaum
Associates.
Estling Vannestål, M. and Lindquist, H. (2007) Learning English grammar with a corpus:
Experimenting with concordancing in a university grammar course. ReCALL, 19(3): 329–350.
Flowerdew, J. (2008) Scholarly writers who use English as an additional language: What can
Goffman’s ‘Stigma’ tell us? Journal of English for Academic Purposes, 7(2): 77–86.
Flowerdew, J. and Li, Y. (2007) Language re-use among Chinese apprentice scientists writing for
publication. Applied Linguistics, 28(3): 440–465.
Flowerdew, L. (2001) The exploitation of small learner corpora in EAP materials design. In:
Ghadessy, M., Henry, A. and Roseberry, R. (eds.), Small corpus studies and ELT: Theory and
practice. Amsterdam: John Benjamins, 363–379.
Gaskell, D. and Cobb, T. (2004) Can learners use concordance feedback for writing errors? System, 32
(3): 301–319.
Geluso, J. (2011) Phraseology and frequency of occurrence on the web: Native speakers’ perceptions
of Google-informed second language writing. Computer Assisted Language Learning, 26(2):
144–157.
Hafner, C. A. and Candlin, C. N. (2007) Corpus tools as an affordance to learning in professional legal
education. Journal of English for Academic Purposes, 6(4): 303–318.
Hanauer, D. I. and Englander, K. (2011) Quantifying the burden of writing research articles in a
second language: Data from Mexican scientists. Written Communication, 28(4): 403–416.
Howard, R. M. (1995) Plagiarisms, authorships, and the academic death penalty. College English, 57
(7): 788–806.
Johns, T. (1991) Should you be persuaded: Two samples of data-driven learning materials. In: Johns,
T. and King, P. (eds.) Classroom concordancing. English Language Research Journal, 4: 1–16.
Johns, T. (1994) From printout to handout: Grammar and vocabulary teaching in the context of
data-driven learning. In: Odlin, T. (ed.) Perspectives on pedagogical grammar. Cambridge:
Cambridge University Press, 293–313.
Kennedy, C. and Miceli, T. (2001) An evaluation of intermediate students’ approaches to corpus
investigation. Language Learning & Technology, 5(3): 77–90.
Kennedy, C. and Miceli, T. (2010) Corpus-assisted creative writing: Introducing intermediate Italian
learners to a corpus as a reference resource. Language Learning & Technology, 14(1): 28–44.
Lee, D. and Swales, J. (2006) A corpus-based EAP course for NNS doctoral students: Moving from
available specialized corpora to self-compiled corpora. English for Specific Purposes, 25(1): 56–75.
Lincoln, Y. S. and Guba, E. G. (1985) Naturalistic inquiry. Beverly Hills, CA: Sage Publications.
Liu, D. and Jiang, P. (2009) Using a corpus-based lexicogrammatical approach to grammar instruction
in EFL and ESL contexts. Modern Language Journal, 93(1): 61–78.
Mauranen, A. (2003) The corpus of English as lingua franca in academic settings. TESOL Quarterly,
37(3): 513–527.
O’Sullivan, Í. (2007) Enhancing a process-oriented approach to literacy and language learning: The
role of corpus consultation literacy. ReCALL, 19(3): 269–286.
O’Sullivan, Í. and Chambers, A. (2006) Learners’ writing skills in French: Corpus consultation and
learner evaluation. Journal of Second Language Writing, 15(1): 49–68.
Park, K. (2012) Learner–corpus interaction: A locus of microgenesis in corpus-assisted L2 writing.
Applied Linguistics, 33(4): 361–385.
Park, K. and Kinginger, C. (2010) Writing/thinking in real time: Digital video and corpus query
analysis. Language Learning & Technology, 14(3): 31–50.
Pecorari, D. (2003) Good and original: Plagiarism and patchwriting in academic second-language
writing. Journal of Second Language Writing, 12(4): 317–345.
Pérez-Paredes, P., Sánchez-Tornel, M. and Alcaraz Calero, J. M. (2012) Learners’ search patterns
during corpus-based focus-on-form activities. International Journal of Corpus Linguistics, 17(4):
483–516.
Sha, G. (2010) Using Google as a super corpus to drive written language learning: A comparison with
the British National Corpus. Computer Assisted Language Learning, 23(5): 377–393.
Shei, C.-C. (2008a) Discovering the hidden treasure on the Internet: Using Google to uncover the veil
of phraseology. Computer Assisted Language Learning, 21(1): 67–85.
Shei, C.-C. (2008b) Web as corpus, Google, and TESOL: A new trilogy. Taiwan Journal of TESOL, 5(2):
1–28.
Shin, I. (2010) The importance of English for Korean postgraduate engineering students in the
global age. English Teaching, 65(1): 221–240.
Stake, R. E. (1995) The art of case study research. Thousand Oaks, CA: Sage Publications.
Sun, Y.-C. and Wang, L.-Y. (2003) Concordancers in the EFL classroom: Cognitive approaches and
collocation difficulty. Computer Assisted Language Learning, 16(1): 83–94.
Thurston, J. and Candlin, C. N. (1998) Concordancing and the teaching of the vocabulary of academic
English. English for Specific Purposes, 17(3): 267–280.
Todd, R. W. (2001) Induction from self-selected concordances and self-correction. System, 29(1):
91–102.
Tribble, C. (2002) Corpora and corpus analysis: New windows on academic writing. In: Flowerdew,
J. (ed.), Academic discourse. Harlow: Longman, 131–149.
Uzuner, S. (2008) Multilingual scholars’ participation in core/global academic communities:
A literature review. Journal of English for Academic Purposes, 7(4): 250–263.
Wood, A. (2001) International scientific English: The language of research scientists around the
world. In: Flowerdew, J. and Peacock, M. (eds.), Research perspectives on English for academic
purposes. Cambridge: Cambridge University Press, 71–83.
Wu, S., Franken, M. and Witten, I. H. (2009) Refining the use of the web (and web search) as a
language teaching and learning resource. Computer Assisted Language Learning, 22(3): 249–268.
Yoon, C. (2011) Concordancing in L2 writing class: An overview of research and issues. Journal of
English for Academic Purposes, 10(3): 130–139.
Yoon, H. (2008) More than a linguistic reference: The influence of corpus technology on L2 academic
writing. Language Learning & Technology, 12(2): 31–48.
Yoon, H. and Hirvela, A. (2004) ESL student attitudes toward corpus use in L2 writing. Journal of
Second Language Writing, 13(4): 257–283.

Appendix 1: Interview questions

1. What was difficult when you wrote in English?
2. What references did you consult when you wrote in English? What did you look up
in them?
3. Did you consult COCA/Michelangelo? What did you search in it?
4. How was COCA/Michelangelo?
5. Did you experience any difficulty in consulting COCA/Michelangelo?
6. Was there any change in your attitude as well as your writing by consulting COCA/
Michelangelo?

Appendix 2: Questions in the second survey

1. For what purposes have you consulted the following references? (e.g. COCA,
Michelangelo, Google/Google Scholar, dictionaries, and others)
2. If you do not use COCA/Michelangelo, please tell me why.
3. In what ways is COCA/Michelangelo helpful for your writing? What difficulties
does COCA/Michelangelo help resolve?
4. In what ways do you feel discomfort with COCA/Michelangelo in your writing?
What problems have you encountered in consulting COCA/Michelangelo?
5. Please write any comments or suggestions.
