
Journal of English for Academic Purposes 30 (2017) 38–52

Contents lists available at ScienceDirect

Journal of English for Academic Purposes


journal homepage: www.elsevier.com/locate/jeap

At the same time: Lexical bundles in L1 and L2 university student argumentative writing
Tetyana Bychkovska a,*, Joseph J. Lee b
a Department of English, George Mason University, MSN 3E4, 4400 University Drive, Fairfax, VA 22030, USA
b ELIP Academic & Global Communication Program, Department of Linguistics, Ohio University, 383 Gordy Hall, Athens, OH 45701, USA

ARTICLE INFO

Article history:
Received 11 April 2017
Received in revised form 12 October 2017
Accepted 27 October 2017

Keywords:
Academic writing
Argumentative essays
Corpus linguistics
First-year composition
Lexical bundles
Misuses

ABSTRACT

This corpus-based study compares L1-English and L1-Chinese undergraduate students' use of lexical bundles in English
argumentative essays, and identifies the most common bundle misuses in L2 student writing. Data consist of two corpora of
student-produced argumentative essays: 101 high-rated essays written by L1-English students and 105 high-rated essays
written by L1-Chinese students. Using Biber's (Biber et al., 1999; Biber et al., 2004) structural and functional taxonomy, we
compared the forms and functions of four-word bundles used by L1-English and L1-Chinese university students. Findings
indicate that L2 students not only use substantially more bundle types and tokens than L1 writers, but the structural and
functional patterns of bundles also differ. While L1 writers' bundles consist of mostly noun and preposition phrases, L2
students use significantly more verb phrase (clausal) bundles. Results also show that L2 student writers use significantly
more stance bundles than L1 writers. In addition, most of the misused bundles in the L2 writers' essays pertain to
grammatical mistakes, particularly with articles and prepositions. We conclude with some pedagogical implications for ESL
composition.
© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Recently, a growing body of studies has focused on university student writing, as the number of second language (L2)
writers in universities in English-dominant countries, such as the US, has increased exponentially (Staples & Reppen, 2016).
Guided by the need to assist such learners in developing competence in academic writing, researchers have examined
lexico-grammatical features in these students’ timed placement tests (e.g., Hinkel, 2003), prompted TOEFL writing (e.g.,
Staples, Egbert, Biber, & McClair, 2013), and disciplinary texts (e.g., Ädel & Erman, 2012). Although these studies have provided
valuable insights into L2 student writing in such situations, surprisingly little research has concentrated on matriculated L2
undergraduate student writing in the context of first-year composition (FYC), despite the fact that these writing classes are
required for nearly all first language (L1) and L2 undergraduate students in the US (Aull & Lancaster, 2014). The few available
studies focusing on L2 student writing in US-based FYC have examined interpersonal resources (e.g., Lee & Deakin, 2016) or
other linguistic aspects such as phrasal and clausal features (e.g., Staples & Reppen, 2016).
Even fewer studies have investigated the use of lexical bundles, or recurrent multiword sequences (Biber, Johansson,
Leech, Conrad, & Finegan, 1999), in L2 FYC student writing, although such formulaic units play critical functions in
academic writing (Biber, Conrad, & Cortes, 2004; Biber et al., 1999; Cortes, 2004). While a few studies have examined the use
of lexical bundles in L1 and L2 university student texts (e.g., Chen & Baker, 2010; Huang, 2015; Ädel & Erman, 2012), they
rely on data drawn from timed essays or disciplinary writing. Little, however, is known about the lexical bundles used by L2
undergraduate students in FYC, and how they compare with L1 students.

* Corresponding author.
E-mail address: tbychkov@gmu.edu (T. Bychkovska).
https://doi.org/10.1016/j.jeap.2017.10.008
1475-1585/© 2017 Elsevier Ltd. All rights reserved.

Using Biber's (Biber et al., 1999, 2004) structural and functional taxonomy, this corpus-based study compares the use of
lexical bundles in English argumentative essays produced by L1-English senior-level undergraduate students and L1-Chinese
undergraduate students in an ESL FYC course in the US. Gaining a deeper appreciation of how these L1 and L2 student writers
use bundles can provide valuable insight for helping L2 learners gain better control over these crucial building blocks
of language.

2. Lexical bundles in academic writing

Lexical bundles refer to the most frequently recurring multiword sequences of three or more words in a register or genre
(Biber et al., 1999). Unlike idioms which are relatively fixed expressions, lexical bundles are extended collocations that “are
usually not complete structural units, and usually not fixed expressions” (Biber & Conrad, 1999, p. 183), such as is likely to, as a
result of, and at the end of the. In order to qualify as lexical bundles, multiword expressions must meet frequency and
dispersion criteria: they must occur at least 20–40 times per million words and in at least five different texts (Biber et al.,
2004; Cortes, 2004).
Over the past two decades, lexical bundles have received considerable attention in the academic writing literature. Biber
et al. (2004) compared lexical bundles across spoken and written registers: conversations, classroom teaching, textbooks, and
academic prose. They found that spoken registers not only include more types and higher frequencies of bundles than written
texts, but also differ in bundle structures and functions. Conversations and classroom teaching are comprised of mainly
verb-phrase based (VP-based) bundles while noun-phrase and preposition-phrase based (NP- and PP-based) bundles are preferred
in textbooks and academic prose. They also discovered that conversations mainly rely on stance bundles (e.g., I don't know
what, I don't want to), but textbooks and academic prose consist of a greater number of referential bundles (e.g., one of the
most, in the case of). Such differences occur, as Conrad and Biber (2005) explain, because academic texts place greater
importance on presenting primarily factual information while spoken language emphasizes interpersonal interactions.
Focusing specifically on academic writing, researchers have investigated lexical bundles in published research articles
(RAs), PhD dissertations, master's theses, and disciplinary student writing. Hyland (2008a, 2008b), for example, compared
bundles in RAs, PhD dissertations, and master's theses across four disciplines: engineering, microbiology, business, and
applied linguistics. Cortes (2004) compared RAs and student writing in history and biology. Both studies found disciplinary
variation in the use of bundles, with the hard science fields relying more heavily on bundles than humanities and social
sciences. They also found a mismatch in the structural, functional, and frequency patterns of bundles used between
student writers and disciplinary experts. Such differences, as Hyland (2008b) suggests, may be explained by the fact that
student-produced genres serve different purposes and readers. Unlike disciplinary experts, students engage in a diversity of
pedagogic genres for the purposes of reader assessment of competence. As Cortes (2004) found, most papers that
students write, especially undergraduate students, are not research papers. It could be that, while there may be some general
lexical bundles used across academic texts, genre may be an important variable in the employment of bundles (Hyland,
2008b). For instance, Hyland (2008b) found that many bundles frequently used in RAs are rarely employed in dissertations
and theses, and Cortes (2013) even found that specific bundles are associated with particular rhetorical moves in RA
introductions.
Additionally, Cortes (2008) compared history RAs written in English and Spanish by respective L1 writers. It was found
that Spanish RAs include a greater number of bundles, yet the bundles in both groups have similar structures and perform
identical functions. Comparisons of English RAs written by disciplinary experts from dissimilar L1 backgrounds, however,
show different trends. For example, in a comparative analysis of English Telecommunications RAs written by L1-English and
L1-Chinese professionals, Pan, Reppen, and Biber (2016) found that L2 texts not only include a greater number of bundle types
and tokens, but they also differ from L1 texts in structural patterns. Unlike L1-English RAs, texts produced by L1-Chinese
professionals were found to include more VP-based bundles than NP- or PP-based types, thus showing their preference
for clausal over phrasal bundles. Similarly, in their comparison of doctoral dissertations written in English by mainland
Chinese students and published RAs, Wei and Lei (2011) found that L2 dissertations consist of more bundle types and tokens
as well as more VP-based bundles than RAs.
These studies show that genre, discipline, writer level, and L1 may impact the use of lexical bundles in important ways.
Most research in this area, however, has focused on specialist texts and/or how bundles in published RAs differ from those in
student writing. Although such comparisons may be a crucial starting point, specialist texts may not represent appropriate
targets for students, particularly undergraduate students, as these students and specialists not only write different genres, but
their texts also differ in purpose, audience, scope, and evaluation (Lee & Casal, 2014). These studies also tell us little of the
ways in which L2 undergraduate students use such multiword sequences.
To better understand L2 student writing, a few studies have examined lexical bundles in L1 and/or L2
student texts. Staples et al. (2013) investigated bundle variation across three proficiency levels in prompted TOEFL writing.
They found that lower-proficiency students used the most bundles while the highest-proficiency group used the fewest, thus
supporting second language acquisition (SLA) theory that, as learners gain proficiency in an L2, they tend to employ fewer
formulaic constructions (Ellis, 1996). They, however, reported no differences in terms of the functional types used. Chen and
Baker (2010) analyzed lexical bundles in English essays written by L1-English and L1-Chinese university students in the
context of disciplinary courses, and compared these novice academic essays with expert texts. They report that certain
bundles were more or less frequent in L2 writers' texts than in L1 students' papers, which is similar to findings of English
linguistics papers written by L1-English and L1-Swedish undergraduates (Ädel & Erman, 2012). Both studies indicate that L1
and L2 students share very few bundles, and L2 students’ texts include fewer bundle types and tokens, yet the structures and
functions of bundles used by L1 and L2 writers are proportionally equal. In her study comparing L1-Chinese English-as-a-
foreign-language (EFL) university students majoring in English and L1-English students, however, Pang (2009) found that the
L2 writers employed four times more bundles than L1 writers, and the L1-Chinese students primarily used the VP-based and
stance bundles, supporting Pan et al. (2016).
Most of these studies on L1 and L2 undergraduate student writing have focused on timed essays, EFL student texts, TOEFL
written responses, and discipline-oriented texts. Little is known specifically of English-as-a-second-language (ESL) university
students' use of lexical bundles in assessed argumentative essays written in the context of US-based FYC. In the US, both L1
and L2 undergraduates receive formal academic writing instruction in FYC courses. As Aull and Lancaster (2014)
report, “thousands of ….L1 … and L2 students enter general writing courses and are assumed to transition into
university-level discourse” through FYC courses, yet “we know little about what discursive features might characterize
argumentative texts produced” by such students (p. 153). In one rare study on bundles used by university students in a US-based FYC
course, Cortes (2002) found that the students used structurally and functionally similar bundles to those found in general
academic prose (Biber et al., 1999). The students’ L1 backgrounds, however, were not specified and four genres were
combined in the analysis, even though previous studies have demonstrated the importance of such variables in the employment
of bundles (e.g., Biber et al., 2004; Pan et al., 2016). Further, while studies have reported that certain bundles are highly
represented in L2 university student writing while others are not (e.g., Chen & Baker, 2010), few have examined the accuracy
of bundles in L2 student texts, one exception being Huang (2015). In this study, Huang found that the overall accuracy of
bundles in short, timed essays produced by L1-Chinese EFL undergraduate writers was extremely high (over 92%), with the
few mistakes having mostly to do with grammar such as agreement and infinitives. Yet the essays were on average only 280
words, and the analysis was based on three-word bundles, where opportunities for making mistakes are presumably lower.
Therefore, limiting the analysis to short, timed essays, standardized tests, or discipline-oriented texts, which in many ways
differ from source-based essays written within US-based FYC classes (Staples & Reppen, 2016), tells us little of how ESL FYC
students use lexical bundles in their writing, and offers limited instructional guidance for FYC teachers in helping these
particular learners.
To provide greater insight into L2 English undergraduate students' use of multiword units and how they compare with
final-year undergraduate writers, considered a more realistically appropriate reference for comparison than published
academic prose (Aull & Lancaster, 2014; Ädel & Römer, 2012), this study reports on a comparative corpus-based analysis of lexical
bundles in English argumentative essays written by US-based L1-Chinese students in an ESL FYC course and senior-level L1-
English students. It also examines the most commonly misused bundles in L1-Chinese students’ texts. The following research
questions guided the study:

1. What differences exist in the structural types of lexical bundles used by L1-English and L1-Chinese undergraduate
   students in English argumentative essays?
2. What differences exist in the functional types of lexical bundles used by L1-English and L1-Chinese undergraduate
   students in English argumentative essays?
3. What are the most commonly misused bundles in L1-Chinese undergraduate students' English argumentative essays?

3. Corpora and methodology

Before going into the details of the corpora and methods used, it may be constructive to discuss the motivation for
comparing argumentative essays written by L1-Chinese undergraduate students in the context of FYC with those produced by
L1-English senior undergraduates for disciplinary courses. Such a comparative analysis, without caution, may unintentionally
lead to “a monolithic conception of good writing based on practices” of L1-English students (Heng Hartse & Kubota, 2014, p.
73), viewing deviations from such practices as deficiencies. It is important to acknowledge from the outset that these groups
of students were at different stages of both academic and linguistic development, and the essays were produced for different
sets of readers in dissimilar contexts of writing. Moreover, as discussed below, texts produced by both groups earned high
grades. Therefore, in our perspective, rather than viewing the L1-English senior undergraduates as the norm for L1-Chinese
FYC students, especially since it is widely acknowledged that academic writing is a nonnative variety for everyone, the former
are included in this study merely as one point of reference in order to shed light on differences in lexical bundle choices
between the two groups of students (Leedham, 2015). Identifying such differences may lead to a better understanding of
L1-Chinese learners’ use of lexical bundles in academic writing, at least at this early stage of university education, and
provide important insight for L2 writing pedagogy.

3.1. Description of the corpora

This study's corpora consist of assessed argumentative essays written by US-based L1-English and L1-Chinese ESL
undergraduate students. The argumentative essay was selected, as it is considered the most common assessed genre
undergraduate students write, especially in soft knowledge fields and in FYC (Aull & Lancaster, 2014). The L1-English essays are
derived from the argumentative essay subset of the Michigan Corpus of Upper-level Student Papers (MICUSP, 2009), an
approximately 2.6 million word corpus of various high-rated (i.e., A-graded) academic papers produced by L1 and L2
senior-level undergraduate and graduate students across disciplines at the University of Michigan (see Ädel & Römer, 2012, for
details). Utilizing MICUSP Simple's filtering functions,1 we searched for all argumentative essays written only by L1-English
senior-level undergraduate students. Of the 114 argumentative essays produced by senior-level undergraduate students,
the search resulted in 101 argumentative essays written by L1 students, mostly from the humanities and social sciences:
economics (1),2 education (1), English (48), history and classical studies (6), linguistics (2), natural resources and
environment (3), nursing (3), philosophy (10), political science (14), psychology (6), and sociology (7). These 101 MICUSP
essays comprised the L1-English corpus (220,233 words).
The assessed ESL argumentative essays included in the study are a subset of the Corpus of Ohio Learner and Teacher
English (COLTE), a large collection of ESL student writing and teacher written feedback at Ohio University in the US Midwest.3
The learner corpus (henceforth COLTE) is comprised of argumentative essays written by L1-Chinese ESL students in 20
different sections of the first of two courses in the FYC sequence. The course is designed specifically for international
undergraduate students by the Linguistics Department's ELIP Academic & Global Communication Program, and taught
exclusively by L2 writing specialists. Students are placed in this course if their TOEFL iBT writing section score is below 24,
their score on the institutional intensive English program's (IEP) composition test is 5 out of 6,4 and/or their grade is B or
higher in the IEP's advanced composition course. While various nationalities are represented in this course and institution, the
predominant student group is from mainland China, similar to most US universities (IIE, 2016). The standardized curriculum is
designed to develop students' academic knowledge of and skills in organization, coherence, idea development, summarizing,
paraphrasing, grammar, vocabulary, and source use. The course is intended not only to improve students' general academic
writing abilities that can be transferred to other writing situations, but also to prepare students for the second course in the
FYC sequence, which fulfills their institutional FYC requirement for graduation.
The 105 COLTE essays (105,043 words) selected were the final assessed, source-based argumentative essay assignment
written by 42 female and 63 male Chinese ESL university students. Their average age was 21.6 years (SD = 1.81), and they had
on average studied English for 7.8 years (SD = 3.41) in their home countries. They had also on average studied in US-based
IEPs for 4.1 terms (SD = 1.79), and lived in the US for an average of 25 months (SD = 11.37).
The essay assignment required students to select topics connected to general themes (e.g., education, economy,
environment, health, military, psychology), and write between 900 and 1200 words, using at least four academic sources.
Following the process-based approach, this final essay assignment required students to write a proposal, outline, and three
drafts. Each draft was assessed and given written feedback on different dimensions using a standardized grading rubric
that all teachers of the FYC course used. The rubric included categories of content, organization, source use, and language use.
Teachers were expected to focus only on content and organization on draft one. However, they graded and provided feedback
on content, organization, source use, and language use on drafts two and three. Draft two was specifically selected for the
analysis because it was the revised version of draft one that received feedback on global concerns (i.e., content, organization)
but not on language issues (e.g., grammar, vocabulary). The essays were assessed by six different ESL writing instructors, all of
whom had at least an MA in TESOL/applied linguistics. The selected COLTE essays were highly rated, averaging 88.5%
(SD = 4.93).
All essays in the two corpora were then cleaned; we removed the paper codes, titles, section headers, footers, and
reference lists. While keeping the original labels for the MICUSP essays (e.g., SOC.G0.01.1 for Sociology, Final-Year
Undergraduate), we labeled the COLTE essays according to the order in which they were collected (e.g., COLTE-1, COLTE-2). Table 1
provides descriptions of the two corpora used in this study.

3.2. Lexical bundle identification

Since the purpose of this study was to identify differences in lexical bundle use in L1 and L2 writing, we analyzed
MICUSP and COLTE separately and identified two sets of lexical bundles. We specifically focused on four-word lexical bundles
because three-word bundles are frequently part of four-word clusters (Cortes, 2004); the number of four-word bundles is
more manageable to classify and check the context in which they appear (Chen & Baker, 2010); and they are much more
common than five-word strings in discourse (Cortes, 2004; Hyland, 2008a).

1 http://micusp.elicorpora.info.
2 The bracketed numbers represent the total number of texts for each field in MICUSP.
3 The Corpus of Ohio Learner and Teacher English (COLTE) is an ongoing 5-year corpus project of the English used by ESL
  learners and teachers currently being compiled by the ELIP Classroom Research Unit at Ohio University. Since September 2013,
  we have collected thousands of samples of assessed ESL student writing and teachers' electronic written feedback.
4 Students earning a score of 6 on the institutional IEP composition test are placed into the second course in the FYC
  sequence specifically designed for international and multilingual students.

To identify four-word bundles, we used common criteria: frequency and dispersion (Biber et al., 2004; Biber & Barbieri,
2007; Chen & Baker, 2010; Pan et al., 2016). Typically, normalized frequency in lexical bundle research ranges from 20 to
40 occurrences per million words (Biber et al., 2004; Chen & Baker, 2010). However, previous research has shown that
many bundles appear over 100 (Cortes, 2004) and even over 200 times per million words (Pan et al., 2016). In this study,
we set the stricter normalized cut-off frequency at 40 per million words, which is considered the standard in recent
lexical bundle research (Pan et al., 2016), for both corpora in order to compare the bundles in these two corpora of different
sizes.
Another important criterion in identifying lexical bundles is dispersion. This criterion is used as a safeguard against
idiosyncratic uses from individual writers (Biber et al., 2004; Pan et al., 2016). Although a few studies have set the dispersion
threshold to three texts (e.g., Chen & Baker, 2010; Ädel & Erman, 2012), most studies have required
bundles to occur in at least five texts (Cortes, 2004, 2008, 2013; Biber et al., 1999, 2004; Pan et al., 2016). We set the minimum
threshold to five different texts. This threshold is suggested as optimal for a 200,000-word corpus (Biber
& Barbieri, 2007), which is applicable for MICUSP, and we also used the same strict threshold for COLTE, even though it includes
fewer than 200,000 words.
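The two identification criteria can be illustrated with a short sketch. The Python code below is not the procedure used in this study, which relied on AntConc; it is a minimal illustration under stated assumptions. The tokenizer, the function names, and the decision to round the raw cut-off up to the next whole occurrence are all hypothetical choices, and the sketch does not stop n-grams from crossing sentence boundaries.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase alphabetic strings, apostrophes kept.
    return re.findall(r"[a-z']+", text.lower())


def candidate_bundles(texts, pmw_cutoff=40, min_texts=5, n=4):
    """Return n-grams that meet a normalized frequency cut-off and a dispersion threshold."""
    total_words = 0
    freq = Counter()                 # raw frequency of each n-gram
    text_range = defaultdict(set)    # ids of the texts in which each n-gram occurs

    for text_id, text in enumerate(texts):
        tokens = tokenize(text)
        total_words += len(tokens)
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            freq[gram] += 1
            text_range[gram].add(text_id)

    # Convert the per-million-word cut-off into a raw count for this corpus size
    # (rounding up is an assumption made for this sketch).
    min_raw = math.ceil(pmw_cutoff * total_words / 1_000_000)

    return {
        gram: count
        for gram, count in freq.items()
        if count >= min_raw and len(text_range[gram]) >= min_texts
    }
```

Called on a list of essay strings, `candidate_bundles(essays)` returns a dictionary of candidate four-word sequences and their raw counts. Context-dependent and topic-specific sequences would still need to be screened by hand.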
Using the concordance tool AntConc (Anthony, 2014), specifically the clusters/n-gram function, we searched both corpora
for potential four-word bundles. Upon retrieving these bundles, we identified several context-dependent (e.g., in
the United States) or topic-specific (e.g., a lot of money) bundles. Because of their dependency on context and topic, we
excluded them from the analysis (Chen & Baker, 2010; Huang, 2015). We also manually checked for bundle overlaps
that can inflate the number of types and tokens. Following Chen and Baker (2010), we combined two bundle types if
they fell into one of two categories. The first type is “complete overlap,” where two four-word bundles are in fact one
five-word bundle. The bundles they do not want and do not want to both occur five times, deriving from the five-word
sequence they do not want to. This was the only instance of complete overlap found in the two corpora. The second
category, “complete subsumption,” was identified three times in MICUSP and twice in COLTE. This type of overlap
occurs when two (or more) bundles overlap and one is subsumed within the other. For example, in COLTE, in conclusion
the research occurs seven times and conclusion the research presented appears five times. Concordances show that all
five instances of conclusion the research presented follow in, thus creating in conclusion the research presented. To avoid
inflating quantitative results, the lower frequency bundle was combined into the higher frequency one: in conclusion the
research (presented). Frequencies of the resulting bundles in both corpora were then normalized to occurrences per
million words (pmw).
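The normalization step is simple arithmetic, sketched below in Python. The function name is hypothetical; the corpus sizes and raw totals in the usage note are the figures reported for the two corpora in this article.

```python
def per_million(raw_count, corpus_size):
    """Normalize a raw frequency to occurrences per million words (pmw)."""
    return raw_count * 1_000_000 / corpus_size


def normalize_all(raw_counts, corpus_size):
    """Apply the same normalization to a dictionary of bundle counts."""
    return {bundle: per_million(count, corpus_size) for bundle, count in raw_counts.items()}
```

For example, `per_million(337, 220_233)` rounds to the 1530 pmw reported for MICUSP, and `per_million(404, 105_043)` rounds to the 3846 pmw reported for COLTE.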

3.3. Lexical bundle analysis

All identified bundles were analyzed according to Biber and colleagues’ (Biber et al., 1999, 2004) structural and functional
taxonomy. The three broad structural categories include NP-based, PP-based, and VP-based bundles. NP-based bundles
include noun phrases with of-phrase (e.g., the end of the) or post-modifier fragment (e.g., the way in which). PP-based bundles
are comprised of preposition phrases with of-phrase (e.g., in the case of) or other fragment (e.g., at the same time). And
VP-based bundles refer to those multiword sequences with a verb component (e.g., it is clear that, is one of the).
The three broad functional categories include stance bundles, discourse organizers, and referential bundles. Table 2
describes these functional categories. Stance bundles express writers’ certainty (or uncertainty) or attitude. Attitude bundles can
communicate obligations/directives, judgments on abilities, or desire. Discourse organizers introduce/focus on a topic or
elaborate/clarify a topic. And referential bundles are used to identify an entity or some particularly important attribute of an
entity.

Table 1
Description of the two corpora.

                   MICUSP             COLTE
Number of texts    101                105
Number of words    220,233            105,043
Ave. length        2181 (SD = 913)    1004 (SD = 151)
Range              602–4843           513–1297

Note: Ave. length = average essay length; SD = standard deviation; Range = minimum–maximum values.

Table 2
Functional categories of lexical bundles (adapted from Biber et al., 1999).

Categories             Subcategories                                         Examples
Stance bundles         Epistemic                                             are more likely to, it is clear that
                       Attitudinal/modality                                  it is important to, it is hard to
Discourse organizers   Topic introduction/focus                              in the first place, first of all the
                       Topic elaboration/clarification                       on the other hand, as well as the
Referential bundles    Identification/focus                                  one of the most, is one of the
                       Quantity specification                                the rest of the, a great deal of
                       Framing attributes                                    in the form of, in the case of
                       Time/place/text-deixis/multi-dimensional reference    at the same time, the end of the

Table 3
Number of types and frequency of lexical bundles in the two corpora.

          No. of types    Raw frequency    Normalized frequency (pmw)
MICUSP    23              337              1530
COLTE     52              404              3846

Note: pmw = per million words.

Finally, bundles identified in COLTE were manually checked for misuse. While we acknowledge that it is possible that
bundle misuses may also exist in advanced L1 student writing, no such misused instances were found in the bundles
identified in the MICUSP data used. We, therefore, concentrate exclusively on bundle misuses in L1-Chinese writers’ texts in
order to identify categories of multiword sequences that this group of learners may need better control over in academic
writing.
Following Huang (2015), bundle misuses were divided into two broad categories: grammatical and functional. These broad
categories, according to Huang, can be further classified as morphological misuse, structural or functional incompleteness,
and situational misuse. First, morphological misuse refers to those bundles containing morphological inaccuracies; for example,
omission of an article (e.g., on other hand) or incorrect preposition (e.g., in the other hand). Using AntConc (Anthony, 2014), we
searched certain words in identified bundles to analyze morphological misuses. For instance, we searched hand with a
wildcard asterisk (i.e., hand*), and then manually examined each instance to identify possible variations of on the other hand
(e.g., on other hand). Such morphological misuses were not included in the total frequency.
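The wildcard search described above can be approximated outside AntConc. The Python sketch below is an illustration only, not the tool or settings used in this study: it finds every token beginning with a stem such as hand, takes the few words to its left, and lists the contexts that differ from the target bundle so that each can be examined manually. The function name, the tokenizer, and the three-word left window are assumptions.

```python
import re


def find_bundle_variants(text, stem="hand",
                         canonical=("on", "the", "other", "hand"), window=3):
    """List contexts around tokens matching stem* that differ from the canonical bundle."""
    tokens = re.findall(r"[a-z']+", text.lower())
    candidates = []
    for i, token in enumerate(tokens):
        if not token.startswith(stem):      # rough equivalent of the search term hand*
            continue
        context = tuple(tokens[max(0, i - window):i + 1])
        if context != canonical:            # keep only non-canonical contexts
            candidates.append(" ".join(context))
    return candidates
```

Because hand* also matches unrelated words (e.g., hands, handle), the output is a list of candidates for manual inspection, not a list of confirmed misuses.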
Second, an incomplete bundle refers to a phrase that “fails to constitute a larger unit” (Huang, 2015, p. 17):
Text sample 1: With development of solar energy, a large amount of jobs have been created. (COLTE-38)
In this example, a large amount of is morphologically accurate, but it is considered grammatically misused when collocated
with a countable noun such as jobs, as amount is used to reference mass nouns. Such an example is an illustration of an
agreement mistake. Lastly, situational misuse is connected to a larger context (i.e., sentence or discourse level) of the bundle
under analysis, which can either be a functional or grammatical misuse:
Text sample 2: Thus, doctor robots can serve as assistants in hospitals to diagnose and treat patients. That behavior
means this new technology has powerful help doctors diagnose and treat patients when these machines have enough
information in the database. On the other hand, medical robots can cooperate with nurses to complete tasks in hos-
pitals. (COLTE-98)
Text sample 3: According to the article “Principles of Computer Technology” (2014), the editor states that after ten
years of using ENIAC, this single computer have done more arithmetic than …. (COLTE-18)
In text sample 2, the bundle on the other hand is functionally misused to add similar supporting information rather than to
contrast a point. In text sample 3, according to the article is morphologically accurate, but this is an illustration of a
grammatical misuse in its situational context in which the writer uses a topic-comment construction consisting of a redundant
subject (the editor) referring to the noun phrase (the article) in the fronted adverbial phrase. To identify incomplete bundles
and situational misuses, we examined each multiword sequence manually in its context.
To establish intercoder agreement, each author, with the help of a research assistant, classified the bundles; agreement
was approximately 98% for structural types, 84% for functional types, and 90% for the misuses. The remaining
discrepancies were discussed until complete agreement was reached.
Statistical analysis was performed using log-likelihood tests. Using Rayson's (n.d.) Log-likelihood calculator
(http://ucrel.lancs.ac.uk/llwizard.html), token frequencies for each structural and functional category and subcategory
in the two corpora were compared to determine whether the differences in the occurrences were statistically significant.
The higher the log-likelihood (LL) value, the more significant the difference between the two frequency scores: an LL of
3.84 or higher is significant at p < 0.05; an LL of 6.63 or higher is significant at p < 0.01; an LL of 10.83 or higher is
significant at p < 0.001; and an LL of 15.13 or higher is significant at p < 0.0001.
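The log-likelihood comparison described above can be illustrated with a minimal sketch. It assumes the two-corpus formula usually given for this kind of calculator, LL = 2[a ln(a/E1) + b ln(b/E2)], where a and b are the observed frequencies and E1 and E2 the expected frequencies given the corpus sizes. The function names are hypothetical, and the corpus sizes are approximations back-calculated from the per-million-word values reported in Table 5 rather than figures stated in this section.

```python
import math

def log_likelihood(freq1, size1, freq2, size2):
    """Two-corpus log-likelihood for one item (assumed two-term formula)."""
    total_freq = freq1 + freq2
    total_size = size1 + size2
    # Expected frequencies if the item were spread evenly across both corpora.
    expected1 = size1 * total_freq / total_size
    expected2 = size2 * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

def significance(ll):
    """Map an LL value to the thresholds listed in the text."""
    cutoffs = ((15.13, "p < 0.0001"), (10.83, "p < 0.001"),
               (6.63, "p < 0.01"), (3.84, "p < 0.05"))
    for cutoff, label in cutoffs:
        if ll >= cutoff:
            return label
    return "not significant"

# Assumption: approximate corpus sizes in words, inferred from the
# normalized frequencies in Table 5 (e.g., 80 tokens = 363.25 pmw).
MICUSP_WORDS = 220_230
COLTE_WORDS = 105_042

# NP with of-phrase fragment: 80 tokens in MICUSP vs. 88 in COLTE (Table 5).
ll = log_likelihood(80, MICUSP_WORDS, 88, COLTE_WORDS)
print(round(ll, 2), significance(ll))
```

Under these assumed corpus sizes, the result lands close to the 28.82 reported for this subcategory in Table 5; small differences are expected because the sizes are estimates.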

Table 4
Lexical bundles shared in both corpora.

Bundle                    MICUSP                     COLTE
                          Rank   Frequency (pmw)     Rank   Frequency (pmw)
(at) + the end of the     1      177                 25     67
at the same time          2      118                 4      143
on the other hand         3      109                 1      267
is one of the             11     59                  2      248
that there is a           15     45                  49     48
one of the most           17     45                  3      181

4. Results and discussion

4.1. Comparison of types and frequency of lexical bundles

Table 3 shows the types and frequency of lexical bundles in the two corpora (Appendix A provides the complete lists of
bundles in both corpora). As the table shows, L1-Chinese writers used 2.3 times as many bundle types as their L1-English
peers, and the normalized frequency with which they employed these bundles was more than double. These findings support previous
research that found both L2 expert and student writers to use lexical bundles more frequently than their L1 counterparts (e.g.,
Hyland, 2008b; Pan et al., 2016; Wei & Lei, 2011). Through their greater use of bundles, learners may have been attempting to
produce what they viewed as academic-like texts, as using such fixed expressions might have served L2 writers in avoiding
what may be perceived as odd or uncommon academic English expressions.
Relatedly, another possible reason for the higher frequency of bundles in L2 students' texts may be their still developing
register awareness. Previous research has found that a greater number of bundles is employed in conversations than in
academic prose (Biber et al., 1999, 2004). In our analysis, we found that COLTE includes several expressions more commonly used
in conversations; for example, is a good choice, a huge amount of, and a lot of time. In fact, 10 bundle types (or 19%) identified
may be characterized as informal, thus suggesting learners’ still emerging understanding of academic register. As Leedham
(2015) observes, Chinese learners tend to use informal language across spoken and written registers due to “a lack of suf-
ficient discrimination between different communicative purposes” (p. 32). Another explanation for quantitative differences
between the two corpora is direct translation from Chinese language and cultural norms. Such expressions, including more
and more people and a lot of people, account for six bundle types in COLTE. A more detailed discussion of this issue is presented
in our analysis of functional types. In total, 15 bundle types consist of informal English expressions or expressions directly
translated from Chinese expressions in COLTE, which may partially explain the greater quantity of bundles in the L1-Chinese
texts. It is also likely that the quantitative differences are partly due to the composition of the two corpora. While
COLTE is a much more homogeneous corpus consisting of argumentative essays on general topics in the context of a
non-disciplinary writing course, the MICUSP data used in this study form a heterogeneous corpus composed of
discipline-specific argumentative texts. As Hyland (2008a, 2008b) has shown, bundles are intimately connected to genres and disciplines.
The comparative analysis revealed a few bundles that both groups shared. As Table 4 shows, only six bundles are present in
both corpora: 12% of bundle types and 25% of tokens. It is, however, important to point out that the top four most frequently
used bundles in COLTE are also present in MICUSP. This finding indicates that these L2 learners were aware of some of the
most typical academic lexical bundles and used them extensively in their writing.

Table 5
Distribution of structural categories in the two corpora.

Category   Subcategory                                                               Types          Tokens (pmw)                   LL
                                                                                     MICUSP  COLTE  MICUSP         COLTE
NP-based   Noun phrase with of-phrase fragment (e.g., the end of the)                5       11     80 (363.25)    88 (837.75)     28.82****
           Noun phrase with other post-modifier fragment (e.g., the fact that the)   3       5      37 (168.00)    33 (314.16)     6.65**
           Other noun phrase (e.g., more and more people)                            –       1      –              11 (104.72)     –
           Total                                                                     8       17     117 (531.25)   132 (1256.63)   45.37****
PP-based   Prepositional phrase with embedded of-phrase (e.g., in the case of)       5       2      65 (295.14)    15 (142.80)     7.40**
           Other prepositional phrase fragment (e.g., at the same time)              5       8      78 (354.17)    88 (837.75)     30.25****
           Total                                                                     10      10     143 (649.31)   103 (980.55)    9.88**
VP-based   Anticipatory it + verb phrase/adjective phrase (e.g., it is clear that)   2       5      36 (163.46)    32 (304.64)     6.39*
           Copula be + noun phrase/adjective phrase (e.g., is one of the)            1       5      13 (59.02)     50 (476.00)     59.03****
           (Verb phrase) + that-clause fragment (e.g., that there is a)              1       1      10 (45.41)     5 (47.60)       0.01
           (Verb/adjective) to-clause fragment (e.g., have enough time to)           –       2      –              10 (95.20)      –
           Pronoun/noun phrase + be fragment (e.g., there is a huge)                 –       6      –              35 (333.20)     –
           Verb phrase with active verb (e.g., save a lot of)                        –       6      –              38 (361.76)     –
           Total                                                                     4       25     59 (267.89)    170 (1618.40)   168.99****
Others     (as well as the)                                                          1       1      18 (81.73)     5 (47.60)       1.26
Total                                                                                23      53     337 (1530.20)  410 (3903.16)   161.27****

Note: pmw = per million words; LL = log-likelihood value; * = significant at p < 0.05 level; ** = significant at p < 0.01 level;
*** = significant at p < 0.001 level; **** = significant at p < 0.0001 level.

4.2. Comparison of structural types and tokens of lexical bundles

Table 5 presents the distribution of structural categories of bundles used by both groups. Log-likelihood tests comparing
the tokens show that COLTE includes significantly more NP-based and VP-based bundle tokens than MICUSP. For PP-based
bundles, the MICUSP writers used significantly more of the subcategory PP with of-phrase, while the COLTE writers used
significantly more of the other PP fragment.
However, if we compare the percentages of the main structural categories in both corpora (Table 6), a different pattern
emerges. Converging with previous findings (Biber et al., 1999; Pan et al., 2016), the L1-English writers utilized mostly NP- and
PP-based bundles (78.3% of types and 77.1% of tokens). In contrast, the main type in COLTE is VP-based bundles (47.1% of types
and 41.5% of tokens), which has also been identified in previous studies on both L1-Chinese professional and student writers
(Pan et al., 2016; Pang, 2009). As Biber et al. (1999) explain, phrasal bundles consist of mainly NPs and PPs, while clausal
bundles are comprised of VP-based bundles integrating main clauses. Furthermore, previous research shows that English
academic prose consists of mainly phrasal bundles while clausal types are more common in spoken registers (Biber et al.,
1999, 2004; Pan et al., 2016). Despite VP-based bundles being the most frequent in COLTE for both type and token, more
than half of the bundles in the L1-Chinese essays are NP- and PP-based bundles: 51% of types and 57.3% of tokens. These
findings are in contrast to Chen and Baker (2010), who found more VP-based types (52.3%) than NP- and PP-based bundles
(47.5%) in L2 learners' writing (no token distribution was presented). At least with regard to lexical bundles, our findings
suggest that the COLTE students' academic writing proficiency is relatively high, as research shows that both L1 and L2 writers
move from clausal to phrasal structures as their writing proficiency increases (Pan et al., 2016; Staples et al., 2013). At the
same time, with the greater use of VP-based bundles in their writing, the findings also indicate that these L1-Chinese
students were still in the process of acquiring appropriate academic register.

Table 6
Proportional distribution of main structural categories in the two corpora.

Category             Types (%)         Tokens (%)
                     MICUSP   COLTE    MICUSP   COLTE
NP-based – Phrasal   34.8     32.1     34.7     32.2
PP-based – Phrasal   43.5     18.9     42.4     25.1
VP-based – Clausal   17.4     47.1     17.5     41.5
Others               4.3      1.9      5.4      1.2
Total                100      100      100      100

Table 7
Distribution of functional categories/subcategories in the two corpora.

Category              Subcategory                                                Types          Tokens (pmw)                   LL
                                                                                 MICUSP  COLTE  MICUSP         COLTE
Stance bundles        Epistemic (e.g., it is clear that)                         2       10     29 (131.68)    64 (609.27)     51.87****
                      Attitudinal/Modality                                       1       7      17 (77.19)     47 (447.44)     45.41****
                        Obligatory/directive (e.g., it is important to)          1       3      17 (77.19)     22 (209.44)     9.57**
                        Ability (e.g., it is hard to)                            –       3      –              20 (190.40)     –
                        Desire (e.g., they do not want)                          –       1      –              5 (47.60)       –
                      Total                                                      3       17     46 (208.87)    111 (1056.71)   96.89****
Discourse organizers  Topic introduction (e.g., first of all the)                –       2      –              12 (114.24)     –
                      Topic elaboration/clarification (e.g., on the other hand)  4       3      61 (276.98)    43 (409.36)     3.74
                      Total                                                      4       5      61 (276.98)    55 (523.60)     11.41***
Referential bundles   Identification/focus (e.g., is one of the)                 3       12     33 (149.84)    103 (980.55)    107.86****
                      Framing attributes (e.g., as a result of)                  7       3      92 (417.74)    22 (209.44)     9.65**
                      Quantity specification (e.g., a great deal of)             3       10     29 (131.69)    69 (656.87)     59.56****
                      Place/time/text-deixis (e.g., the end of the)              3       6      76 (345.09)    50 (476.00)     3.04
                      Total                                                      16      31     230 (1044.35)  244 (2322.86)   74.28****
Total                                                                            23      53     337 (1530.20)  410 (3903.16)   161.27****

Note: pmw = per million words; LL = log-likelihood value; * = significant at p < 0.05 level; ** = significant at p < 0.01 level;
*** = significant at p < 0.001 level; **** = significant at p < 0.0001 level.

Table 8
Proportional distribution of main functional categories in the two corpora.

Category               Types (%)         Tokens (%)
                       MICUSP   COLTE    MICUSP   COLTE
Stance bundles         13.0     32.1     13.6     27.1
Discourse organizers   17.4     9.4      18.1     13.4
Referential bundles    69.6     58.5     68.3     59.5
Total                  100      100      100      100
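
The proportional distribution in Table 6 can be recomputed from the raw type and token counts in Table 5. The short sketch below does this and also adds up the phrasal (NP- plus PP-based) share discussed in the text. The dictionary layout and function name are illustrative assumptions, and a recomputed value may differ from a published cell by 0.1 because of rounding.

```python
# Raw counts from Table 5: (types, tokens) for each main structural category.
TABLE5 = {
    "MICUSP": {"NP-based": (8, 117), "PP-based": (10, 143),
               "VP-based": (4, 59), "Others": (1, 18)},
    "COLTE": {"NP-based": (17, 132), "PP-based": (10, 103),
              "VP-based": (25, 170), "Others": (1, 5)},
}

def proportions(counts):
    """Return {category: (type %, token %)} for one corpus."""
    total_types = sum(types for types, _ in counts.values())
    total_tokens = sum(tokens for _, tokens in counts.values())
    return {
        category: (100 * types / total_types, 100 * tokens / total_tokens)
        for category, (types, tokens) in counts.items()
    }

for corpus, counts in TABLE5.items():
    shares = proportions(counts)
    for category, (type_pct, token_pct) in shares.items():
        print(f"{corpus:7s} {category:9s} types {type_pct:5.1f}%  tokens {token_pct:5.1f}%")
    # Phrasal share = NP-based + PP-based, as discussed in Section 4.2.
    phrasal_types = shares["NP-based"][0] + shares["PP-based"][0]
    phrasal_tokens = shares["NP-based"][1] + shares["PP-based"][1]
    print(f"{corpus:7s} phrasal   types {phrasal_types:5.1f}%  tokens {phrasal_tokens:5.1f}%")
```

Summing the two phrasal categories in this way shows why the COLTE essays can be described as still slightly more phrasal than clausal even though VP-based bundles form the single largest category.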

4.3. Comparison of functional types and tokens of lexical bundles

Table 7 presents a comparison of the distribution of functional categories of lexical bundles in both corpora. Log-likelihood
tests comparing the tokens show that the COLTE writers used stance bundles significantly more frequently than the MICUSP
writers. In fact, the L1-English writers only used epistemic and obligatory/directive bundles. In terms of discourse organizers,
the L1-English writers used no topic introduction bundles, while the L1-Chinese students utilized more topic elaboration/
clarification bundles but not at a significant level. For referential bundles, although the L1-Chinese writers used significantly
more identification/focus and quantity bundles, the L1-English students employed significantly more framing attributes;
there was no statistically significant difference for deictic bundles.
Table 8 presents the distribution of percentages of types and tokens of the main functional categories in both corpora. As
the table shows, discourse organizers account for nearly twice the proportion of bundle types in MICUSP as in COLTE.
But, in terms of tokens, these bundles are only slightly underrepresented in COLTE. As shown in Table 7, the
COLTE writers used these types more frequently, yet they primarily employed the bundle on the other hand (266.56 pmw), the
only bundle of this category shared by both groups. It is worth mentioning that, together with this expression, the L1-Chinese
writers also utilized on the one hand frequently, but this bundle occurs only in three MICUSP texts, and it is one that has been
found to be highly infrequent in other academic texts (Biber et al., 2004; Cortes, 2004).
Table 8 also shows that both groups used referential bundles the most frequently, supporting previous research on aca-
demic prose (e.g., Biber et al., 2004), yet diverging from Chen and Baker (2010), who found that the L1-Chinese student
writers in their study used referential bundles and discourse organizers about equally. As mentioned above, the COLTE writers
used all subcategories more frequently than the MICUSP writers, except for framing bundles. However, several referential
bundles in COLTE contain vague words such as people: people who do not (47.60 pmw), and people do not have (47.60 pmw);
more and more: more and more people (104.72 pmw), and nowadays more and more (47.60 pmw); and a lot of: a lot of people
(85.68 pmw), (will) save a lot of (57.12 pmw), and a lot of time (47.60 pmw). In addition to differences in stages of students'
academic and linguistic development, the abundance of such bundles in L1-Chinese students’ texts could be attributed to the
influence of Chinese language and culture (Huang, 2015; Pang, 2009). As Pang explains, Chinese is considered a collectivist
culture, where collective thinking is highly valued and preferred. Therefore, this may be one of the reasons why there are
many bundles with people in COLTE. In fact, people is the ninth most frequent word in COLTE. Furthermore, more and more and
a lot of are also considered to be directly translated from Chinese. More and more corresponds to yue lai yue duo (Huang, 2015;
Pang, 2009), and a lot of is equivalent to hen duo (Huang, 2015). Both expressions are extremely frequent in Chinese; similarly,
their English counterparts are highly represented in COLTE.
Other frequent referential bundles in COLTE are all over the world (85.68 pmw) and according to the article (76.16 pmw).
While not necessarily attributed to direct translation, both bundles have been frequently associated with L1-Chinese student
writers (e.g., Chen & Baker, 2010; Lee & Chen, 2009). One plausible reason for the abundance of all over the world in L1-
Chinese students' writing might be their “general tendency … to be categorical and to over-generalize” (Chen & Baker,
2010, p. 41). Furthermore, the high frequency of according to the article might be due to L1-Chinese student writers’
“simply trying too hard to sound more formal or professional” (Lee & Chen, 2009, p. 159). Leedham (2015) offers a somewhat
different perspective, however; she suggests that phrases such as these serve as lexical teddy bears for learners (Hasselgren,
1994), as students find comfort in using them across situations.

Table 9
Stance bundles in the two corpora.

Category                   MICUSP                             COLTE
Epistemic                  it is clear that                   is a good choice
                           the fact that the                  an important role in
                                                              is the most important
                                                              it is a good
                                                              of the most important
                                                              the best way to
                                                              an effective way to
                                                              it is evident that
                                                              is an effective way
                                                              and it is not
Attitudinal/modality
a) Obligatory/directive    it is important to + (note that)   pay more attention to
                                                              do not need to
b) Ability                                                    it is hard to
                                                              can be used
                                                              it is easy to
c) Desire                                                     they do not want to

Table 10
Types and categories of misuse in COLTE.

Type          Category            Subcategory                   Raw frequency
Grammatical   Article             Missing article               25
                                  Misplaced article             6
                                  Misused article               3
                                  Total                         34
              Preposition         Misused preposition           10
                                  Missing preposition           4
                                  Total                         14
              Noun                Agreement (singular/plural)   9
                                  Missing noun                  2
                                  Total                         11
              Verb                Verb choice                   1
                                  Verb form                     1
                                  Missing verb                  2
                                  Total                         4
              Adjective/Adverb    Misused quantifiers           10
                                  Missing adverb                3
                                  Total                         13
              Others              Syntax                        6
                                  Word order                    1
                                  Typo                          2
                                  Total                         9
              Total                                             85
Functional    Inappropriate use                                 10

As Table 8 further shows, COLTE contains proportionally many more stance bundles in both type and token than MICUSP.
In fact, while MICUSP only includes a total of three types and 46 instances (208.87 pmw) of stance bundles, COLTE
includes 17 types and 111 tokens (1056.71 pmw). These findings do not support previous studies that found L1-Chinese
student writers to use a restricted range of stance expressions in comparison to L1-English writers (e.g., Chen & Baker, 2010;
Hyland & Milton, 1997). Unlike the learner data in these studies, COLTE includes a wider variety of stance bundles to express
certainty and doubt and to mark writers' attitudes, similar to L1-Chinese professionals (Pan et al., 2016). While COLTE exhibits a greater range of
stance bundles, most of these bundles include highly evaluative adjectives displaying students' personal opinions (e.g., is a
good choice, the best way to, it is easy to), as Table 9 shows. Extensive use of such subjective adjectives may lead readers to
perceive L1-Chinese writers as simply projecting their personal opinions without providing convincing evidence, thus
perhaps “diminish[ing] the author's credibility” as an academic writer (Pan et al., 2016, p. 69).

4.4. Bundle misuses in COLTE

Finally, this section presents the misused bundles in form and function in COLTE. We identified a total of 95 misused
instances. As Table 10 shows, 85 of these are grammatical mistakes and 10 are functionally inappropriate. Unlike Huang
(2015), who found that agreement mistakes were the most common grammatical type in Chinese EFL learners' bundles, we found
that over 50% of all mistakes in COLTE involve articles and prepositions. These include missing articles (e.g., on other hand),
misplaced articles (e.g., the one of most), misused articles (e.g., the large amount of), missing prepositions (e.g., according
the article), and misused prepositions (e.g., in the same time). This may not be surprising since Chinese is an article-less
language, and prepositions have been found to be notoriously challenging for L2 learners to acquire (Pang, 2009). One possible
explanation for the differences between Huang's (2015) findings and ours is that we analyzed four-word bundles while Huang
examined three-word bundles, where fewer opportunities for committing mistakes may be present. Other grammatical types
include mistakes with nouns (e.g., will save a lot of on the environment), verbs (e.g., put more attention to), adjectives/adverbs
(e.g., over the world), and others (e.g., one the other hand).

Table 11
Top five most frequently misused bundles in COLTE.

Lexical bundle             Raw frequency (pmw)
on the other hand          15 (142.80)
a large amount of          10 (95.20)
according to the article   8 (76.16)
is one of the              7 (66.64)
at the same time           7 (66.64)

Note: pmw = per million words; a bundle can be misused according to several categories simultaneously.

In total, 22 bundle types (approximately 42%) in COLTE are to varying degrees misused (see Appendix B for a complete list
of bundle types misused). Table 11 presents the top five most frequently misused bundles in COLTE. As can be seen, three (on
the other hand, is one of the, at the same time) are also among the top five most frequently used bundles in COLTE (see Appendix
A). On the other hand is both the most commonly used and misused bundle in COLTE, and the only one functionally misused.
Of its 15 misused instances, five are grammatical misuses and the remaining 10 are functional misuses; for example:
Text sample 4: Due to these reasons, they [people who suffer from obesity] are vulnerable in daily life; they are always
excluded and ridiculed by classmates or colleagues; they would gradually become to the people who are uncommu-
nicative, depressive, and dysphoric. On the other hand, in order to protect their self-esteem, they rarely participate in
the social activities (Strauss, 2000). (COLTE-13)
Rather than contrasting with the previous proposition, the bundle is used as an additive device to further support the
idea that people suffering from obesity have social problems.
The second most frequently misused bundle is a large amount of. While a few misuses have to do with article omission or
inaccurate article usage, seven instances are related to agreement, where the bundle is collocated with countable rather than
mass nouns; for instance:
Text sample 5: Moreover, there is a huge difference between Chinese and American workers' salaries, which makes the
costs of outsourcing labors in China to be relatively low. It results in a large amount of the U.S. citizens lost their jobs.
(COLTE-9)
Since citizens is a plural noun, the writer could have used the bundle a large number of, as this bundle collocates with
countable nouns. Finally, according to the article is the third most frequently misused bundle. Surprisingly, most instances of
this bundle in COLTE are misused. A few misuses are preposition mistakes (e.g., according the article), but the majority relate to
topic-comment structure, where the noun phrase (the article) in the sentence-initial adverbial is repeated as the sentence
subject (the author), as illustrated in the following example:
Text sample 6: According to the article, “Driver override system,” the author claims that the driver override system is
one of the most important and advanced technology amounting to the anti-lock braking system … (COLTE-75)
While this redundancy is considered “stylistically anomalous” in conventional academic writing, Li (2017) reports that this
particular type of construction is a common feature in developing L1-Chinese student academic texts, as it is an acceptable
structure in Chinese. Despite the few instances of functional misuses, the bulk of bundle misuses in COLTE are grammatical,
thus pointing to the challenges L1-Chinese university students continue to face, especially with articles and prepositions,
even at higher levels of English proficiency.

5. Concluding remarks

In this study, we compared the structural and functional patterns of four-word lexical bundles in English argumentative
essays produced by US-based senior-level L1-English writers and L1-Chinese undergraduate students in FYC, as well as
examined commonly misused bundles in L2 writers' texts. Diverging from previous studies comparing L1 and L2 student
writers (Chen & Baker, 2010; Ädel & Erman, 2012), we found that the study's L1-Chinese students not only used a broader
range of bundles but also used them at a higher frequency than the senior-level L1-English university students. Regarding proportional
distribution of the main structural categories, the two groups are similar in their employment of NP-based bundles but differ
in the use of PP- and VP-based bundles. Despite the higher frequency of VP-based bundles in L2 students' texts, their writing is
still slightly more phrasal. Our findings seem to support Biber, Gray, and Poonpon's (2011) proposal that academic writers,
both L1 and L2, move from a clausal to phrasal style of writing as their proficiency develops. At least in terms of structure,
these L1-Chinese students' texts seem to show signs of transitioning from an oral to literate style. In terms of the proportional
distribution of functional categories, stance bundles are higher in frequency in both type and token in COLTE, and the token
frequency is higher than in MICUSP at a statistically significant level. The high frequency of stance expressions with subjective
adjectives in COLTE might lead readers to perceive these L2 texts as being less objective. Our findings, however, do not support
previous studies comparing L1 and L2 student writing that found structural and functional distributions of bundles to be
similar in L1 and L2 texts (Chen & Baker, 2010; Ädel & Erman, 2012).
Differences might be due to the frequency and dispersion thresholds established for identifying bundles. While we set
the cut-off point at 40 pmw across at least five texts (Biber et al., 2004; Pan et al., 2016), Ädel and Erman (2012) and Chen and
Baker (2010) set it at 25 pmw across three texts. Furthermore, classification of bundle functions might also explain the
differences. Although we followed Biber et al.'s (2004) taxonomy strictly, both Ädel and Erman (2012) and Chen and Baker
(2010) classified identification/focus bundles as discourse organizers rather than as referential bundles. Finally, as
Hyland (2008a, 2008b) has shown, the context of writing (e.g., genres, discipline, author, audience) has a crucial effect on
bundle usage. This study compared L1-Chinese argumentative essays written within an FYC course for international students
with essays written by L1-English senior undergraduates for disciplinary courses. It is likely that disciplinary readers'
expectations played a crucial role in shaping the writing, including the use of bundles, of the selected L1 texts, while such
disciplinary expectations may not have influenced the ESL essays.
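
To make the effect of these cut-off points concrete, the following minimal sketch identifies four-word sequences that meet both a normalized frequency threshold and a dispersion (range) threshold. It is an illustration only: the function name is hypothetical, tokenization is naive whitespace splitting, and sequences are allowed to cross punctuation, none of which is claimed to match the tools or procedures used in this study.

```python
from collections import Counter, defaultdict

def extract_bundles(texts, n=4, min_pmw=40.0, min_range=5):
    """Return {bundle: (raw frequency, pmw, number of texts)} for n-word
    sequences that meet both the frequency and the dispersion threshold."""
    freq = Counter()
    text_range = defaultdict(int)
    total_words = 0
    for text in texts:
        tokens = text.lower().split()  # assumption: whitespace tokenization
        total_words += len(tokens)
        ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(ngrams)
        for gram in set(ngrams):  # count each text once per bundle
            text_range[gram] += 1
    bundles = {}
    for gram, count in freq.items():
        pmw = count * 1_000_000 / total_words  # normalize per million words
        if pmw >= min_pmw and text_range[gram] >= min_range:
            bundles[gram] = (count, pmw, text_range[gram])
    return bundles

# Toy data: one sequence spread over five texts, another over only two.
toy_texts = ["on the other hand we agree"] * 5 + ["at the same time"] * 2

strict = extract_bundles(toy_texts, min_pmw=40.0, min_range=5)
lenient = extract_bundles(toy_texts, min_pmw=25.0, min_range=2)
print(sorted(strict))
print(sorted(lenient))
```

In the toy data, the sequence that occurs in only two texts is excluded under the five-text criterion but retained once the range requirement is relaxed, which illustrates how different thresholds can change the set of bundles available for comparison across studies.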
Exploration of bundle misuses in L1-Chinese students’ writing also revealed interesting insights. Departing from findings
of L1-Chinese EFL undergraduate students (Huang, 2015), our analysis demonstrates that articles and prepositions are the
most commonly misused in bundles produced by US-based L1-Chinese students. Since Chinese is an article-less language,
and prepositions are generally considered a challenging aspect for L2 learners, these findings are consistent with patterns
reported in SLA research (Hinkel, 2004). We also found evidence of register mixing and direct translation from Chinese in COLTE.
First, many bundle types used by the L1-Chinese students are informal and clausal, which are characteristic features of
conversations (Biber et al., 1999). Second, a number of bundles are direct translations of commonly used Chinese bundles, which are also regarded as
conversational. These findings suggest that the COLTE writers were still in the process of developing their awareness of
academic register.
In line with Hyland's (2008a) view, the findings reported in this study should not be interpreted necessarily as
“deficiencies in the English by these [L2] writers” (p. 59), as the L1-Chinese student texts were rated highly by their
teachers, similar to those of L1-English senior undergraduates. The corpora used consist of essays constructed in
dissimilar writing contexts by student writers at different stages of both academic and linguistic development. Therefore,
rather than being viewed as “problematic,” bundle variation in L1-Chinese student writing, as Leedham (2015) suggests,
may reflect “a different way of meeting the challenges of academic writing” (p. 32). Furthermore, evidence suggests that,
as L2 learners gain proficiency and experience with academic discourse, they begin to distinguish between registers and
their use of multiword sequences tends to come closer to the norms of academic writing (Leedham, 2015; Staples et al.,
2013).
The diversity of “accents” L2 students bring into their academic writing should be valued, and evidence suggests that the
attitudes among some faculty in Anglophone universities toward non-standard forms are changing (Jenkins & Wingate, 2015;
Zamel & Spack, 2006). Despite these shifting attitudes, L2 university student writing is still often judged on criteria
established for L1 students in English-dominant higher education, with many university faculty erroneously viewing
lexico-grammatical mistakes and variation from mainstream norms in L2 student writing as deficiencies not only in language
but also in intelligence (Zamel & Spack, 2006). Such variation and mistakes in L2 student writing also have been found to negatively
affect the assessment of their writing in crucial ways (Jenkins & Wingate, 2015). Furthermore, although lexical bundles are
just one dimension of successful academic writing, they are an “important component of fluent linguistic production …,
helping to shape text meanings and contributing to our sense of distinctiveness in a register” (Hyland, 2008b, pp. 4–5). It is,
therefore, important for L2 writing instructors to help learners use multiword sequences appropriately and accurately in their
process of developing academic writing competence, especially since courses such as FYC are among the few opportunities
L2 undergraduate students have to receive direct instruction on academic writing without being or feeling stigmatized for
linguistic variations.
We conclude with some implications for ESL composition. Numerous scholars have pointed to the need to integrate the
teaching of lexical bundles in the L2 writing curriculum (e.g., Chen & Baker, 2010; Hyland, 2008a, 2008b). Supporting this
view, the evidence presented in this study shows that L2 writers in FYC are in need of assistance in developing academic
register awareness. One way is to expose learners to bundles typically used in appropriate “target” texts. Rather than texts
produced by experts (e.g., published RAs), we propose that the appropriate target and model for L2 undergraduate students
are student genres produced by high-rated senior undergraduate students such as those included in MICUSP or the British
Academic Written English corpus. These texts could be used to engage ESL FYC learners in “noticing” activities to raise their
awareness of the structures, functions, and contexts of lexical bundles in target student writing (Cortes, 2004). Noticing, for
example, stance bundles in model essays could assist L2 learners in expressing stance toward propositions in a less subjective
and personal manner, thus adhering to academic discourse norms (Pan et al., 2016). Moreover, the misused bundles offer
insights into some of the struggles L2 learners experience in constructing grammatically accurate bundles, particularly those
containing articles and prepositions. Rather than teaching these grammatical categories exclusively through a rule-based
approach, it might be more constructive to teach students structural “frames,” such as in the + NOUN + of and the NOUN
+ of the/a, as such frames have been found to be highly productive in academic writing (Chen & Baker, 2010; Pan et al., 2016).
Approaching the teaching of lexical bundles in such ways may not only reduce the number of bundle misuses but also
develop learners' proficiency in using contextually appropriate bundles productively for the purposes of
writing academic texts.

Acknowledgments

We would like to thank the anonymous reviewers for their constructive comments and suggestions and Farzaneh Vahabi
for her assistance with coding the data.

Appendix A. Complete list of lexical bundles in MICUSP and COLTE

MICUSP

Tokens  pmw     Range  Lexical bundle
39      177.09  22     (at) + the end of the
26      118.06  20     at the same time
24      108.98  20     on the other hand
21      95.35   17     as a result of + (the)
19      86.27   16     it is clear that
18      81.73   15     as well as the
17      77.19   12     it is important to + (note that)
17      77.19   12     the way in which
13      59.03   10     in the case of
13      59.03   9      in the form of
13      59.03   9      is one of the
11      49.95   10     the beginning of the
10      45.41   10     the fact that the
10      45.41   9      the ways in which
10      45.41   8      that there is a
10      45.41   8      the rest of the
10      45.41   8      one of the most
10      45.41   7      a great deal of
10      45.41   6      in an attempt to
9       40.87   8      for the rest of
9       40.87   8      in addition to the
9       40.87   8      on the basis of
9       40.87   6      in a way that

COLTE

Tokens  pmw     Range  Lexical bundle
28      266.56  22     on the other hand
26      247.52  18     is one of the
19      180.88  16     one of the most
15      142.80  13     at the same time
11      104.72  10     more and more people
10      95.20   8      a large amount of
10      95.20   8      it is hard to
10      95.20   8      on the one hand
10      95.20   8      there are three main
9       85.68   6      a lot of people
9       85.68   9      all over the world
9       85.68   9      as a result the
9       85.68   6      is a good choice
9       85.68   6      pay more attention to
8       76.16   7      according to the article
8       76.16   8      an important role in
8       76.16   8      do not need to
8       76.16   8      with the development of
7       66.64   6      a huge amount of
7       66.64   7      in conclusion the research (presented)
7       66.64   6      is a kind of
7       66.64   5      is the most important
7       66.64   6      it is a good
7       66.64   6      of the most important
7       66.64   6      the end of the
6       57.12   6      a huge number of
6       57.12   5      one of the best
6       57.12   6      (will) save a lot of
6       57.12   5      the best way to
5       47.60   5      a lot of time
5       47.60   5      an effective way to
5       47.60   5      and it is not
5       47.60   5      and they do not
5       47.60   5      because they do not
5       47.60   5      can be used in
5       47.60   5      do not have enough
5       47.60   5      do not have the
5       47.60   5      they do not want to
5       47.60   5      first of all the
5       47.60   5      for example in the
5       47.60   5      have enough time to
5       47.60   5      is an effective way
5       47.60   5      is not the only
5       47.60   5      it is easy to
5       47.60   5      it is evident that
5       47.60   5      nowadays more and more
5       47.60   5      people do not have
5       47.60   5      people who do not
5       47.60   5      that there is a
5       47.60   5      the development of the
5       47.60   5      there is a huge
5       47.60   5      they do not have

Note: pmw = per million words.
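The per-million-words normalization used in Appendix A can be illustrated with a minimal sketch. The function names below are hypothetical and are not part of the study's tooling. The corpus sizes are not stated in this appendix, so the second function only recovers an approximate size from the rounded values of one COLTE row (10 tokens, 95.20 pmw).

```python
def per_million_words(tokens: int, corpus_size: int) -> float:
    """Normalize a raw token count to a rate per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive word count")
    return tokens * 1_000_000 / corpus_size


def implied_corpus_size(tokens: int, pmw: float) -> float:
    """Invert the normalization: estimate corpus size from a count and its pmw."""
    if pmw <= 0:
        raise ValueError("pmw must be positive")
    return tokens * 1_000_000 / pmw


# Values taken from one COLTE row in Appendix A (10 tokens, 95.20 pmw).
# The result is an estimate from rounded figures, not a reported corpus size.
estimated_size = implied_corpus_size(10, 95.20)
print(f"Implied corpus size: about {estimated_size:,.0f} words")
print(f"Round trip: {per_million_words(10, round(estimated_size)):.2f} pmw")
```

Normalizing in this way is what allows raw counts from corpora of different sizes to be compared on a common scale.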

Appendix B. Misused lexical bundles in COLTE

Lexical bundle              No.
on the other hand           15
a large amount of           10
according to the article    8
at the same time            7
is one of the               7
one of the most             6
of the most important       5
a huge amount of            3
a huge number of            3
all over the world          3
pay more attention to       3
the end of the              3
with the development of     3
an effective way to         2
is a kind of                2
they do not want to         2
an important role in        1
can be used in              1
do not need to              1
is the most important       1
(will) save a lot of        1
the best way to             1

Note: A lexical bundle can be misused according to several categories at the same time.

References

Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing by native and non-native speakers of English: A lexical bundles approach.
English for Specific Purposes, 31, 81–92.
Ädel, A., & Römer, U. (2012). Research on advanced student writing across disciplines and levels: Introducing the Michigan corpus of upper-level student
papers. International Journal of Corpus Linguistics, 17, 3–34.
Anthony, L. (2014). AntConc (version 3.4.3) [computer software]. Tokyo, Japan: Waseda University. Available from: http://www.laurenceanthony.net.
Aull, L. L., & Lancaster, Z. (2014). Linguistic markers of stance in early and advanced academic writing. A corpus-based comparison. Written Communication,
31, 151–183.
Biber, D., & Barbieri, F. (2007). Lexical bundles in university spoken and written registers. English for Specific Purposes, 26, 263–286.
Biber, D., & Conrad, S. (1999). Lexical bundles in conversation and academic prose. In H. Hasselgård, & S. Oksefjell (Eds.), Out of corpora: Studies in honor of
Stig Johansson (pp. 181–190). Amsterdam: Rodopi.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at...: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25, 371–405.
Biber, D., Gray, B., & Poonpon, K. (2011). Should we use characteristics of conversation to measure grammatical complexity in L2 writing development?
TESOL Quarterly, 45, 5–35.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The Longman grammar of spoken and written English. London: Longman.
Chen, Y. H., & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language Learning & Technology, 14, 30–49.
Conrad, S. M., & Biber, D. (2005). The frequency and use of lexical bundles in conversation and academic prose. Lexicographica, 20, 56–71.
Cortes, V. (2002). Lexical bundles in freshman composition. In R. Reppen, S. M. Fitzmaurice, & D. Biber (Eds.), Using corpora to explore linguistic variation (pp.
131–145). Amsterdam: John Benjamins.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes, 23,
397–423.
Cortes, V. (2008). A comparative analysis of lexical bundles in academic history writing in English and Spanish. Corpora, 3, 43–57.
Cortes, V. (2013). The purpose of this study is to: Connecting lexical bundles and moves in research article introductions. Journal of English for Academic
Purposes, 12, 33–43.
Ellis, N. (1996). Sequencing in SLA: Phonological memory, chunking, and points of order. Studies in Second Language Acquisition, 18, 91–126.
Hasselgren, A. (1994). Lexical teddy bears and advanced learners: A study into the ways Norwegian students cope with English vocabulary. International
Journal of Applied Linguistics, 4, 237–258.
Heng Hartse, J., & Kubota, R. (2014). Pluralizing English? Variation in high-stakes academic texts and challenges of copyediting. Journal of Second Language
Writing, 24, 71–82.
Hinkel, E. (2003). Simplicity without elegance: Features of sentences in L1 and L2 academic texts. TESOL Quarterly, 37, 275–301.
Hinkel, E. (2004). Teaching academic ESL writing: Practical techniques in vocabulary and grammar. Mahwah, NJ: Lawrence Erlbaum.
Huang, K. (2015). More does not mean better: Frequency and accuracy analysis of lexical bundles in Chinese EFL learners' essay writing. System, 53, 13–23.
Hyland, K. (2008a). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27, 4–21.
Hyland, K. (2008b). Academic clusters: Text patterning in published and postgraduate writing. International Journal of Applied Linguistics, 18, 41–62.
Hyland, K., & Milton, J. (1997). Qualification and certainty in L1 and L2 students' writing. Journal of Second Language Writing, 6, 183–205.
Institute of International Education (IIE). (2016). Open doors 2016. Retrieved from
http://www.iie.org/Research-and-Publications/Open-Doors/Data/International-Students/Leading-Places-of-Origin/2014-16.
Jenkins, J., & Wingate, U. (2015). Staff and students' perceptions of English language policies and practices in ‘international’ universities: A UK case study.
Higher Education Review, 47, 47–73.
Lee, D. Y. W., & Chen, S. X. (2009). Making a bigger deal of the smaller words. Function words and other key items in research writing by Chinese learners.
Journal of Second Language Writing, 18, 149–165.
Lee, J. J., & Casal, J. E. (2014). Metadiscourse in results and discussion chapters: A cross-linguistic analysis of English and Spanish thesis writers in
engineering. System, 46, 39–54.
Lee, J. J., & Deakin, L. (2016). Interactions in L1 and L2 undergraduate student writing: Interactional metadiscourse in successful and less-successful
argumentative essays. Journal of Second Language Writing, 33, 21–34.
Leedham, M. (2015). Chinese students' writing in English: Implications from a corpus-driven study. New York: Routledge.
Li, D. C. S. (2017). Multilingual Hong Kong: Languages, literacies and identities. Berlin: Springer.
MICUSP. (2009). Michigan corpus of upper-level student papers. Ann Arbor, MI: The Regents of the University of Michigan.
Pan, F., Reppen, R., & Biber, D. (2016). Comparing patterns of L1 versus L2 English academic professionals: Lexical bundles in Telecommunications research
journals. Journal of English for Academic Purposes, 21, 60–71.
Pang, P. (2009). A study on the use of four-word lexical bundles in argumentative essays by Chinese English-majors: A comparative study based on WECCL
and LOCNESS. Teaching English in China, 32, 25–45.
Staples, S., Egbert, J., Biber, D., & McClair, A. (2013). Formulaic sequences and EAP writing development: Lexical bundles in the TOEFL iBT writing section.
Journal of English for Academic Purposes, 12, 214–225.
Staples, S., & Reppen, R. (2016). Understanding first-year L2 writing: A lexico-grammatical analysis across L1s, genres, and language ratings. Journal of
Second Language Writing, 32, 17–35.
Wei, Y., & Lei, L. (2011). Lexical bundles in academic writing of advanced Chinese EFL learners. RELC Journal, 42, 155–166.
Zamel, V., & Spack, R. (2006). Teaching multilingual learners across the curriculum: Beyond the ESOL classroom and back again. Journal of Basic Writing, 25,
126–152.

Tetyana Bychkovska is the ESL Specialist at George Mason University. She received her M.A. in Applied Linguistics from Ohio University. Her research
interests include second language writing, EAP, and technology in language teaching.

Joseph J. Lee, PhD, is Assistant Director of the ELIP Academic & Global Communication Program in the Department of Linguistics at Ohio University. His
research and teaching interests include ESP/EAP, genre studies, classroom discourse, advanced academic literacy, applied corpus linguistics, and teacher
education. His recent publications include research articles in English for Specific Purposes, Journal of Second Language Writing, and TESL Canada Journal.
