
DOI:10.20988/lfp.2019.46..
The Usage of Lexical Bundles in Composition Writings: The Case of Korean Children Learners of English as a Foreign Language
Hee-Jae Kim (Yonsei University)
1. Introduction
Linguistic analysis based on corpus-based language studies provides accurate and reliable descriptions of formulaic expressions in the second language field. The theoretical background is Wray's (2002) observation that fluent, native-like language use is related to the proper use of formulaic expressions. Using formulaic expressions properly means that speakers not only produce natural and idiomatic English in grammatically correct sentences but also place lexical expressions in appropriate positions (Biber et al., 1999).
Choosing native-like lexical expressions and using them appropriately, as native speakers do, is very challenging for second language (L2) learners because they lack native intuition and a sufficient lexical inventory. In other words, native speakers take advantage of well-constructed, prefabricated, semi-automatic word chunks, whereas non-native speakers have little prior knowledge of such expressions. As a result, L2 learners make only limited use of such lexical sequences.
Previous corpus-based studies have found considerable differences in frequency and type between the formulaic expressions (also called formulaic sequences, multi-word expressions, or lexical bundles) of L1 speakers and those of L2 learners (Qin, 2014). In this context, studies of such word sequences expand our understanding of the lexical bundles used by native speakers and can help enhance L2 learners' communication skills and expressive ability in English. It is also expected that lexical bundles retrieved from authentic native-speaker data will provide a stepping stone for L2 learners to improve their learning efficiency and to automatize native-like expressions. However, most studies on the use of lexical bundles have examined the differences between native speakers and advanced L2 learners. There are almost no studies of formulaic sequences emerging at the early stages of L2 learning, and in particular of the lexical bundles in the output of child learners of English as a second language.
The purpose of this study is to analyze the use of native-like lexical bundles in the writing of Korean child L2 learners in comparison with a corpus of native English children's writing. To analyze the lexical bundles in native and learner writing, three corpora are used. The target corpus consists of English compositions by four 8-year-old Korean children. The other two are the Corpus of Contemporary American English (COCA), used as the reference corpus, and the Lancaster Corpus of Children's Project Writing (LCCPW), used as the comparison corpus. Wild et al. (2013) emphasized that the vocabulary used in children's writing differs from that of adults, so children's dictionaries and collocation or idiom collections should be built from children's corpora. This suggests that child-centered usage and word co-occurrence patterns differ from those in adult corpora. Normalization is used to retrieve meaningful lexical bundles and to compare corpora of different sizes.
The analysis of the frequency and types of lexical bundles extracted from these corpora is intended to help us understand the language use of child learners.
2. Literature Review
2.1. Lexical Bundles
The term 'lexical bundle' was first used by Biber et al. (1999), who defined it as a multi-word sequence that occurs with very high frequency in a given register. Besides lexical bundles (Biber et al., 1999), various terms refer to recurrent word sequences, such as lexical phrases (Cowie, 1998), formulaic sequences (Wray, 2008), n-grams (Stubbs, 2007), clusters (Hyland, 2008), recurrent word combinations (Altenberg, 1998), and multi-word constructions (Liu, 2012). These terms have similar definitions but are used differently depending on the research perspective (Hong, 2013).
A lexical bundle consists of 3 to 5 words that recur more frequently in written and spoken text than phrasal verbs or idioms (Sinclair, 1991; Biber et al., 1999; Cortes, 2004; Hyland, 2008), which makes lexical patterns in a learner corpus easy to analyze by frequency.
Lexical bundles are relatively easy to extract from a large corpus, and they make it possible to analyze the recurrent word sequences of a language. To do so, a frequency cut-off is needed to identify the bundles that are typical of the target register. However, cut-off points are arbitrary and depend on corpus size: they range from as high as 40 occurrences per million words in large corpora to as low as 2 to 3 occurrences in smaller corpora (Biber & Barbieri, 2007; Altenberg, 1998).
This study focuses on 3-word bundles, with a cut-off point of 7.6 occurrences per million words for COCA and a cut-off of 2 occurrences for the Lancaster Corpus of Children's Project Writing (LCCPW) and the Korean Children EFL Learners' Corpus (KCELLC).
To compare three corpora of different sizes, normalized frequencies were calculated per 1 million words.
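The extraction, cut-off, and normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure AntConc 3.4 actually implements: the regular-expression tokenizer, the lower-casing, and the function names (`tokenize`, `three_word_bundles`, `apply_cutoff`, `per_million`) are hypothetical choices made here for clarity.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case and split into word tokens.

    Assumption: a simple regex tokenizer; AntConc's token definition may differ.
    Punctuation is discarded, so 3-word sequences can span sentence boundaries.
    """
    return re.findall(r"[a-z']+", text.lower())


def three_word_bundles(texts):
    """Count contiguous 3-word sequences, never letting a sequence span two texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - 2):
            counts[" ".join(tokens[i:i + 3])] += 1
    return counts


def apply_cutoff(counts, min_freq):
    """Keep bundles whose raw frequency is at least min_freq (e.g. 2 for a small corpus)."""
    return {bundle: n for bundle, n in counts.items() if n >= min_freq}


def per_million(raw_freq, corpus_tokens):
    """Normalize a raw count to occurrences per 1 million words."""
    return raw_freq / corpus_tokens * 1_000_000


if __name__ == "__main__":
    texts = ["I like to swim. I like to read.", "I went to the park. I like to play."]
    bundles = apply_cutoff(three_word_bundles(texts), min_freq=2)
    print(bundles)  # {'i like to': 3}
    # The same raw count weighs very differently in corpora of different sizes.
    print(round(per_million(3, 6_219), 1), round(per_million(3, 79_570), 1))  # 482.4 37.7
```

Sequences are counted within each text only, so no bundle joins the end of one essay to the start of the next. Raising `min_freq` trades coverage for reliability, which is the arbitrariness noted above for cut-off points.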
2.2. Previous Research on Lexical Bundles
Previous research on lexical bundles has focused on identifying differences in the frequency, structure, and functional characteristics of lexical bundles according to the genre and register of texts. Other studies have examined the relationship between speakers' language proficiency and the frequency of their lexical bundle use (Ohlrogge, 2009; Yoon, 2017), and the differences between native and non-native speakers (Lee, 2009; Hong, 2013; Kim, 2013; Hong, 2015; Ha, 2017; Yoon, 2017).
Ohlrogge (2009), who studied the frequency of lexical bundles, administered an EFL proficiency test and collected 170 learners' writings. The writings were classified into beginner, intermediate, and advanced levels, and the types and frequencies of formulaic expressions used at each level were analyzed. Advanced learners used various expressions, including idioms and collocations, whereas beginners used fewer of them. Beginners more often copied phrases from the prompts.
Yoon (2017) investigated the relationship between L2 proficiency and the use of lexical bundles in L2 writing. The frequency, structure, and function of lexical bundles were analyzed by dividing college students' compositions by proficiency level. The higher the proficiency level, the more numerous and varied the lexical bundles used. In terms of structural types, advanced learners used more noun phrases and prepositional phrases, while beginners used more verb phrases.
Most domestic studies have compared the lexical bundles used by Korean learners of English with those actually used by native English speakers, analyzing their structure and usage (Lee, 2009; Hong, 2013; Kim, 2013; Hong, 2015; Ha, 2017).
Chen and Baker (2010) focused on the structural features and functional usage patterns of lexical clusters. They examined 4-word lexical bundles in one corpus of published academic texts and two corpora of student academic writing, one by Chinese EFL learners and one by native English speakers. The native expert group used many noun-phrase-based bundles, directives, and other bundle types, whereas the two student groups used more verb-phrase-based bundles.
Hong (2015) analyzed the n-gram use of intermediate and advanced Korean learners of English to identify differences and similarities. Native speakers used noun phrases (NP) and prepositional phrases (PP) more, whereas the non-native groups used clauses (CL) the most. In addition, non-native learners at both levels overused NPs. Lower-level Korean learners were found to use bundles containing the first-person pronoun 'I'.
Lee (2009) analyzed the patterns of lexical bundle use in the spoken language of non-native and native speakers, examining 4-word formulaic expressions. Korean learners used formulaic expressions repetitively. Among the structural subcategories, clause-based bundles, a major characteristic of spoken discourse, had the highest frequency. The frequency order for native speakers was CL > PP > NP > VP, whereas for non-native speakers it was CL > VP > NP > PP. Korean learners used VP bundles much more frequently than native speakers and PP bundles much less frequently.
There have been few studies on lexical bundles for child learners of English, such as Min and Son's (2005) study of meaning-unit vocabulary learning and Hong's (2017) extraction of lexical bundles from elementary school textbooks.
Lenko-Szymanska (2014) examined lexical bundles in the writing of beginner, intermediate, and advanced English learners from six countries (Japan, Israel, Taiwan, Spain, Poland, and Austria), drawn from the International Corpus of Crosslinguistic Interlanguage. Regardless of first language, beginner learners used VP and CL bundles more frequently than PP bundles, while differences in PP bundle use were confirmed at the advanced level.
In summary, most studies on lexical bundles have compared the bundles used by learners with those used by native speakers and have analyzed and explained the characteristics of the data. However, these studies have tended to focus on intermediate and advanced learners. Extracting and structurally analyzing the lexical bundles produced by child learners at an early stage can therefore serve as a milestone in the foreign language learning process and help instructors understand which types of lexical bundles to teach beginners. It is thus necessary to study the characteristics of the lexical bundles of child learners of English.
This research is designed to describe the use of 3-word lexical bundles in the writing of beginning-level Korean child EFL learners. Because EFL learners produce shorter units than native speakers and the compositions were written by children, the analysis was narrowed down to 3-word lexical bundles.

This study addresses three research questions.
1. What is the frequency of the 3-word lexical bundles in the writing of beginning-level Korean child EFL learners?
2. What are the grammatical types of the 3-word lexical bundles in the writing of beginning-level Korean child EFL learners?
3. How do the grammatical types and frequency of the 3-word lexical bundles differ between native-speaking children and Korean children?
3. Methodology
Since the KCELL corpus contains essays by non-native English speakers, the data were analyzed for occurrences of native-like lexical bundles. 3-word lexical bundles were retrieved rather than 4-word or longer bundles because relatively few bundles of 4 or more words can be extracted from such data, so analyzing 3-word lexical bundles better serves the purpose of this study. Biber et al. (1999) reported that 3-word lexical bundles are very frequent because they represent extended collocational associations, and that they are far more frequent than longer sequences.
Since the KCELL and LCCPW corpora contain essays by non-native English speakers and native-speaking children respectively, the data were analyzed for occurrences of native-like lexical bundles. To retrieve native-like 3-word lexical bundles from both corpora, a reference corpus large enough to represent a wide range of 3-word lexical bundles is needed. The reference corpus filters idiosyncratic non-native word sequences out of the data and makes it possible to compare native-like 3-word lexical bundles across native and non-native writing.
3.1. Korean Children EFL Learners’ Corpus
For this study, the Korean Children EFL Learners' Corpus (KCELLC) was constructed. KCELLC is a collection of 105 texts written in English by 4 Korean elementary school students from 2013 to 2014. The samples were collected from a theme-based English writing class.
Using Macmillan McGraw-Hill's Treasures 1.1 to 2.1 textbooks, lessons were taught four times a month. Each textbook unit has one theme and consists of 3 reading sections, with fiction and non-fiction, and 1 writing section. Each unit was taught in 4 lessons in total. The first three lessons focused on building background knowledge and learning new vocabulary and expressions by reading fiction and non-fiction. After these reading lessons, the class had a writing lesson associated with the central theme of the unit. The writing lessons approached composition through various structures, such as descriptive writing, how-to writing, letter writing, story writing, and poetry writing.
Table 1 below summarizes the background and total number of writings of the beginner learners.
Subject (4)      Age   Grade   Writing period   Number of writings
Me (female)      8-9   G1-G2   2013-2014        40
Yoon (female)    8-9   G1-G2   2013-2014        33
Song (male)      8-9   G1-G2   2013-2014        18
Simon (male)     8-9   G1-G2   2013-2014        14
Total                                           105
Table 1. The Background and Total Writing of the KCELL Corpus
Of the 111 writings in total, 105 were selected for the data, excluding poems. After converting them into .txt files, I used AntConc 3.4 to examine the tokens and types of the vocabulary in the data. The data contain 6,219 tokens and 1,315 types.
Table 2 shows the tokens, types, and average text length of the 105 writings.
Data sets        Texts   Mean text length   St. dev.   Tokens   Types
Me (female)      40      63.08              26.77      2,523    805
Yoon (female)    33      57.45              26.58      1,896    634
Song (male)      18      42.94              11.72      773      347
Simon (male)     14      73.14              27.07      1,027    445
Total            105     59.15              23.03      6,219    1,315
Table 2. The Tokens, Types and the Length of the text in the KCELLC
Table 2 shows the mean and standard deviation of text length for each learner's writings and for all 105 writings. The longest text consists of 148 words and the shortest of 17 words. The average text length is 59.15 words, with a standard deviation of 24.84. To gauge the level of this material, I compared its text length with that of the elementary-level English learners in each country reported by Lenko-Szymanska (2014).
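The summary statistics in Table 2 can be recomputed from per-text word counts with Python's standard `statistics` module. In the sketch below the five text lengths are hypothetical. It also illustrates a point worth keeping in mind when reading the Total row: the unweighted average of the four learners' means (59.15) differs slightly from the pooled mean over all texts (6,219 tokens / 105 texts, about 59.23 words).

```python
from statistics import mean, stdev


def length_summary(lengths):
    """Return the mean and sample standard deviation of text lengths (in words)."""
    return mean(lengths), stdev(lengths)


# Hypothetical word counts for five texts, for illustration only.
m, sd = length_summary([17, 40, 59, 75, 148])
print(round(m, 2), round(sd, 2))

# Per-learner mean text lengths from Table 2 (Me, Yoon, Song, Simon).
learner_means = [63.08, 57.45, 42.94, 73.14]
print(round(mean(learner_means), 2))  # 59.15: unweighted mean of the four means
print(round(6_219 / 105, 2))          # 59.23: pooled mean over all 105 texts
```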
Data sets   Texts   Mean text length   St. dev.   Tokens
Japan       100     40.23              15.85      4,023
Israel      100     59.13              27.46      5,913
Taiwan      100     42.94              30.81      7,329
Spain       100     41.99              25.28      4,199
Poland      100     46.14              28.18      4,614
Austria     100     66.98              23.14      6,698
Mean        100     49.56              25.12      5,643
Source: Lenko-Szymanska, A. (2014). The acquisition of formulaic language by EFL learners: A cross-sectional and cross-linguistic perspective. International Journal of Corpus Linguistics, 19(2), 225-251.
Table 3. Initial text length for each country used in the study of Lenko-Szymanska (2014)
Table 3 shows the average text length and the number of tokens of the writings produced by beginning English learners in the six countries of Lenko-Szymanska's (2014) study. The elementary learner texts used in this study total 6,219 tokens with a mean length of 59.15 words, which falls between the mean text lengths for Israel and Austria.
3.2. Reference Corpus and Comparison Corpus
Since the focus of this study is to identify lexical bundles in the Korean Children English Language Learner Corpus (KCELLC), a reference corpus and a comparison corpus were needed. A list of 3-word lexical bundles from the Corpus of Contemporary American English (COCA) served as the reference corpus. For the comparison corpus, 3-word lexical bundles were collected from written samples by native-speaking children in the Lancaster Corpus of Children's Project Writing (LCCPW).
3.2.1. 3-word Lexical Bundles in COCA
To examine the native patterns of lexical bundles in the Korean Children English Language Learner Corpus, this study needed to collect 3-word lexical bundles from a reference corpus. The Corpus of Contemporary American English (COCA) was used for this purpose. COCA consists of 425 million words of text produced by native speakers, so its data are appropriate and standardized as a reference for retrieving the native-like 3-word lexical bundles of KCELLC (Lenko-Szymanska, 2014).
A list of 3-word lexical bundles was obtained from COCA. The list of 3-word lexical bundles from the 425-million-word COCA can be used freely for language research. Since the list contains more than 1 million items of varying frequency, 3-word lexical bundles were collected at a cut-off point of 7.6 occurrences per million words. As a result, a list of 2,789 3-word lexical bundles was retrieved. Table 4 below shows the number of 3-word lexical bundle types.
COCA free sample                      Total 3-word lexical bundle types
100 million words                     1,020,008
Cut-off point of 7.6 per million      2,798
Table 4. COCA 3-word Lexical Bundles
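Selecting reference bundles at a normalized cut-off, as done for the COCA list, amounts to converting each raw count to a per-million rate and keeping those at or above 7.6. The sketch below assumes a frequency list held as a Python dictionary of raw counts; the three entries and their counts are hypothetical, and the 100-million-word base is taken from the free-sample row of Table 4 purely for illustration. At that size, 7.6 per million corresponds to 760 raw occurrences.

```python
def bundles_above_cutoff(freq_list, corpus_tokens, cutoff_per_million=7.6):
    """Keep bundles whose normalized frequency (per 1 million words) meets the cut-off."""
    return {
        bundle: raw
        for bundle, raw in freq_list.items()
        if raw / corpus_tokens * 1_000_000 >= cutoff_per_million
    }


# Hypothetical raw counts in a 100-million-word reference corpus.
reference = {"one of the": 50_000, "i like to": 900, "a pair of": 700}
kept = bundles_above_cutoff(reference, corpus_tokens=100_000_000)
print(sorted(kept))  # ['i like to', 'one of the']
```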
3.2.2. 3-word Lexical Bundles in LCCPW
For comparison with the 3-word lexical bundles of KCELLC, this study used writing samples by native-speaking children from the Lancaster Corpus of Children's Project Writing (LCCPW) and retrieved their 3-word lexical bundles. These samples have been compiled by the Department of Linguistics and Modern English Language at Lancaster University since 1996 and consist of essays by 10 English-speaking children aged 9 to 12. LCCPW contains writing produced on free topics during project-oriented writing sessions called "writing-for-learning."
Data sets   Texts   Tokens    Types
AK          3       5,267     1,476
D           3       21,934    3,965
ED          3       1,738     609
ER          3       6,377     1,750
JJ          3       1,738     609
KH          3       4,875     1,086
LA          3       7,123     1,731
LK          3       7,931     1,740
NA          1       519       241
NM          2       11,498    2,208
RJ          3       3,862     1,066
RO          3       953       3,891
SJ          2       446       1,049
Total       29      79,570    8,569
Table 5. Types and Tokens in LCCPW
The comparison corpus was constructed from 29 writing samples, with 79,570 tokens and 8,569 types in total. Table 5 presents information on each data set.
Using AntConc 3.4, 3-word lexical bundles were retrieved from LCCPW, yielding 79,500 3-word tokens and 70,464 3-word types. Table 6 shows the 3-word token and type totals for LCCPW.
Number of writings   3-word tokens   3-word types
29 79,500 70,464
Table 6. Summary statistics in LCCPW corpus
To generate a list of native-like 3-word lexical bundles, this study retained only the bundles common to both COCA and LCCPW. This yielded a list of 1,103 3-word lexical bundle types. After normalization per 1 million words, these bundles accounted for 10.79% of the total.
Data set   Tokens   3-word bundles shared with COCA (types/tokens)   Normalized 3-word bundle tokens   Proportion of 3-word bundles
LCCPW      79,570   1,103/2,863                                      107,942                            10.79%
Table 7. Normalization of native-like 3-word lexical bundles in LCCPW
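Retaining only the bundles shared with COCA is a set-membership filter: each bundle type found in the children's corpus is kept if it also occurs in the COCA list, and the tokens of the kept types are summed. A minimal sketch, assuming both lists are already lower-cased strings; the function name `native_like` and the counts below are hypothetical.

```python
from collections import Counter


def native_like(corpus_counts, reference_types):
    """Filter corpus bundle counts to types that also occur in the reference list.

    Returns (number of shared types, number of shared tokens, filtered counts).
    """
    shared = Counter(
        {bundle: n for bundle, n in corpus_counts.items() if bundle in reference_types}
    )
    return len(shared), sum(shared.values()), shared


corpus = Counter({"one of the": 95, "this is a": 44, "my dog barked": 2})
coca = {"one of the", "this is a", "in the world"}
n_types, n_tokens, shared = native_like(corpus, coca)
print(n_types, n_tokens)  # 2 139
```

The first two returned values correspond to the types/tokens column of Table 7; the idiosyncratic sequence in the toy data is dropped because it has no match in the reference list.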
3.3. 3-word lexical bundles in KCELL
The KCELL corpus contains 105 essays on a range of narrative and descriptive topics. 3-word lexical bundles were retrieved from these data with AntConc 3.4. As shown in Table 8, the KCELL corpus contains 5,996 3-word tokens and 5,206 3-word types.
Number of writings   3-word tokens   3-word types
105 5,996 5,206
Table 8. Summary statistics in KCELL corpus
To retrieve the list of native-like 3-word lexical bundles, the KCELL corpus was also compared with the high-frequency 3-word lexical bundles from COCA. This yielded a list of 116 3-word lexical bundle types. After normalization per 1 million words, these bundles accounted for 10.46% of the total tokens.
Data set          Tokens   Native-like 3-word bundles (types/tokens)   Normalized 3-word bundle tokens   Proportion of 3-word bundles
Korea (KCELLC)    6,219    116/217                                     104,679                            10.46%
Table 9. Normalization of native-like 3-word lexical bundles in KCELL corpus
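The normalized figures in Tables 7 and 9 can be reproduced, up to truncation, if each token of a 3-word bundle is counted as three running words and the result is expressed per 1 million corpus words: 217 × 3 / 6,219 for KCELLC and 2,863 × 3 / 79,570 for LCCPW. This reading is inferred from the reported numbers rather than stated in the text, so the sketch below should be taken as a check under that assumption; `bundle_coverage` is a hypothetical helper name.

```python
def bundle_coverage(bundle_tokens, corpus_tokens, n=3):
    """Running words covered by n-word bundle tokens, per 1 million corpus words.

    Assumption: every bundle token is counted as n words of running text.
    """
    return bundle_tokens * n / corpus_tokens * 1_000_000


kcellc = bundle_coverage(217, 6_219)    # native-like bundle tokens, corpus tokens (Table 9)
lccpw = bundle_coverage(2_863, 79_570)  # shared bundle tokens, corpus tokens (Table 7)
print(int(kcellc), int(lccpw))          # 104679 107942 (as reported, truncated)
```

Dividing a per-million value by 10,000 converts it to a percentage, which gives the 10.46% and 10.79% in the tables when truncated to two decimals.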
Bundles that occur only once account for 67% of the 3-word lexical bundles in the KCELL corpus. This suggests that beginner EFL learners have not yet acquired many of the 3-word lexical bundles they use. This may be due to constraints of the teaching situation or the writing topics. However, Lenko-Szymanska (2014) also found that 57% of lexical bundles occurred only once, which suggests that this is a general characteristic of beginner EFL learners.
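The share of once-only bundles can be computed directly from a frequency table. The sketch assumes the 67% refers to bundle types (the text does not say whether types or tokens are meant); the function name `singleton_share` and the toy counts are hypothetical.

```python
from collections import Counter


def singleton_share(counts):
    """Proportion of bundle types that occur exactly once (0.0 for an empty table)."""
    if not counts:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


toy = Counter({"i like to": 19, "take care of": 10, "my cat is": 1, "a big dog": 1, "in the park": 1})
print(singleton_share(toy))  # 0.6
```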
The proportions of lexical bundles in the data sets, measured against the native benchmark, are 10.46% for KCELLC and 10.79% for LCCPW (see Tables 7 and 9). In Biber et al.'s (1999) study of lexical bundles, recurrent 3-word lexical bundles account for 18% of the words in conversation and 25% in academic prose.
The bundle distribution in this study thus did not reach the levels reported by Biber et al. (1999). Since the data sets were composed by children and the writing topics relate more to the students' personal lives and experiences, their output may resemble conversation more than academic writing (Lenko-Szymanska, 2014).
4. Result and Discussion
4.1. The frequency of lexical bundles in KCELLC and LCCPW
Table 10 compares the 10 most frequent 3-word lexical bundles in KCELLC and LCCPW. The numbers of retrieved native-like 3-word sequence types and tokens were recorded for each data set (see Tables 7 and 9). Normalized frequencies were calculated per 1 million words so that data sets of different lengths could be compared. As Table 10 shows, only one of the 10 lexical bundles, 'I went to,' appears in both lists.
No.  Bundles in KCELLC   Normalized Frq.*   Bundles in LCCPW   Normalized Frq.*
1    I like to           87,558             one of the         1,195
2    take care of        46,083             this is a          553
3    I went to           36,866             did you know       553
4    to be a             36,866             there is a         528
5    is a very           32,258             it is a            465
6    why did you         32,258             in the world       340
7    a very good         32,258             some of the        340
8    go to the           27,650             I went to          314
9    I am a              23,041             it has a           289
10   I can see           23,041             you can see        277
*Normalized frequency
Table 10. Top 10 3-word lexical bundles in KCELLC and LCCPW
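The overlap between the two top-10 lists in Table 10 is a set intersection; the sketch below uses the bundles exactly as listed in the table.

```python
kcellc_top10 = ["I like to", "take care of", "I went to", "to be a", "is a very",
                "why did you", "a very good", "go to the", "I am a", "I can see"]
lccpw_top10 = ["one of the", "this is a", "did you know", "there is a", "it is a",
               "in the world", "some of the", "I went to", "it has a", "you can see"]

# Exact string matching: only identical bundles count as shared.
shared = set(kcellc_top10) & set(lccpw_top10)
print(sorted(shared))  # ['I went to']
```

Near-matches such as 'I can see' and 'you can see' are not counted, because exact matching treats them as different bundles.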
The overall proportions of lexical bundles differ by only 0.33 percentage points, but the frequencies in KCELLC show that Korean child learners of English use the same types of lexical bundles repeatedly, compared with native-speaking children.
These results are consistent with previous studies (Ohlrogge, 2009; Yoon, 2017) showing that low-level language learners overuse lexical bundles compared with native speakers. They are also consistent with the finding in second language acquisition research that learners repeatedly use a limited set of expressions to increase the fluency of communication.
The overuse and limited range of bundles in this data may also have been influenced by the writing topics, which may have affected the frequency of 3-word bundles.
Compared with the range of lexical bundles produced by native speakers, it is necessary to identify which structures learners overuse and which they underuse, in order to reduce repetitive use of the same expressions and to encourage learners to expand the types of lexical bundles they use. Learning a variety of native-like lexical bundles can also be an essential pathway for learners to progress from interlanguage toward native-speaker-level proficiency.
4.2. The structure patterns of lexical bundles in KCELLC and LCCPW
Since lexical bundles are not complete grammatical units but have grammatical correlates (Biber et al., 1999), the final 3-word lexical bundles were analyzed based on the structural classifications of Biber et al. (1999), Cortes (2008), and Lee (2009). The bundles were categorized by their first word and then grouped into grammatical structures such as noun-phrase-based (NP), prepositional-phrase-based (PP), verb-phrase-based (VP), and clause-based (CL) bundles.
As Picture 1 shows, verb-phrase bundles are the most frequently used type in both KCELLC (63%) and LCCPW (37%). Frequently used verb-phrase bundles in the KCELLC data include 'I like to,' 'take care of,' 'I went to,' 'is a very,' 'go to the,' and 'I can see.' Seven of the top ten 3-word lexical bundles in KCELLC are verb phrases (see Table 10), and KCELLC uses VP bundles far more than LCCPW does. Almost all groups of low-level Korean learners of English show a high frequency of verb-phrase bundles (Ohlrogge, 2009; Lee, 2009; Hong, 2015; Yoon, 2017). Hong (2015) pointed out that low-level learners of English use the first-person pronoun 'I' in their bundles.
Picture 1. Structural distribution of lexical bundles in KCELL and LCCWP I

In KCELL, the order of frequency is VP > CL > PP > NP. As mentioned above, VP bundles account for 63% of the total, followed by CL (14%), PP (12%), and NP (11%). As with the beginner learners in Lenko-Szymanska (2014), VP bundles were used the most, and PP and NP bundles were retrieved less often.
In LCCPW, the order of frequency is VP > PP > NP > CL: VP bundles account for 37%, PP 25%, NP 22%, and CL 16%. Although LCCPW is a native corpus, its structural distribution differs from that of other native corpora. As Wild et al. (2013) noted, the vocabulary used in children's writing differs from that of adults, which suggests that child-centered usage and word co-occurrence patterns differ from those in adult corpora.
4.3. Structure list in KCELL and LCCPW
Table 11 summarizes the distribution by subdividing each structure according to the classification of Biber et al. (2004).
Beginner non-native learners tend to use a limited set of lexical bundles. Some of the emerging bundles seem to have been influenced by the writing topics, which may have affected the frequencies in Table 11.
Structure                  KCELL     LCCPW     Total
Verb phrase (VP)           63        37        100
  1st/2nd+VP               22        10        32
  3rd+VP                   3         2         5
  DM+VP                    11        9         20
  non-passive+VP           8         2         10
  passive VP               14        11        25
  WHq+VP                   4         0         4
  YesNo+VP                 2         1         3
Clause (CL)                14        16        31
  that+CL                  4         3         7
  to+CL                    9         9         18
  WH+CL                    2         2         4
  if+CL                    0         2         2
Noun phrase (NP)           11        22        33
  NP+of-phrase             6         14        20
  other+NP                 4         1         5
  post-modifier            1         6         8
Preposition phrase (PP)    12        25        37
  Comparative              1         3         4
  preposition              11        22        33
Total                      100 (%)   100 (%)   200
Table 11. Structural distribution of lexical bundles in KCELL and LCCPW II (%)
     LCCPW                          KCELL
No.  Bundle          Frq.*    No.  Bundle          Frq.*
1    one of the      1,195    1    a pair of       23,041
2    some of the     340      2    a lot of        18,433
3    a picture of    277      3    a bit of        13,825
4    a lot of        264      4    a little bit    13,825
5    part of the     264      5    middle of the   13,825
6    the end of      264      6    part of this    13,825
7    end of the      252      7    the middle of   13,825
8    most of the     239
9    the size of     201
10   the side of     176
11   all of the      176
12   member of the   151
13   the back of     151
14   the state of    151
15   a kind of       138
16   one of them     126
17   the front of    126
18   the rest of     113
19   top of the      113
20   a list of       113
21   a pair of       113
22   a series of     113
23   all kinds of    113
24   the age of      101
25   the bottom of   101
*Normalized frequency
Table 12. Noun + of + Noun structure of 3-word lexical bundles in KCELLC and LCCPW
The purpose of this analysis is to identify, through detailed comparison with the native children's corpus, the structures of lexical bundles that do not appear in the KCELL data.
Beginners use PP and NP structures less (Lenko-Szymanska, 2014). Therefore, this study lists the NP and PP 3-word lexical bundles in KCELLC and LCCPW in detail and examines the differences (Tables 12 and 13).
First, N + of N bundles such as 'a pair of' and 'a lot of' appear in KCELLC. Compared with the same structure in LCCPW, however, the number of such bundles is remarkably different. As Table 12 shows, of-NP bundles are rarely used by the non-native learners.
     LCCPW                             KCELL
No.  Bundle             Frq.*    No.  Bundle             Frq.*
1    in the world       340      1    in the beginning   18,433
2    out of the         264      2    in the water       18,433
3    in the air         214      3    to the ground      18,433
4    on the ground      214      4    to the hospital    18,433
5    because of the     214      5    as much as         13,825
6    in the same        214      6    down in the        13,825
7    at the end         189      7    down on the        13,825
8    in the dark        189      8    for the first      13,825
9    on to the          189      9    in many ways       13,825
10   because of their   176      10   in the dark        13,825
11   in the middle      164      11   in the middle      13,825
12   in the north       164      12   in the room        13,825
13   on the way         151      13   in the sky         13,825
14   on top of          151      14   in the south       13,825
15   to the ground      151      15   on the floor       13,825
*Normalized frequency
Table 13. Preposition phrase structure of 3-word lexical bundles in KCELL and LCCPW
The most frequent NP bundle in LCCPW is 'one of the,' which is also the most frequent noun-phrase bundle in Biber et al.'s (1999) study. In addition, although not listed in Table 12, 'one of' is used with various possessive pronouns and determiners, as in one of his, one of my, one of our, one of their, and one of these.
Across LCCPW, native-speaking children use a variety of prepositions, such as in, on, at, and to. By contrast, the non-native children in KCELLC used PP bundles beginning with 'in' the most.
5. Conclusion
The present study investigated the use of lexical bundles in the writing of beginning-level Korean child EFL learners. An EFL learner corpus of writing by Korean children and a native children's corpus, the Lancaster Corpus of Children's Project Writing, were collected for the study, and the 3-word lexical bundles in the two corpora were identified and analyzed.
The results showed that the beginner learners used lexical bundles much more frequently but with much less variety than the native-speaking children; that is, they used a limited range of bundle types repeatedly. One possible reason is the teaching methods and textbooks used with these learners: all four learners wrote their compositions in textbook-based lessons, whereas the native-speaking children wrote on free topics.
Structural analysis showed that both the learners and the native-speaking children used verb-phrase bundles more than other types. However, the learners used more verb-phrase bundles and fewer noun-phrase and prepositional-phrase bundles than the native-speaking children.
The Korean learner data in this study were limited to beginners. In addition, the writings came from only four learners over one to two years rather than from a diverse group of learners. Such a small corpus, collected only from beginning Korean learners of English, is insufficient for understanding how lexical bundle use grows or changes as proficiency develops.
Nevertheless, this study analyzed the patterns of 3-word bundles used by early Korean elementary learners of English by comparing the characteristics of their lexical bundles with those of native English speakers.
In addition, by extracting the bundles that the child English learners
did not yet produce sufficiently at this early stage, it was possible to
compile a list of bundles that beginners find difficult to learn. This
list not only serves as a milestone in the learners' process of learning
English as a foreign language, but also helps instructors identify the
bundles to teach elementary learners and increase the degree and
frequency of exposure to them. Further research on these learners is
needed.
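One way to compile such a list is to compare normalized frequencies in the two corpora and keep the native bundles that learners underuse. The sketch below assumes bundle frequency tables for both corpora, a normalization base of 1,000 words, and a hypothetical cut-off `ratio`; the study's own selection criteria are not restated in this section.

```python
from collections import Counter


def underused_bundles(native: Counter, native_words: int,
                      learner: Counter, learner_words: int,
                      ratio: float = 0.5, per: int = 1000) -> list[str]:
    """List native bundles whose learner rate is below `ratio` times the native rate.

    Rates are frequencies per `per` words; bundles absent from the learner
    corpus have a learner rate of zero. Results follow native frequency order.
    """
    result = []
    for bundle, n in native.most_common():
        native_rate = n * per / native_words
        learner_rate = learner.get(bundle, 0) * per / learner_words
        if learner_rate < ratio * native_rate:
            result.append(bundle)
    return result


# Toy frequency tables (hypothetical), each from a 1,000-word corpus.
native = Counter({"one of the": 4, "at the end": 2, "i want to": 2})
learner = Counter({"i want to": 5})
print(underused_bundles(native, 1000, learner, 1000))  # ['one of the', 'at the end']
```

Ordering the output by native frequency puts the bundles with the most classroom exposure value first.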
References
Altenberg, B. (1998). On the Phraseology of Spoken English: The evidence of
recurrent word-combinations.
Arnaud, P. J. L., & Savignon, S. J. (1997). Rare Words, Complex Lexical Units
and the Advanced Learner. Second Language Vocabulary Acquisition. pp. 157-
173.
Assassi, T., & Benyelles, R. (2016). Formulaic Language for Improving
Communicative Competence. Arab World English Journal, 7.
Biber, D., et al. (1999). Longman Grammar of Spoken and Written English.
Harlow: Longman.
Biber, D. (2006). University Language: A Corpus-based Study of Spoken and Written
Registers. Amsterdam: John Benjamins.
Biber, D., & Barbieri, F. (2007). Lexical Bundles in University Spoken and Written
Registers. English for Specific Purposes 26. pp. 263-286.
Biber, D., Conrad, S., & Cortes, V. (2004). If You Look at...: Lexical Bundles in
University Teaching and Textbooks. Applied Linguistics 25. pp. 371-405.
Bybee, J. L. (2006). From Usage to Grammar: The mind's response to repetition.
Language 82-4. pp. 711-733.
Chen, Y. H., & Baker, P. (2010). Lexical Bundles in L1 and L2 Academic Writing.
Language Learning & Technology 14-2. pp. 30-49.
Crossley, S., and Salsbury, T. L. (2011). The Development of Lexical Bundle
Accuracy and Production in English Second Language Speakers.
International Review of Applied Linguistics in Language Teaching 49-1. pp.
1-26.
Ellis, N. C., Simpson-Vlach, R., & Maynard, C. (2008). Formulaic Language in
Native and Second Language Speakers: Psycholinguistics, Corpus Linguistics,
and TESOL. TESOL Quarterly 42-3. pp. 375-396.
Erman, B., & Warren, B. (2000). The Idiom Principle and the Open Choice Principle.
Text - Interdisciplinary Journal for the Study of Discourse 20-1. pp. 29-62.
Ha, Myoungha. (2017). A Corpus-based Lexical Bundle Analysis of Native and Non-
native Writing. Linguistics Studies 44. pp. 245-262.
Hatami, S. (2015). Teaching Formulaic Sequences in the ESL Classroom. TESOL
Journal 6-1. pp. 112-129.
Hong, Shinchul. (2013). An N-gram Analysis of Korean English Learners' Writing.
English Linguistics 13-2. pp. 313-336.
Hong, Shinchul. (2015). The Comparison of N-gram Use between Intermediate and
Advanced Korean Learners of English. Journal of Language Sciences 22-1.
pp. 147-170.
Hyland, K. (2008). As Can Be Seen: Lexical bundles and disciplinary variation.
English for specific purposes 27-1. pp. 4-21.
Lee, Eun-Joo. (2009). A Corpus-based Study of the Korean EFL Learners' Use of
Formulaic Sequences. Foreign Language Education 16-2. pp. 321-340.
Lenko-Szymanska, A. (2014). The Acquisition of Formulaic Language by EFL
Learners: A cross-sectional and cross-linguistic perspective. International
Journal of Corpus Linguistics 19-2. pp. 225-251.
Lewis, M. (1993). The Lexical Approach. Hove: Language Teaching Publications.
Min, Chan-Kyoo & Son, Eun-Il. (2005). A Study of the Effects of Vocabulary
Instruction Focusing on the Meaningful Chunk for Elementary School
Students. Journal of the Korea English Education Society 4-2. pp. 75-96.
Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical Phrases and Language Teaching.
Oxford: Oxford University Press.
Ohlrogge, A. (2009). Formulaic Expressions in Intermediate EFL Writing
Assessment. Formulaic language 2. pp. 375-386.
Pawley, A., & Syder, F. H. (1983). Two Puzzles for Linguistic Theory: Nativelike
selection and nativelike fluency. Language and Communication. pp. 191-225.
Qin, J. (2014). Use of Formulaic Bundles by Non-native English Graduate Writers
and Published Authors in Applied Linguistics. System 42. pp. 220-231.
Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford University Press.
Skehan, P. (1998). A Cognitive Approach to Language Learning. Oxford
University Press.
Stubbs, M. (2004). Language Corpora. The handbook of applied linguistics. Malden,
MA: Blackwell.
Swan, K. (2001). Virtual interaction: Design Factors Affecting Student Satisfaction
and Perceived Learning in Asynchronous Online Courses. Distance
Education 22-2. pp. 306-331.
Tremblay, A., Derwing, B., Libben, G., & Westbury, C. (2011). Processing
Advantages of Lexical Bundles: Evidence from self-paced reading and
sentence recall tasks. Language Learning 61-2. pp. 569-613.
Wei, Yaoyu, & Lei Lei. (2011). Lexical Bundles in the Academic Writing of
Advanced Chinese EFL Learners. RELC Journal 42-2. pp. 155-166.
Wild, K., Kilgarriff, A., & Tugwell, D. (2013). The Oxford Children's
Corpus: Using a Children's Corpus in Lexicography. International Journal of
Lexicography 26-2. pp. 190-218.
Wood, D. (2010). Formulaic Language and Second Language Speech Fluency:
Background, Evidence and Classroom Applications. Bloomsbury Publishing.
Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge University Press.
Yoon, Hyunsook. (2017). A Study on the Use of Lexical Bundles in Second
Language Writing at Different Levels of Proficiency. Studies in Foreign
Language Education 31-1. pp. 35-58.
Hee-Jae Kim
Interdisciplinary Graduate Program in
Linguistics and Informatics.
Institute of Language and Information Studies
Yonsei University
50 Yonsei-ro, Seodaemun-gu
Seoul 120-749, Korea
E-mail: shine5nme@gmail.com
Received: Jan. 13, 2019
Revised: Feb. 21, 2019
Accepted: Feb. 22, 2019
Abstract
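The Usage Patterns of Lexical Bundles in English Composition:
Focusing on Korean Child Learners of English
Hee-Jae Kim
(Yonsei University)
This study investigated the usage patterns of 3-word lexical bundles in
the beginner-level writing of Korean child EFL learners. To this end, the
Corpus of Contemporary American English (COCA) was used as the native
reference corpus, and samples provided by the Lancaster Corpus of
Children's Project Writing (LCCPW) were used as the native comparison
corpus, in order to compare the usage patterns of 3-word lexical bundles.
The results showed that the Korean child learners of English repeatedly
used a more limited range of 3-word lexical bundle types than the native
children. In terms of grammatical structure, both the learners and the
native children used verb phrases more than other phrase types. However,
the learners used more verb phrases and fewer noun phrases and
prepositional phrases than the native children. In their verb phrases,
the learners tended to use the first-person pronoun 'I' more often.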
Keywords: Lexical bundles, Children's corpus, Korean English Learners, Lancaster
Children's Corpus
Key terms: lexical bundles, children's corpus, Korean beginner child learners
of English, native children
Abstract
The Usage of Lexical Bundles in Composition Writings: The Case of Korean
Children Learners of English as a Foreign Language
Hee-Jae Kim
(Yonsei University)
The purpose of this study was to investigate the usage patterns of 3-word
lexical bundles in the beginner writing of Korean child EFL learners. For
this purpose, I used the Corpus of Contemporary American English (COCA)
as a native-speaker reference corpus and samples from the Lancaster
Corpus of Children's Project Writing (LCCPW) as a native children's
comparison corpus. The three corpora (the learner corpus, COCA, and
LCCPW) were used to compare the usage patterns of 3-word lexical bundles.
The results show that Korean child learners of English repeatedly use a
more limited range of lexical bundles than native children. In terms of
the grammatical structure of the 3-word lexical bundles, both learners
and native speakers used verb phrases more than other phrase types.
However, learners used more verb phrases and fewer noun phrases and
prepositional phrases than native speakers. In the learners' verb
phrases, the first-person pronoun 'I' tended to be used more often.
Keywords: Lexical bundles, Children's corpus, Korean English Learners, Lancaster
Children's Corpus
Key terms: lexical bundles, children's corpus, Korean beginner child learners
of English, native children
