Bolton Nelson HUNG A Corpus-Based Study of Connectors in Student Writing
Introduction
This paper presents research findings from the analysis of corpus linguistic data
generated by the International Corpus of English project in Hong Kong (ICE-
HK).1 The research reported on in this paper focuses on the comparison of
the use of connectors in the writing of university students in Hong Kong and
Britain, and presents results derived from the comparison of data from ICE-
International Journal of Corpus Linguistics. © John Benjamins Publishing Company
K. Bolton, G. Nelson, and J. Hung
Previous research
A substantial amount of previous research has been carried out on the anal-
ysis of patterns of connector (or ‘connective’) usage in student writing. Much
of this research has been evidently motivated by the need to teach English-
language learners in a ‘second-language’ (ESL) or ‘foreign-language’ (EFL) en-
vironment, while other research has also been carried out in the context of
rhetoric and composition teaching programmes in North America. A relatively
early study by Neuner (1987), for example, investigated the use of cohesive de-
vices in ‘good’ and ‘poor’ freshman essays at a US college. While his paper does
not deal exclusively with the use of connectors, Neuner highlights the ways
in which cohesion in essay writing is achieved through a variety of cohesive
devices, including ‘chains’ of reference, conjunctions, and lexical ties (Neuner
1987: 101).
Much of the other research on this topic has been concerned with the writ-
ing of students of English as a second/foreign language. Three recent academic
papers have previously investigated the broad area of interest of this current
study, i.e. the use of connectives in academic writing by Hong Kong Chinese
students, for whom English is generally a language learned/acquired at school.
Crewe (1990: 316–317) sets out to examine ‘the misuse and overuse of logical
connectors’ through the study of the writings of ESL students at The University
of Hong Kong. Crewe notes that, in the Hong Kong context, expressions such
as on the contrary are frequently misused, and argues that such misuse may re-
sult from pedagogic practice in textbooks and teaching that relies on paradig-
matic lists of connectors. As significant, if not more worrying, for Crewe is the
‘overuse’ of connectives, citing one student writer who packs a chain of expres-
Connectors in Hong Kong student writing
Field and Yip (1992) use an experimental approach to study ‘internal conjunc-
tive cohesion’ in the ESL writing of senior secondary/high school students at
Form Six Level in Hong Kong. In this study they compare the use of connec-
tors and other cohesive devices in the essays of three groups of Hong Kong
students (67 students in all) with those used in the essay of ‘L1’ students from
Sydney, Australia (29 students). The working hypothesis for this study was that
‘Cantonese students writing in English use more conjunctive cohesive devices
in the organization of their essays than students at a similar educational level
who are native English speakers’ (Field & Yip 1992: 15). Following Halliday and
Hasan (1976), the authors adopt a four-way classification of cohesive devices in
terms of additive (also, and, furthermore, etc.) adversative (but, however, on the
other hand, etc.), causal (hence, thus, etc.) and temporal categories (next, etc.).
The results of Field and Yip’s analysis again suggest that ‘L2’ writers from
Hong Kong tended to ‘overuse’ such devices compared with the L1 Australian
group, and they comment that:
The high frequency of devices in L2 and even in L1 scripts may be due to the
limited time provided for completion of the task. Content had to be devised
quickly and writers may have relied on organizational devices to shape the
essay rather than a strong development of their thought. The . . . educational
level of the writers, who would have little essay writing experience, may also
account for an overall high use. (Field & Yip 1992: 24)
They then note particular problems in the use of the connectors on the other
hand, moreover, furthermore and besides, among Hong Kong Chinese students.
In the first case, they state that on the other hand is frequently used to make an
additional point, with no indication of an implied contrast, suggesting trans-
The third Hong Kong study of this topic to appear in recent years is that of
Milton and Tsang (1993), who adopt a corpus-based approach to the study of
student writing, drawing on data which at that time formed part of a four-
million-word (now larger) corpus of learner English, the Hong Kong Univer-
sity of Science and Technology (HKUST) corpus of learner English. The par-
ticular subset of data used for their analysis comprised 2,000 assignments writ-
ten by around 800 first-year undergraduates, together with 206 examination
scripts from the composition section of the Hong Kong Examination Author-
ity’s ‘A’ level Use of English examination (the equivalent to a high-school exit
test in North America). Milton and Tsang’s study attempts to compare the use
of connectors among Hong Kong students with that found in three
‘native-speaker’ corpora, i.e. the Brown Corpus, the Lancaster-Oslo/Bergen
(LOB) Corpus, and a corpus of their own consisting of computer science
textbooks.
Following the categorization of Celce-Murcia and Larsen-Freeman (1983),
Milton and Tsang chose to study the occurrence and distribution of 25 single-
word logical connectors, which they classified as additive (also, moreover, fur-
thermore, besides, actually, alternatively, regarding, similarly, likewise, namely),
adversative (nevertheless, although), causal (because, therefore, consequently),
and sequential (firstly, secondly, previously, afterward(s), eventually, finally, lastly,
anyhow, anyway). On the basis of the comparison of results from the HKUST
corpus and the L1 corpora, the researchers identified 13 connectors which are
regularly overused by Hong Kong students, i.e.
also, moreover, furthermore, besides, regarding, namely, nevertheless,
although, because, therefore, firstly, secondly, lastly
Of these, they calculate that the connectors with the six highest rates of overuse
are lastly (used on average 17.4 times more frequently in the Hong Kong data
compared with that of the L1 corpora), besides (with a ratio of 16.8), moreover
(14.9), secondly (12.9), firstly (12.5), and consequently (11.2). Their analysis of
student difficulties in this aspect of essay-writing suggests that there are two
main problem areas, i.e. redundant use (‘overuse’), and misuse. By ‘redundant
use’ they mean that ‘the logical connector is not necessary; its presence does
not contribute to the coherence of the text’. ‘Misuse’ occurs when ‘the use of
the logical connector is misleading; another cohesive device should have been
used; the logical connector is placed inappropriately . . . [which] is related to
loose organisation and faulty logic within the text’ (Milton & Tsang 1993: 228).
As an example of redundant use, they cite the following example with moreover:
Any animal or insects need to generate their next generation with no excep-
tion. Moreover, the very first step is to date an opposite sex.
(Milton & Tsang 1993: 228)
As an example of misuse, they focus on the use of therefore, which should, they
assert, be used ‘as a causal logical connector . . . where the cause precedes the ef-
fect’. Thus, the following example of misuse is an instance of faulty logic ‘where
therefore is used to force a conclusion from unsupported assumptions’:
In conclusion, beside the methods mentioned above, there are many other
methods of courtship and they are interesting. Therefore, its better for us to
contact more the nature. (Milton & Tsang 1993: 230)
In their conclusion, Milton and Tsang reiterate that ‘[t]here is a high ratio
of overuse of the entire range of logical connectors in our students’ writing,
in comparison to published English’, although they concede that distributional
patterns may also be affected by such factors as ‘genre’ and ‘variety’ (Milton
& Tsang 1993: 239).
Another very relevant study that employed a corpus-based approach to this
issue is that of Granger and Tyson (1996). In this study, the researchers analysed
data from a large-scale corpus of learner English, the International Corpus of
Methodological issues
At least three major methodological issues arise from the previous research re-
viewed above. These are (i) the identification of linguistic items as ‘connectors’;
(ii) the calculation of the ‘ratio of occurrence’ (or ‘ratio of frequency’) of logi-
cal connectors in corpus-based studies, and (iii) the measurement of ‘overuse’,
‘underuse’ and ‘misuse’ of connectors using quantitative techniques.
With reference to the first issue, i.e. the identification of linguistic items as
‘connectors’, remarkably, most researchers in previous studies appear to take
the identification of such items as uncontroversial and given. For example,
Field and Yip (1992) base their analysis on Halliday and Hasan’s (1976) classifi-
cation, while Milton and Tsang (1993) adopt a framework from Celce-Murcia
and Larsen-Freeman (1983), and Granger and Tyson (1996) avail themselves
of a list of connectors derived from Quirk et al. (1985). In the course of our
own research, it became clear that such lists of connectors were neither uncon-
troversial nor finite, and we were therefore moved to question a methodology
that relied purely on pre-existing categorizations.
The second issue, the measurement of the ‘ratio of occurrence’ of logical
connectors in corpus-based studies, also arose from our reading of the
literature. As indicated above, Crewe (1990) eschews a quantitative
methodology completely, while the other studies in this field differ markedly
in the analytical units they use for quantitative comparison. This is
particularly true of the methods adopted by various researchers to calculate
the ‘ratio of occurrence’ of connectors in their linguistic data.
Field and Yip’s (1992) data-analysis relied on, first, a raw frequency count
of the number of ‘conjunctive cohesive devices’ or connectors in terms of in-
stances per L1 (‘English as a first language’) or per L2 (‘English as a second lan-
guage’) group, and, second, the percentage of such connectors across the four
categories of ‘additive’, ‘adversative’, ‘causal’, and ‘temporal’. No ‘ratio of occur-
rence’ facilitating comparison across individuals and groups is noted. In the
case of Milton and Tsang’s (1993) study, the term ‘ratio of occurrence’ is em-
ployed, although this is calculated simply by dividing the number of identified
connectors by the number of words in the corpus. Granger and Tyson (1996)
estimate a raw frequency count of the target connectors in a native-speaker
(L1) and a foreign/second language learner (L2) writing corpus, and then pro-
ceed to calculate a ‘ratio of occurrence’ based on the frequency of occurrence
of connectors per 100,000 words of text.
Tsang’s study compares Hong Kong student academic writing against two gen-
eral corpora, Brown and LOB (containing texts from newspaper, literature,
popular writing, etc.), and against a very narrowly defined corpus of computer
textbooks. Both Field and Yip and Granger and Tyson compare ‘non-native’
student academic writing with ‘native’ student academic writing. In this lat-
ter case, the assumption is that the best ‘target model’ for ‘non-native’ or ESL
students is the writing of other students, those from a ‘native-speaking’ coun-
try (however that is defined). Again, we would challenge that assumption, and
would instead argue that a better set of control data would be provided by a
corpus of published academic writing in English. The target norm in academic
writing, for both ‘native’ and ‘non-native’ students is better defined as aca-
demic writing itself, and the best texts for comparison are clearly those already
published in international English-language academic journals.
In this study, our data consists of 10 untimed essays and 10 timed
examination scripts written by undergraduate Hong Kong students. The data
comprises 2,755 sentences (46,460 words), and is part of the Hong Kong
component of the International Corpus of English (ICE-HK). In addition, we
examine the corresponding data from the British component of the International
Corpus of English (ICE-GB), comprising 2,471 sentences (42,587 words).
With reference to the three methodological concerns identified above, i.e.
the identification of linguistic items as ‘connectors’, the measurement of the
‘ratio of occurrence’ of connectors in our data, and the calculation of ‘overuse’
and ‘underuse’ of connectors, a number of measures were adopted in order
to avoid inconsistencies in the research method. First, the list of connectors
we chose to identify and investigate was not derived from the pre-existing
categorizations provided by Halliday and Hasan (1976), Quirk et al. (1985), or
similar pedagogic and reference grammars, but was devised by us through an
analysis of the subset of academic writing taken from the ICE-GB corpus. This
subset consists of 40 samples, taken from academic papers and books across a
range of disciplines, published between 1990 and 1993 inclusive. It comprises
85,628 words, in 4,507 sentences. Our approach was first to identify the
connectors used by the authors in the academic writing component of ICE-GB, as
a valid starting point for the analysis which followed. This approach had the
important advantage of giving us a reliable and non-arbitrary list of
connectors to form the basis of our study. We found that the use of this more
‘realistic’ list of connectors greatly improved the accuracy of the analysis
which followed, as it was then possible to use this same list as a benchmark
when calculating instances of ‘overuse’ or ‘underuse’.
Table 1 below shows a complete list of the connectors found in the academic
writing data in ICE-GB, together with their raw frequencies, and their
frequencies per sentence (multiplied by 1,000). The list contains a total of 54
connectors; we include the complete list here in order to show the wide
variety of connectors that are available, although many of them have very low
frequency.
Table 1. All connectors in the academic writing category, ICE-GB corpus (The figures
in parentheses are raw frequencies. Total number of sentences = 4,507; total number of
words = 85,628)
The other major methodological consideration we have is the calculation
of ‘ratio of occurrence’ (termed ‘ratio of frequency’ in this study). As may be
seen from Table 1, the base unit for our analysis is the sentence, for the rea-
sons we explain in the previous section of this paper. Therefore the frequency
of connectors per 100,000 words, as presented by Granger and Tyson, is, we
contend, not an appropriate measure of connector frequency. In all cases, our
frequencies per sentence are multiplied by 1,000, in order to eliminate very
low figures.
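The difference between the two normalizations can be illustrated with a minimal Python sketch. The function name `ratio_of_frequency` and its parameters are illustrative assumptions rather than the tooling used in this study; the input figures are those reported in this paper for besides in the ICE-HK student data (30 occurrences in 2,755 sentences and 46,460 words).

```python
def ratio_of_frequency(count: int, n_units: int, scale: int = 1000) -> float:
    """Return occurrences per `scale` units (sentences or words), to one decimal."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    return round(count / n_units * scale, 1)


# Figures reported in this paper for 'besides' in the ICE-HK student data.
BESIDES_COUNT = 30
HK_SENTENCES = 2755
HK_WORDS = 46460

# Sentence-based measure: occurrences per 1,000 sentences.
rf_per_sentence = ratio_of_frequency(BESIDES_COUNT, HK_SENTENCES)

# Word-based measure in the style of Granger and Tyson: per 100,000 words.
rf_per_word = ratio_of_frequency(BESIDES_COUNT, HK_WORDS, scale=100_000)

print(rf_per_sentence, rf_per_word)
```

The same raw count thus yields quite different figures depending on the base unit, which is why results from studies using different units cannot be compared directly.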
The next stage of our analysis was thus to compare the frequencies of these
connectors in the writing of Hong Kong students (from ICE-HK) and in the
writing of British students (from ICE-GB). For ease of comparison, Table 2 also
reproduces the data in Table 1 for academic writing.2
With reference to Table 2 below, we can see that in both of the student
datasets, eight of the connectors that are used in academic writing have a score of
zero. The following connectors are not used at all by the students:
on the whole, on the one hand, in contrast, in sum, in the event, in total, or,
still
Table 2. (Continued.)
Table 3. The top 10 most overused connectors, with their differences from the
academic norm
Rank   Hong Kong                    Great Britain
1      so (+31.6)                   however (+20.5)
2      and (+24.0)                  so (+12.2)
3      also (+15.4)                 therefore (+8.4)
4      thus (+10.4)                 thus (+6.8)
5      but (+8.4)                   furthermore (+5.6)
6      therefore (+8.2)             firstly (+4.6)
7      moreover (+7.7)              then (+2.7)
7      then (+7.7)                  also (+2.6)
9      on the other hand (+5.0)     though (+2.4)
10     in fact (+4.4)               finally, in turn, lastly (+1.4)
       Mean difference = +11.8      Mean difference = +6.7
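The ‘difference from the academic norm’ reported in Table 3 can be read as a simple subtraction of ratios of frequency. The following Python sketch illustrates this for however, using the Rf values quoted in this paper (20.4 for academic writing, 23.6 for ICE-HK students, 40.9 for ICE-GB students). The names `difference_from_norm`, `ACADEMIC_NORM` and `STUDENT_RF` are illustrative assumptions, and counting the tied entry at rank 10 once when averaging the Great Britain column is likewise an assumption.

```python
from statistics import mean

# Ratios of frequency (per 1,000 sentences) for 'however', as quoted in the text.
ACADEMIC_NORM = {"however": 20.4}
STUDENT_RF = {
    "ICE-HK": {"however": 23.6},
    "ICE-GB": {"however": 40.9},
}


def difference_from_norm(student_rf: float, norm_rf: float) -> float:
    """Positive values indicate overuse, negative values underuse."""
    return round(student_rf - norm_rf, 1)


diffs = {
    corpus: difference_from_norm(rf["however"], ACADEMIC_NORM["however"])
    for corpus, rf in STUDENT_RF.items()
}

# Great Britain column of Table 3; the tied entry at rank 10 is counted once.
GB_TOP10 = [20.5, 12.2, 8.4, 6.8, 5.6, 4.6, 2.7, 2.6, 2.4, 1.4]
gb_mean = round(mean(GB_TOP10), 1)

print(diffs, gb_mean)
```

Because the published Rf values are already rounded to one decimal place, differences recomputed in this way for other connectors may deviate slightly from the tabulated figures.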
The connectors so and and are particularly overused by the Hong Kong stu-
dents. The British students also overuse so, though the majority of the British
overuse is attributable to the frequency of however. At 40.9 instances per 1,000
sentences, the British students use however about twice as much as academic
writers (20.4). In contrast, the Hong Kong students (23.6) are quite close to the
academic norm in the use of this connector. Both groups of students overuse
therefore and thus. The figures for but, moreover and on the other hand are also
worth noting. These connectors are overused by the Hong Kong students, and
slightly underused by the British students. Instances of connector overuse are
illustrated in the following excerpts from ICE-HK (Excerpt 1) and ICE-GB
(Excerpt 2). Spelling errors are in the originals.
Excerpt 1: Student writing from ICE-HK
(ICE-HK-W1A-014: Timed examination script in Music, by a student in a
Hong Kong university)
<#98:1> So, we can see that the British opera now become more human-
ity, not only reflects the king or Queen.
<#99:1> And, in the hamony, the development is the mostly use with
tonality.
<#100:1> (i.e. with Center key).
<#101:1> Besides also use with the Aeolian Dolian and Phygian mode for
the hamony.
<#102:1> And modulation is also fully used.
<#103:1> However, the techniques of the 20th century such as the atonal-
ity, bitonality, are also used.
<#104:1> And in Britten opera’s one technique he has used is interesting
i.e. reconile the hostile key by enharmonic mean.
<#105:1> In the Peter Grimes, the last Prologue, Peter and Ellen first meet
and sing in the F minor and E major.
<#106:1> However, later, Peter sing in A flat and G sharp and as a result
they sing in the unison.
<#107:1> On the other hand, the using of orchestra is developied.
Excerpt 2: Student writing from ICE-GB
<#46:2> Therefore on erosion of the rocks after uplift and the various
processes of denudation resulting in the topography, the sill may be ex-
posed as a linear feature.
<#47:2> It may also be exposed as a wide plane.
<#48:2> However this is rare as generally rocks are deformed after depo-
sition and rarely remain horizontal.
<#49:2> Dykes are another igneous intrusion which generally form from
sills.
<#50:2> They are a vertical, discordant feature which cut across the bed-
ding planes.
<#51:2> Therefore on a horizontal plane they are linear features at
practically 90° to the surface.
The connectors which have a score of zero in the student datasets are instances
of non-use, rather than underuse (see Table 2). Most of them also have fairly
low frequencies in academic writing. A notable exception to this is on the other
hand, which is not used at all by the British students, but is overused by the
Hong Kong students. In the Hong Kong data, the figures for indeed, conse-
quently, and again indicate some underuse. However, across the entire range
of connectors, the figures for underuse are noticeably lower than those for
overuse. In summary, Table 2 shows considerable levels of overuse and much
smaller levels of underuse, on the part of all the students.
What is also significant is that the results of the present study confirm a
number of findings from the two earlier Hong Kong-based studies, but contradict
many more. For example, our results do agree with Field and Yip’s (1992)
finding that on the other hand and besides (see Note 3) occur only in the Hong
Kong student data, and that moreover is somewhat overused by the Hong Kong
group. For this connector, the results for Hong Kong student writing indicate
an Rf (ratio of frequency) of 10.2, compared to an Rf of 0.4 for the ICE-GB
student group and an Rf of 2.4 for the ICE-GB academic writing group (see
Table 2 above). However, there is no support for their contention that
furthermore is overused by Hong Kong students in comparison with a ‘native
speaker’ group of students. In fact, in our data, the Rf for this connector is
3.6 occurrences per 1,000 sentences, in comparison with a figure of 6.1 for the
British students in ICE-GB, and a figure of 0.4 for the ICE-GB academic writing
group. More significantly, perhaps, Field and Yip’s earlier study also fails to
clearly profile the ‘top five’ connectors overused by Hong Kong student writers
(Table 3), i.e. so (31.6 above the academic norm), and (24.0), also (15.4),
thus (10.4), and but (8.4). Our results in Table 3 also directly contradict
those of Milton and Tsang (1993: 226), whose rank ordering of overused
connectors gives the following result: lastly (1), besides (2), moreover (3),
secondly (4), and firstly (5).
Conclusion
Notes
1. The ICE Hong Kong project has been supported by a grant from the Research Grants
Council of the Hong Kong Special Administrative Region, China (Project No. HKU
7174/000H). The ICE Hong Kong project aims to collect, computerize, and analyze one
million words of spoken and written Hong Kong English from the 1990s. Each word will
be labelled for its wordclass (noun, verb, etc.) and sample speech recordings will be
digitized and aligned to the transcriptions. The ICE corpus will be the most comprehensive
database of Hong Kong English ever compiled. This research is being conducted in parallel
with work on nineteen other national or regional varieties of English from around the
world, including Australia, Canada, East Africa, Great Britain, Malaysia, New Zealand,
South Africa, the Caribbean, the Philippines, and the United States (Greenbaum 1996: 3–5).
2. Unlike the ICE-GB corpus, the Hong Kong corpus has not yet been POS-tagged, so all
results for Hong Kong are based on a manual examination of the data. In order to ensure
consistency with the results from ICE-GB, the following procedures have been followed:
(a) And, but, and or are counted as connectors only when they occur in sentence-initial
position.
(b) In the case of then, we distinguish between adverbial then, which expresses temporal
sequence, as in [1], and connector then, which is used to develop the argument, as in
[2] and [3]:
[1] The simple sugar formed is then fermented by yeast to form alcohol.
[ICE-HK-W1A-016]
[2] According to the above evidence, we find that women usually are under men’s
control in working sphere. Then how about the role taking of women in family?
[ICE-HK-W1A-008]
[3] The result of these injunctions, then, was to promote the constant accumulation of
capital [...] [ICE-HK-W1A-003]
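Procedure (a) can be approximated automatically, although the counts reported here were obtained by manual examination. The following Python sketch is an illustration under stated assumptions: it assumes that each text unit is introduced by a marker of the form <#98:1>, as in the excerpts quoted in this paper, and the names `UNIT_MARKER`, `INITIAL_ONLY` and `sentence_initial_counts` are hypothetical. It does not attempt procedure (b), since distinguishing adverbial then from connector then requires human judgement.

```python
import re
from collections import Counter

# Text-unit markers such as <#102:1>, as they appear in the quoted excerpts.
UNIT_MARKER = re.compile(r"<#\d+:\d+>")

# Items counted as connectors only in sentence-initial position.
INITIAL_ONLY = {"and", "but", "or"}


def sentence_initial_counts(text: str) -> Counter:
    """Count text units whose first word is 'and', 'but' or 'or'."""
    counts = Counter()
    for unit in UNIT_MARKER.split(text):
        first_word = re.match(r"\s*([A-Za-z]+)", unit)
        if first_word and first_word.group(1).lower() in INITIAL_ONLY:
            counts[first_word.group(1).lower()] += 1
    return counts


# Sample taken from Excerpt 1 (ICE-HK-W1A-014); spelling as in the original.
sample = (
    "<#101:1> Besides also use with the Aeolian Dolian and Phygian mode for "
    "the hamony. "
    "<#102:1> And modulation is also fully used. "
    "<#103:1> However, the techniques of the 20th century such as the "
    "atonality, bitonality, are also used. "
    "<#104:1> And in Britten opera's one technique he has used is interesting"
)

counts = sentence_initial_counts(sample)
print(counts)
```

In this sample only the two unit-initial instances of And are counted; the and inside unit 101 is ignored, in line with procedure (a).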
3. In the ICE-HK data, on the other hand occurs 20 times (7.2 per 1,000 sentences). In each
instance, it occurs without the corresponding connector on the one hand. This confirms Field
and Yip’s (1992) observation that it is misused, and not simply overused, by Hong Kong
students, who use it to add information, without any expression of contrast. In the same
data, besides occurs 30 times (10.9 per 1,000 sentences), in each case in sentence-initial
position, and often in paragraph-initial position, again with no apparent expression of
contrast. The following example illustrates the misuse of this connector:
<#92:1> For example, in Britten opera’s A Midsummer Night’s Dream and Tip-
pett’s Midsummer Marriages are the subject from Shakesperian The Midsummer
Night’s Dream.
<#94:1> Besides, Tippett’s The knot Garden also from Shakesperian’s All wells that
end wells </p>
<p> <#95:1> Besides they also deal with the contrast between the collective activities
and the loneliness and misery of discontented individual e.g. Peter Grimes.
[ICE-HK-W1A-014]
References
Celce-Murcia, M., & D. Larsen-Freeman (1983). The grammar book: An ESL/EFL teacher’s
course. Rowley, Mass.: Newbury House.
Crewe, W. (1990). The illogic of logical connectives. ELT Journal, 44 (4), 316–325.
Field, Y., & L. M. O. Yip (1992). A comparison of internal conjunctive cohesion in the
English essay writing of Cantonese speakers and native speakers of English. RELC
Journal, 23 (1), 15–28.
Granger, S., & S. Tyson (1996). Connector usage in the English essay writing of native and
non-native EFL speakers of English. World Englishes, 15 (1), 17–27.
Greenbaum, S. (1996). Comparing English World-Wide. Oxford: Clarendon Press.
Halliday, M. A. K., & R. Hasan (1976). Cohesion in English. London: Longman.
Milton, J., & E. S. C. Tsang (1993). A corpus-based study of logical connectors in EFL
students’ writing: directions for future research. In R. Pemberton & E. S. C. Tsang
(Eds.), Studies in Lexis (pp. 215–246). Hong Kong: The Hong Kong University of
Science and Technology Language Centre.
Neuner, J. L. (1987). Cohesive ties and chains in good and poor freshman essays. Research in
the Teaching of English, 21, 92–103.
Quirk, R., S. Greenbaum, G. Leech, & J. Svartvik (1985). A Comprehensive Grammar of the
English Language. London: Longman.