College Writing: A Corpus-Based Analysis

COLLEGE ESSAY-WRITING: A CORPUS-BASED ANALYSIS
Teodora Popescu
“1 Decembrie 1918” University of Alba Iulia
1. Introduction
Starting with the 70’s, Error Analysis (EA) became a scientific method in its own
right, owing a lot to the research done by Corder (1967), Richards (1971) and Selinker
(1972), who identified different aspects of the second/foreign language learners’ own
language system, which is neither the L1 (mother tongue), nor the L2 (second/foreign
language). The essential shift that their studies brought about in linguistics is a
reassessment of the importance of errors made by ESL/EFL learners. Therefore, according
to Corder (1967), a learner’s errors are not random, but systematic (unsystematic errors
occur in one’s native language) and they are not negative or interfering with learning the
Target Language, but on the contrary, they represent a necessary positive, facilitative
factor, indispensable to the learning process, highly indicative of individual learner
strategies. Further on, Richards (1971) identified three types of errors: a) interference
errors generated by L1 transfer; b) intralingual errors which result from incorrect,
incomplete or overgeneralised) application of language rules; c) developmental errors
caused by the construction of faulty hypotheses in L2.
By the same token, Selinker (1972, and more recently, 1992) elaborated on the
theory of interlanguage, by which we understand a third language, with its own lexicon,
grammar and discourse structure, phonological traits, etc. The basic processes through
which interlanguage is created are: language transfer (negative transfer, positive transfer,
avoidance, and overuse), overgeneralization (at phonetic, grammatical, lexical, discourse
level) and simplification (both syntactic and semantic).
This process-oriented approach to error-analysis (investigation into the reasons why
language errors are made, and learners’ active strategies) has allowed for the adoption of a
learning-based perspective. It follows that teachers now view errors as necessary stages in
all language learning, as the product of intelligent cognitive strategies, hence as potentially
useful indicators of what processes the student is using.
In our endeavour to investigate students’ errors occurring in essay-writing, we first
tried to identify and categorise these mistakes, and further on we attempted to explore the
reasons why they might have come about. In order to ascertain learners’ writing
competence (in L2), we analysed learner errors from a linguistic perspective: (spelling –
partly accounting for phonetic inaccuracies, morphological, syntactic, collocational and
discursive – in terms of non-achieving coherence and cohesion). The approach we adopted
was one provided by electronic tools of concordancing software.
2. Corpus Linguistics
The term corpus, derived from the Latin word for body, was first encountered in the
6th century to refer to a collection of legal texts, Corpus Juris Civilis (Francis 1992: 17).
The term corpus has preserved this initial meaning, i.e. a body of text; nevertheless this
definition is not entirely satisfactory for corpus linguists. According to one of the five
definitions provided by the Oxford English Dictionary, a corpus is ‘the body of written or
spoken material upon which a linguistic analysis is based’. It results that a corpus is not
just a collection of texts; it represents in fact ‘a collection of texts assumed to be
representative of a given language, dialect, or other subset of a language, to be used for
linguistic analysis’ (Francis 1982: 7 apud Francis 1992: 17). Furthermore, Francis (1992)
mentions three main areas in which corpora have traditionally been used: lexicographical
182
TRANSLATION STUDIES: RETROSPECTIVE AND PROSPECTIVE VIEWS. 2nd Edition.
1 – 2 November 2007. GalaĠi, Romania
studies in the creation of dictionaries, dialectological studies and the creation of grammars.
Modern corpus linguists nevertheless are quite different from their early fellows. Kennedy
(1992) underlined the fact that initial corpora were mostly of written texts only, just the
forms were counted, not the meanings and they were untagged, so homonyms were often
classed as one word. Another important reason was that traditionally, linguists had been
strongly influenced by Chomsky’s theory that corpora were inadequate whereas intuition
was. Chomsky contested the concept of empiricism on which corpus linguistics had been
based and offered a rationalist approach instead, supporting a sort of methodology by
which ‘rather than try and account for language observationally, one should try to account
for language introspectively’ (McEnery & Wilson 1996: 6). Chomsky condemned corpus-
based studies asserting that ‘Any natural corpus will be skewed…the corpus, if natural,
will be so wildly skewed that the description would be no more than a mere list’ (Chomsky
1962: 159 apud Leech 1991: 8). This theory is not surprising as long as Chomsky, more
interested in competence than performance, was against an approach that was foremost
based on actual performance data.
Nonetheless corpora research continued in spite of early criticisms, and it even
strengthened due to technological advances in computer software. Now it is possible to
process texts of several million words in length (Sinclair 1991).
Nelson (2000) pointed out that there are several reasons that speak in favour of
using corpora in linguistics analysis: objectivity vs. intuition, verifiability of results
(Svartvik 1992, Biber 1996), broadness of language able to be represented (Svartvik 1992,
Biber 1995, Biber, Conrad & Reppen 1994), access, broad scope of analysis, pedagogic –
face validity, authenticity, motivation (Johns 1988, Tribble & Jones 1990), possibility of
cumulative results (Biber 1995), accountability, reliability, view of all language (Sinclair
2000).
3. Methodology
The learner essay-writing corpus (LEWC) was created by collecting the essays
written by 30 Romanian-speaking university students of economics in their 2nd year at an
intermediate level of language learning. We need to mention that students typed their own
essays, which were subsequently compiled by the teacher. This fact might a priori account
for the automatic correction of some spelling mistakes (made by the word editor students
used), as well as for a limited amount of correction in the case of morphological or
syntactic errors. Results would have undoubtedly been different had we had our students
handwrite their essays. The essays are argumentative, non-technical, including titles such
as ‘Crime does not pay’, ‘Most university degrees are theoretical and do not prepare
students for the real world. They are therefore, of very little value’, ‘Living in a city has
greater advantages than living in a small town or country’, and have an average length of
about 500 words each. The first step was to analyse the corpus as a whole, in order to
identify the most frequent words that were used. The figure below will show a screen shot
of the frequency count of our corpus.
183
Teodora Popescu
College Essay-Writing: A Corpus-Based Analysis
Figure 1 Frequency sort for LEWC

As can be seen from the picture above, the most frequent words are articles
(definite and indefinite), prepositions and conjunctions, personal pronouns – mainly 1st
person singular since it was an argumentative essay and students expressed their own
views, as well as the verb to be. The most frequent noun was people, again because an
argumentative essay has to be rather referential and generalising. Next in our count we
have crime, person and life. Person is the singular of people, therefore the same
explanation as above might hold true. Crime was mainly used because most of the essays
were written on the topic ‘Crime does not pay’.
The next step was to highlight the errors that occurred in the corpus and to try to
classify them. As we mentioned in the introduction to this study we considered the errors
from a linguistic point of view: (spelling, morphological, lexical – inappropriate use of
lexis, lexical – collocational, syntactic, and discursive).
Type of error Examples No. of
occurrences
Spelling * […] and you remain like a dad person […] 25
a dead person…
* They kill for example for unpayed debts[…]
unpaid debts
Morphological *Crimes always catches up with the criminal… 54
Crimes and criminals are always discovered…
* so they have to action […]
[…] so they have to act / take action […]
Lexical – * My device 1 in life is… 68
inappropriate use My motto/watchword/slogan in life is […]
of lexis * I think rappers are equal with the criminals…
184
Lexical – *…where they have a lot of opportunities and where have 147
collocations achieved respect and wealth […]
[…] have earned respect and acquired wealth…
Syntactic * In our country, is still that old system […] 43
In our country there still exists […]
Discursive * Instead of studying math when at a law school it would be 29
better that everyone will have a course that teaches them or
improves their skills when it comes to society and person to
person relationships?
Wouldn’t it be better, if, while a Law student, one could take a
course in personal and social communication skills instead of
studying Math?
Table 1. Classification of student errors in the LEWC
We previously draw attention to the fact that the relatively low number of spelling
mistakes is due to the quasi-satisfactory computer-literacy of our students. There are
specific mistakes that a spell-check tool will automatically correct, while others will be just
underlined. By a right click on the mouse the student can see the alternatives offered by the
computer. Needless to say that the options of the computer are not always the best. There
were nevertheless, spelling mistakes which could not be corrected with the help of the
computer; these are, in general, words that have more than two misspelled letters. E.g. *
[…] the person who died wasn’t crossing the street regulamentary […] (in this case the
student’s mistake is twofold, both intralingual and interlingual; probably knowing that
certain adjectives in English – e.g. customary, preliminary, etc. have the –ary suffix, he/she
added it to the Romanian adjective ‘regulamentar’ – both adj. and adv.; in English
‘lawful/-ly’). Neither could the word ‘consilier’ be corrected. What the computer offered as
solutions were: ‘consoler’, ‘costlier’, ‘consulter’, ‘consular’ and ‘consigliore’. The best
rendition of the idea would be in fact ‘counsellor’. A third example is a rather common
mistake that Romanian students make in English: “Is it worthed to live my life in death
[…]”. Once students learn the passive voice in English, they will develop a strategy of
always using the subject + verb to be + past participle. This overgeneralization works
detrimentally, since ‘worthed’ does not exist in the English language.
The mistakes that students made in terms of morphology, syntax and discourse did
not come necessarily as a surprise, as we knew from the outset that our students’ language
competence was at an intermediate level (similar to B1 of the CEF) and we can’t speak of
proper foreign language writing competence as long as a certain linguistic competence is
achieved. One typical mistake that we recorded was the omission of the subject: “In the
majority of the countries is a mixture […]”, which can be easily ascribed to the fact that
subject-elliptic utterances are quite frequent in Romanian. Another example of bad
morphology/syntax/discourse is the incorrect use of ‘what’ (interrogative pronoun) instead
of ‘that’: *”[…] if you accomplish something what is not palpable[…]”; * “Many
professions what are needed in a society are now missing people to practice them[…]” or *
“In our country are few respectable companies what can give the recompense that top
people need.” In Romanian there is one single word for both functions: ‘ce’.
We noticed a high occurrence of lexical mistakes, especially in the case of
collocational patterns. We considered therefore that it might be useful to delve even deeper
into this type of errors. This time it was relatively easier to identify mistakes with the help
of a concordancer. We started form analysing the most frequent words in their vicinities
and subsequently categorised the mistakes. We will present the sub-classification of
collocational errors in the following table:
Type of error Example No of
occurrences
Noun + Verb * crime always gets paid. 10
…a crime will always be punished
185
Teodora Popescu
Verb + Noun * achieve offence 27

commit an offence
* are now missing people
now lack people
Adj + Noun *dense course 22
elaborate course
Adverb + Verb * will suffer very hard 15
will suffer deeply
Verb + * graduate the university 26
Preposition graduate from university
Noun + * experience on computer 18
Preposition experience in/with computers
Preposition + * in university 21
Noun at university
* face to face with a computer
in front of a computer
Multi-word * be indicate that 8
expressions it is advisable to
* it’s the best to
it’s best to
Table 2 Sub-classification of collocational errors
We would like to present in the following some of the most frequent words in the
corpus and the way they collocate.
Some of the incorrect patterns that you can see below are: *Crime does not worth;
*one can make a crime; *crime always gets paid. Nevertheless, the most common verb
used was ‘to commit a crime:’
Fig 2 Crime concordances
186
Fig 3 People concordances

A word with such a high degree of generality was by and large used in a correct
collocational manner, the mistakes being of ‘grammar’ type:
1. *… essentially if people are being paid by the “quantify” of work they do, … - incorrect
use of tense / incorrect use of morphological category;
2. *… I think that the rehabilitate is for people that achieved minor offences like theft –
incorrect use of definite article / incorrect use of morphological category / collocational
error / morphological error;
3. * The people general opinion is that a better live is far away … - incorrect use of definite
article / incorrect use of genitive.
4. Pedagogical implications
We are aware that the present study does not cover all aspects pertaining to errors
occurring in the LEWC and there still exist numerous facets that deserve a researcher’s and
practitioner’s attention.
What this study has nevertheless shown is that collocational errors were the most
numerous, which entails that the educator should devote substantial time and space to the
teaching of collocations. Despite all criticisms brought to the behaviouristic approach,
which laid particular emphasis on learning by heart, thus shaping appropriate language
behaviours, it is important to teach and learn a large amount of collocations explicitly. In
particular, as can be inferred from the above error analysis samples, it is important to focus
on the larger context in which collocations / lexical items occur.
5. Suggestions for further research

Starting from the findings of the research already carried out and in view of further
analyses of students’ writing skills, we will try to focus on specific methods strategies of
teaching essay writing, with particular emphasis on the development of collocational
competence alongside the syntactic and morphological ones.
187
Teodora Popescu
Notes:
1
The Oxford Genie CD-ROM gives the following definitions for device (Engl.): 1. an object or a piece of equipment that
has been designed to do a particular job 2. a bomb or weapon that will explode 3. a method of doing sth that produces a
particular result or effect 4.a plan or trick that is used to get sth that sb wants. Unfortunately, in LeviĠchi’s DicĠionar
român-englez. Romanian-English Dictionary (1998), the first definition for deviză (Rom.) is device, motto. This was an
important means of identifying yet another type or error source: consulting a faulty dictionary.
Bibliography:
o Biber, D. (1995) Dimensions of Register Variation, Cambridge: Cambridge University Press.
o Biber, D., Conrad, S. & Reppen, R. (1994) ‘Corpus-Based Approaches to Issues in Applied Linguistics,’ Applied
Linguistics, Vol.15, No.2.
o Corder, S.P. (1976) ‘The Significance of Learners’ Errors,’ International Review of Applied Linguistics, 5, 161-170.
o McEnery, T. and Wilson, A. (1996) Corpus Linguistics, Edinburgh: Edinburgh University Press
o Leech, G. (1991) ‘The State-of-the-Art in Corpus Linguistics,’ in Aijmer, K. & Altenberg, B. (eds) English Corpus
Linguistics, Studies in Honour of Jan Svartvik, London and New York: Longman.
o Nelson, M. (2000) A Corpus-Based Study of Business English and Business English Teaching Materials.
[UnpublishedPhD Thesis. Manchester: University of Manchester.] Thesis available at
http://users.utu.fi/micnel/thesis.html
o Popescu, T. (2007) ‘Concordancing for Increased Learner Independence and Translation Skills,’ Proceedings of the
International Conference Foreign Language Competence as an Integral Component of a University Graduate
Profile, University of Defence in Brno, Czech Republic, 221-229.
o Popescu, T. (2007) ‘Teaching Business Collocations,’ in D. Galova (ed), Languages for Specific Purposes:
Searching for Common Solutions, Newcastle: Cambridge Scholars Publishing, 164-174.
o Popescu, T. (2006) ‘Encouraging Learner Independence through Teaching Collocations,’ Messages, Sages and Ages
2, 805-814.
o Richards, J.C. (1971) ‘A Non-Contrastive Approach to Error-Analysis,’ English Language Teaching, 25, 204-219.
o Selinker, L. (1972) Interlanguage. Error Analysis, London: Longman,
o Selinker, L. (1992) Rediscovering Interlanguage, New York: Longman Inc.
o Sinclair, J. (1991) Corpus, Concordance, Collocation, Oxford: Oxford University Press.
o Svartvik, J. (1992) ‘Corpus Linguistics Comes of Age,’ in Svartvik, J. (ed) Directions in Corpus Linguistics,
Proceedings of Nobel Symposium 82 Stockholm, 4-8 August 1991, Berlin, New York: Mouton de Gruyter.
188

College Writing: A Corpus-Based Analysis

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

College Writing: A Corpus-Based Analysis

Uploaded by

Copyright:

Available Formats

COLLEGE ESSAY-WRITING: A CORPUS-BASED ANALYSIS

Figure 1 Frequency sort for LEWC

Verb + Noun * achieve offence 27

Fig 2 Crime concordances

Fig 3 People concordances

5. Suggestions for further research

You might also like