Professional Documents
Culture Documents
College Writing: A Corpus-Based Analysis
College Writing: A Corpus-Based Analysis
Teodora Popescu
“1 Decembrie 1918” University of Alba Iulia
1. Introduction
Starting with the 70’s, Error Analysis (EA) became a scientific method in its own
right, owing a lot to the research done by Corder (1967), Richards (1971) and Selinker
(1972), who identified different aspects of the second/foreign language learners’ own
language system, which is neither the L1 (mother tongue), nor the L2 (second/foreign
language). The essential shift that their studies brought about in linguistics is a
reassessment of the importance of errors made by ESL/EFL learners. Therefore, according
to Corder (1967), a learner’s errors are not random, but systematic (unsystematic errors
occur in one’s native language) and they are not negative or interfering with learning the
Target Language, but on the contrary, they represent a necessary positive, facilitative
factor, indispensable to the learning process, highly indicative of individual learner
strategies. Further on, Richards (1971) identified three types of errors: a) interference
errors generated by L1 transfer; b) intralingual errors which result from incorrect,
incomplete or overgeneralised) application of language rules; c) developmental errors
caused by the construction of faulty hypotheses in L2.
By the same token, Selinker (1972, and more recently, 1992) elaborated on the
theory of interlanguage, by which we understand a third language, with its own lexicon,
grammar and discourse structure, phonological traits, etc. The basic processes through
which interlanguage is created are: language transfer (negative transfer, positive transfer,
avoidance, and overuse), overgeneralization (at phonetic, grammatical, lexical, discourse
level) and simplification (both syntactic and semantic).
This process-oriented approach to error-analysis (investigation into the reasons why
language errors are made, and learners’ active strategies) has allowed for the adoption of a
learning-based perspective. It follows that teachers now view errors as necessary stages in
all language learning, as the product of intelligent cognitive strategies, hence as potentially
useful indicators of what processes the student is using.
In our endeavour to investigate students’ errors occurring in essay-writing, we first
tried to identify and categorise these mistakes, and further on we attempted to explore the
reasons why they might have come about. In order to ascertain learners’ writing
competence (in L2), we analysed learner errors from a linguistic perspective: (spelling –
partly accounting for phonetic inaccuracies, morphological, syntactic, collocational and
discursive – in terms of non-achieving coherence and cohesion). The approach we adopted
was one provided by electronic tools of concordancing software.
2. Corpus Linguistics
The term corpus, derived from the Latin word for body, was first encountered in the
6th century to refer to a collection of legal texts, Corpus Juris Civilis (Francis 1992: 17).
The term corpus has preserved this initial meaning, i.e. a body of text; nevertheless this
definition is not entirely satisfactory for corpus linguists. According to one of the five
definitions provided by the Oxford English Dictionary, a corpus is ‘the body of written or
spoken material upon which a linguistic analysis is based’. It results that a corpus is not
just a collection of texts; it represents in fact ‘a collection of texts assumed to be
representative of a given language, dialect, or other subset of a language, to be used for
linguistic analysis’ (Francis 1982: 7 apud Francis 1992: 17). Furthermore, Francis (1992)
mentions three main areas in which corpora have traditionally been used: lexicographical
182
TRANSLATION STUDIES: RETROSPECTIVE AND PROSPECTIVE VIEWS. 2nd Edition.
1 – 2 November 2007. GalaĠi, Romania
studies in the creation of dictionaries, dialectological studies and the creation of grammars.
Modern corpus linguists nevertheless are quite different from their early fellows. Kennedy
(1992) underlined the fact that initial corpora were mostly of written texts only, just the
forms were counted, not the meanings and they were untagged, so homonyms were often
classed as one word. Another important reason was that traditionally, linguists had been
strongly influenced by Chomsky’s theory that corpora were inadequate whereas intuition
was. Chomsky contested the concept of empiricism on which corpus linguistics had been
based and offered a rationalist approach instead, supporting a sort of methodology by
which ‘rather than try and account for language observationally, one should try to account
for language introspectively’ (McEnery & Wilson 1996: 6). Chomsky condemned corpus-
based studies asserting that ‘Any natural corpus will be skewed…the corpus, if natural,
will be so wildly skewed that the description would be no more than a mere list’ (Chomsky
1962: 159 apud Leech 1991: 8). This theory is not surprising as long as Chomsky, more
interested in competence than performance, was against an approach that was foremost
based on actual performance data.
Nonetheless corpora research continued in spite of early criticisms, and it even
strengthened due to technological advances in computer software. Now it is possible to
process texts of several million words in length (Sinclair 1991).
Nelson (2000) pointed out that there are several reasons that speak in favour of
using corpora in linguistics analysis: objectivity vs. intuition, verifiability of results
(Svartvik 1992, Biber 1996), broadness of language able to be represented (Svartvik 1992,
Biber 1995, Biber, Conrad & Reppen 1994), access, broad scope of analysis, pedagogic –
face validity, authenticity, motivation (Johns 1988, Tribble & Jones 1990), possibility of
cumulative results (Biber 1995), accountability, reliability, view of all language (Sinclair
2000).
3. Methodology
The learner essay-writing corpus (LEWC) was created by collecting the essays
written by 30 Romanian-speaking university students of economics in their 2nd year at an
intermediate level of language learning. We need to mention that students typed their own
essays, which were subsequently compiled by the teacher. This fact might a priori account
for the automatic correction of some spelling mistakes (made by the word editor students
used), as well as for a limited amount of correction in the case of morphological or
syntactic errors. Results would have undoubtedly been different had we had our students
handwrite their essays. The essays are argumentative, non-technical, including titles such
as ‘Crime does not pay’, ‘Most university degrees are theoretical and do not prepare
students for the real world. They are therefore, of very little value’, ‘Living in a city has
greater advantages than living in a small town or country’, and have an average length of
about 500 words each. The first step was to analyse the corpus as a whole, in order to
identify the most frequent words that were used. The figure below will show a screen shot
of the frequency count of our corpus.
183
Teodora Popescu
College Essay-Writing: A Corpus-Based Analysis
184
TRANSLATION STUDIES: RETROSPECTIVE AND PROSPECTIVE VIEWS. 2nd Edition.
1 – 2 November 2007. GalaĠi, Romania
Lexical – *…where they have a lot of opportunities and where have 147
collocations achieved respect and wealth […]
[…] have earned respect and acquired wealth…
Syntactic * In our country, is still that old system […] 43
In our country there still exists […]
Discursive * Instead of studying math when at a law school it would be 29
better that everyone will have a course that teaches them or
improves their skills when it comes to society and person to
person relationships?
Wouldn’t it be better, if, while a Law student, one could take a
course in personal and social communication skills instead of
studying Math?
Table 1. Classification of student errors in the LEWC
We previously draw attention to the fact that the relatively low number of spelling
mistakes is due to the quasi-satisfactory computer-literacy of our students. There are
specific mistakes that a spell-check tool will automatically correct, while others will be just
underlined. By a right click on the mouse the student can see the alternatives offered by the
computer. Needless to say that the options of the computer are not always the best. There
were nevertheless, spelling mistakes which could not be corrected with the help of the
computer; these are, in general, words that have more than two misspelled letters. E.g. *
[…] the person who died wasn’t crossing the street regulamentary […] (in this case the
student’s mistake is twofold, both intralingual and interlingual; probably knowing that
certain adjectives in English – e.g. customary, preliminary, etc. have the –ary suffix, he/she
added it to the Romanian adjective ‘regulamentar’ – both adj. and adv.; in English
‘lawful/-ly’). Neither could the word ‘consilier’ be corrected. What the computer offered as
solutions were: ‘consoler’, ‘costlier’, ‘consulter’, ‘consular’ and ‘consigliore’. The best
rendition of the idea would be in fact ‘counsellor’. A third example is a rather common
mistake that Romanian students make in English: “Is it worthed to live my life in death
[…]”. Once students learn the passive voice in English, they will develop a strategy of
always using the subject + verb to be + past participle. This overgeneralization works
detrimentally, since ‘worthed’ does not exist in the English language.
The mistakes that students made in terms of morphology, syntax and discourse did
not come necessarily as a surprise, as we knew from the outset that our students’ language
competence was at an intermediate level (similar to B1 of the CEF) and we can’t speak of
proper foreign language writing competence as long as a certain linguistic competence is
achieved. One typical mistake that we recorded was the omission of the subject: “In the
majority of the countries is a mixture […]”, which can be easily ascribed to the fact that
subject-elliptic utterances are quite frequent in Romanian. Another example of bad
morphology/syntax/discourse is the incorrect use of ‘what’ (interrogative pronoun) instead
of ‘that’: *”[…] if you accomplish something what is not palpable[…]”; * “Many
professions what are needed in a society are now missing people to practice them[…]” or *
“In our country are few respectable companies what can give the recompense that top
people need.” In Romanian there is one single word for both functions: ‘ce’.
We noticed a high occurrence of lexical mistakes, especially in the case of
collocational patterns. We considered therefore that it might be useful to delve even deeper
into this type of errors. This time it was relatively easier to identify mistakes with the help
of a concordancer. We started form analysing the most frequent words in their vicinities
and subsequently categorised the mistakes. We will present the sub-classification of
collocational errors in the following table:
Type of error Example No of
occurrences
Noun + Verb * crime always gets paid. 10
…a crime will always be punished
185
Teodora Popescu
College Essay-Writing: A Corpus-Based Analysis
186
TRANSLATION STUDIES: RETROSPECTIVE AND PROSPECTIVE VIEWS. 2nd Edition.
1 – 2 November 2007. GalaĠi, Romania
4. Pedagogical implications
We are aware that the present study does not cover all aspects pertaining to errors
occurring in the LEWC and there still exist numerous facets that deserve a researcher’s and
practitioner’s attention.
What this study has nevertheless shown is that collocational errors were the most
numerous, which entails that the educator should devote substantial time and space to the
teaching of collocations. Despite all criticisms brought to the behaviouristic approach,
which laid particular emphasis on learning by heart, thus shaping appropriate language
behaviours, it is important to teach and learn a large amount of collocations explicitly. In
particular, as can be inferred from the above error analysis samples, it is important to focus
on the larger context in which collocations / lexical items occur.
Notes:
1
The Oxford Genie CD-ROM gives the following definitions for device (Engl.): 1. an object or a piece of equipment that
has been designed to do a particular job 2. a bomb or weapon that will explode 3. a method of doing sth that produces a
particular result or effect 4.a plan or trick that is used to get sth that sb wants. Unfortunately, in LeviĠchi’s DicĠionar
român-englez. Romanian-English Dictionary (1998), the first definition for deviză (Rom.) is device, motto. This was an
important means of identifying yet another type or error source: consulting a faulty dictionary.
Bibliography:
o Biber, D. (1995) Dimensions of Register Variation, Cambridge: Cambridge University Press.
o Biber, D., Conrad, S. & Reppen, R. (1994) ‘Corpus-Based Approaches to Issues in Applied Linguistics,’ Applied
Linguistics, Vol.15, No.2.
o Corder, S.P. (1976) ‘The Significance of Learners’ Errors,’ International Review of Applied Linguistics, 5, 161-170.
o McEnery, T. and Wilson, A. (1996) Corpus Linguistics, Edinburgh: Edinburgh University Press
o Leech, G. (1991) ‘The State-of-the-Art in Corpus Linguistics,’ in Aijmer, K. & Altenberg, B. (eds) English Corpus
Linguistics, Studies in Honour of Jan Svartvik, London and New York: Longman.
o Nelson, M. (2000) A Corpus-Based Study of Business English and Business English Teaching Materials.
[UnpublishedPhD Thesis. Manchester: University of Manchester.] Thesis available at
http://users.utu.fi/micnel/thesis.html
o Popescu, T. (2007) ‘Concordancing for Increased Learner Independence and Translation Skills,’ Proceedings of the
International Conference Foreign Language Competence as an Integral Component of a University Graduate
Profile, University of Defence in Brno, Czech Republic, 221-229.
o Popescu, T. (2007) ‘Teaching Business Collocations,’ in D. Galova (ed), Languages for Specific Purposes:
Searching for Common Solutions, Newcastle: Cambridge Scholars Publishing, 164-174.
o Popescu, T. (2006) ‘Encouraging Learner Independence through Teaching Collocations,’ Messages, Sages and Ages
2, 805-814.
o Richards, J.C. (1971) ‘A Non-Contrastive Approach to Error-Analysis,’ English Language Teaching, 25, 204-219.
o Selinker, L. (1972) Interlanguage. Error Analysis, London: Longman,
o Selinker, L. (1992) Rediscovering Interlanguage, New York: Longman Inc.
o Sinclair, J. (1991) Corpus, Concordance, Collocation, Oxford: Oxford University Press.
o Svartvik, J. (1992) ‘Corpus Linguistics Comes of Age,’ in Svartvik, J. (ed) Directions in Corpus Linguistics,
Proceedings of Nobel Symposium 82 Stockholm, 4-8 August 1991, Berlin, New York: Mouton de Gruyter.
188