
Corpus linguistics
Lecture 5
Corpus linguistics

Corpus linguistics - an innovative and revolutionary trend in the field of linguistics

- Originated in the 1950s

Corpus linguistics - uses large collections of both spoken and written natural texts (corpora or corpuses, singular corpus) that are stored on computers.

Corpus linguistics - explores patterns of language use; provides an extremely powerful tool for the analysis of natural language and can provide tremendous insights as to how language use varies in different situations, such as spoken versus written, or formal interactions versus casual conversation.
Corpus linguistics

• It is empirical, analysing the actual patterns of use in natural texts.

• It utilizes a large and principled collection of natural texts, known as a ‘corpus’, as the basis for analysis.

• It makes extensive use of computers for analysis, using both automatic and interactive techniques.

• It depends on both quantitative and qualitative analytical techniques.

(From Biber, Conrad and Reppen, 1998: 4.)


What is a corpus?

• Corpus - large principled collection of natural texts

• Language has been collected from naturally occurring sources rather than from surveys or questionnaires

• In the case of spoken language, this means first recording and then transcribing the speech.

• There are a number of existing corpora: the British National Corpus (BNC), the Corpus of Contemporary American English (COCA), the Brown Corpus, the Lancaster/Oslo–Bergen (LOB) Corpus and the Helsinki Corpus of English Texts.

Because corpus linguistics uses large collections of naturally occurring language, the use of computers for analysis is imperative.
Types of corpora

➣ mono-lingual versus multi-lingual corpora

➣ special-purpose, domain-specific corpora versus general-purpose, large-scale corpora

➣ spoken language corpora versus collections of written text

➣ ad-hoc corpus collections versus balanced, representative corpora

➣ raw text versus marked-up documents

➣ unannotated versus annotated corpora

➣ Web as a corpus
General corpora

➣ General corpora (e.g. the Brown Corpus, the BNC, LOB) aim to represent language in its broadest sense and to serve as widely available resources for baseline or comparative studies of general linguistic features. A general corpus is designed to be balanced and to include language samples from a wide range of registers or genres, including both fiction and non-fiction in all their diversity.
Specialised corpora

• Specialised corpora – those designed with more specific research goals in mind – may be the most crucial ‘growth area’ for corpus linguistics.

• They may include both spoken and written components, as do the International Corpus of English (ICE), a corpus designed for the study of national varieties of English, and the TOEFL-2000 Spoken & Written Academic Language Corpus. Others focus on a single written variety, such as corpora of newspaper writing, fiction or academic prose.

• Specialised spoken corpora include the Michigan Corpus of Academic Spoken English (MICASE) and corpora of teenage language (the Bergen Corpus of London Teenage Language; COLT), child language (the CHILDES database), the language of television (Quaglio, 2009) and call centre interactions (Friginal, 2009).

• ‘Learner corpus’ - this is a corpus that includes spoken or written language samples produced by non-native speakers, the most well-known example being the International Corpus of Learner English (ICLE).
What can a corpus tell us?

1. One thing we can get from a corpus is frequency of occurrence information.
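
Frequency of occurrence can be illustrated with a few lines of Python. This is only a minimal sketch: the two-sentence `sample` text is invented for illustration, and the helper names `tokenize` and `word_frequencies` are hypothetical, not part of any corpus package. A real study would run the same kind of count over a full corpus such as the BNC.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text):
    """Return a Counter mapping each word form to its number of occurrences."""
    return Counter(tokenize(text))

# Hypothetical two-sentence "corpus", invented for illustration only.
sample = "The cat sat on the mat. The dog sat on the log."
freqs = word_frequencies(sample)

# Print the three most frequent word forms with their raw counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

Sorting the counts from most to least frequent gives the kind of word list mentioned in the next point.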
What can a corpus tell us?

2. Word lists derived from corpora can be useful for vocabulary instruction and test development.

3. Concordancing packages can provide additional information about lexical co-occurrence patterns.

4. A ‘key word in context’ (KWIC) display may then be used to explore various uses or various senses of the target word.
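
A KWIC display can be approximated in a short Python sketch. The function name `kwic`, the `width` parameter and the sample sentences are assumptions made for illustration; dedicated concordancing packages offer far more (sorting, filtering, corpus metadata), so this shows the idea rather than any particular tool.

```python
import re

def kwic(text, target, width=3):
    """Return one line per occurrence of target, with `width` words of context on each side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the key word lines up in a column.
            lines.append(f"{left:>25}  [{token}]  {right}")
    return lines

# Sample text invented for illustration only.
sample = "I can reach the book. Put the paper in the can. Can you open the can?"
concordance = kwic(sample, "can")
for line in concordance:
    print(line)
```

Reading down the aligned column makes it easy to see which occurrences of the target word share a use or sense. Note that this sketch ignores sentence boundaries, so the context can run across two sentences.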
What can a corpus tell us?

Through the use of corpus analyses we can discover patterns of use that previously were unnoticed.

Lexical phrases, or lexical bundles, are another area of collocational studies that has come to light through corpus linguistics.
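
One simple way to look for candidate lexical bundles is to count recurring word sequences (n-grams). The sketch below assumes that a bundle can be approximated as any three-word sequence occurring at least twice. The function name `lexical_bundles`, the thresholds and the sample text are all hypothetical; published studies apply much higher frequency cut-offs over much larger corpora.

```python
import re
from collections import Counter

def lexical_bundles(text, n=3, min_count=2):
    """Count every n-word sequence and keep those occurring at least min_count times."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair each token with its n-1 successors; sequences may cross sentence boundaries.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in ngrams)
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Sample text invented for illustration only.
sample = ("I don't know if you want to go. "
          "Do you want to stay? I don't know if I want to stay.")
bundles = lexical_bundles(sample, n=3, min_count=2)
for gram, count in sorted(bundles.items()):
    print(gram, count)
```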
A tagged corpus

• When a corpus is tagged, each word in the corpus is given a grammatical label.

• The process of assigning grammatical labels to words is complex. For example, even a simple word such as ‘can’ falls into two grammatical categories. It can be a modal – ‘I can reach the book’. Or, it can be used as a noun – ‘Put the paper in the can’.

• Once texts have been tagged it is possible to explore a variety of complex linguistic issues.
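
The two example sentences with ‘can’ show what a tagged corpus might look like as data. This is a minimal sketch, assuming a corpus stored as (word, tag) pairs. The tag labels (PRON, MODAL, VERB, DET, NOUN, PREP) and the function `tag_counts` are hypothetical choices for illustration, not an established tagset, and the tags here were assigned by hand rather than by an automatic tagger.

```python
from collections import Counter

# Hand-tagged toy corpus built from the two example sentences.
# The tag labels are hypothetical, chosen for illustration only.
tagged_corpus = [
    ("I", "PRON"), ("can", "MODAL"), ("reach", "VERB"),
    ("the", "DET"), ("book", "NOUN"),
    ("Put", "VERB"), ("the", "DET"), ("paper", "NOUN"),
    ("in", "PREP"), ("the", "DET"), ("can", "NOUN"),
]

def tag_counts(corpus, word):
    """Count how often `word` appears under each grammatical tag."""
    return Counter(tag for token, tag in corpus if token.lower() == word.lower())

can_tags = tag_counts(tagged_corpus, "can")
print(can_tags)
```

With tags in place, a search can separate the modal ‘can’ from the noun ‘can’, which a search on the plain word form cannot do.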
Why use corpora?

Corpora have been used to address a number of interesting issues:

1. Exploring language change across the centuries, and gaining insights into changes related to language development, in both first and second language situations.

2. Exploring similarities or differences across national or regional varieties of English (Australian English, American English, British English, Indian English). Such studies have yielded interesting information about the systematic linguistic differences that occur in these regional varieties of English.

3. Providing valuable resources for teachers and students. For example, MICASE, a specialized corpus of spoken academic language, may be used to better prepare students to meet the demands of spoken language that they will encounter at university.
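
Comparisons across varieties or registers usually involve corpora of different sizes, so raw counts are commonly converted to a rate per fixed number of words before they are compared. The sketch below assumes a rate per 1,000 words. The function name `normalized_frequency` and both mini-samples are invented for illustration and say nothing about real spoken or written English.

```python
import re

def normalized_frequency(text, word, per=1000):
    """Occurrences of `word` per `per` running words of `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) * per / len(tokens)

# Hypothetical mini-samples of different lengths, invented for illustration only.
spoken = "well I mean it was well kind of nice really"
written = ("the results indicate that the proposed method performs well in most "
           "of the conditions that were tested in this study")

# Raw counts (2 versus 1) are not comparable because the samples differ in length;
# the normalized rates are.
print(normalized_frequency(spoken, "well"))
print(normalized_frequency(written, "well"))
```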
How can corpora inform language teaching?

• Providing a basis for deciding which language features and structures are important and also how various features and structures are used.

• Decisions can now be grounded on actual patterns of language use in various situations (such as spoken or written, formal or casual situations).

• Teachers can shape instruction based on corpus-based information (e.g. if the focus of instruction is conversational English, teachers could read corpus investigations on spoken language to determine which features and grammatical structures are characteristic of conversational English).

• If the focus of instruction is a particular grammatical structure, corpus-based studies can provide a picture of the range of use of that particular structure, identifying lexical and pragmatic co-occurrence patterns associated with it.

• Learners can be actively involved in exploring corpora; if adequate facilities do not exist, teachers can bring in printouts or results from corpus searches for use in the classroom.
