Collocations

Chapter · December 2006


DOI: 10.1016/B0-08-044854-2/00414-4

Collocation

Ramesh Krishnamurthy, Aston University, Birmingham, UK

Abstract

J.R. Firth first gave collocation prominence in linguistic theory. Halliday, Sinclair,

Stubbs, and Hoey have all extended Firth’s ideas. Palmer and Hornby recognised the

pedagogical value of collocation, and incorporated it into their early EFL dictionaries.

More recent EFL dictionaries, based on large computerized language corpora, have

used complex software and statistical measures to gain further insights into the way

that collocational patterns are woven into language, and the results are visible in

the dictionary entries of later editions. This has fed back into language pedagogy, and

is also influencing translation and computational research.

1. Historical use of the term collocation

The fact that certain words co-occurred frequently was noticed in Biblical

concordances (e.g. Cruden listed the occurrences of dry with ground in 1769). Style

and usage guides in the 19th-20th centuries (e.g. Fowler’s The King’s English)

addressed only the overuse of collocations, labelling them clichés, and criticising their

use, especially by journalists (e.g. Brian O’Nolan, in a more humorous vein).

2. Collocation in modern linguistics

In modern linguistics, collocation refers to the fact that certain lexical items tend to

co-occur more frequently in natural language use than syntax and semantics alone

would dictate. Collocation was first given theoretical prominence by J.R. Firth, who

separated it from cognitive and semantic ideas of word-meaning (calling it an

‘abstraction at the syntagmatic level’) and accorded it a distinct status in his account

of the linguistic levels at which meaning can arise. Firth implicitly indicated that

collocation required a quantitative basis, giving actual numbers of co-occurrences in

some texts.

Halliday saw collocation as a cohesive device and identified the need for a measure of

significant proximity between collocating items, and said that collocation could only

be discussed in terms of probability, thus validating the need for quantitative analyses

and the use of statistics. Sinclair performed the first computational investigation of

collocation, comparing written and spoken corpora, identifying 5 words as the span

of significant proximity, and experimenting with statistical measures and

lemmatization.

Halliday and Sinclair thought that collocation could enable a lexical analysis of

language independent of grammar. Sinclair suggested that lexical items could be

defined by their collocational environments, and saw collocation as part of the idiom

principle (lexically determined choices), as opposed to the open choice principle

(grammatically determined choices). Leech included ‘collocative’ in his categories of

meaning, but marginalized it as an idiosyncratic property of individual words,

incapable of contributing to generalizations. Sinclair and Stubbs suggest that all

lexical items have collocations; Hoey accommodates collocation within a model of

‘lexical priming’, and suggests that most sentences are made up of interlocking

collocations, and can therefore be seen as reproductions of earlier sentences.

3. Collocation and lexicography

The pedagogical value of collocation was recognized by English teachers in the

1930s, and English collocations were described in detail by Harold Palmer in a report

on phraseology research with A.S. Hornby, using the term fairly loosely to cover

longer phrases, proverbs, etc as well as individual word-combinations. They showed a

major interest in the classification of collocations in grammatical and semantic terms,

but also used collocations to indicate the relevant senses of words in wordlists (draw

1. e.g., a picture 2. e.g., a line), and in their dictionary examples (a practice continued

in Hornby’s 1948 OALD and subsequent editions).

Early EFL dictionaries avoided using the term collocation, e.g. OALD 1974 refers to

‘special uses of an adjective with a preposition’ (liable: ~ for, be ~ to sth) and to the ‘special

grammatical way in which the headword is used’ (meantime: in the ~). LDOCE 1978

refers to ‘ways in which English words are used together, whether loosely bound or

occurring in fixed phrases’ and ‘special phrases in which a word is usually (or

always) found’, but also has a section headed ‘Collocations’, defined as ‘a group of

words which are often used together to form a natural-sounding combination’. It

states that they are shown in three ways: in example sentences, in explanations in Usage

Notes, or in heavy black type inside round brackets if they are very frequent or almost

a fixed phrase (‘but not an idiom’), signalled by ‘in the phr.’ or similar rubrics; it

gives a mountain fastness as an example.

Later EFL dictionaries (Cobuild, Cambridge, Macmillan, etc) continued to

incorporate collocations, including them in definitions and

examples, and typographically highlighting them in phrases. Sinclair’s Introduction to

the Cobuild Dictionary (1987), in the section on ‘Word and Environment’, talks of

‘the way in which the patterns of words with each other are related to the meanings

and uses of the words’ and says that ‘the sense of a word is bound up with a particular

usage… a close association of words or a grouping of words into a set phrase’ and ‘(a

word) only has a particular meaning when it is in a particular environment’,

discussing examples such as hard luck, hard facts, hard evidence, strong evidence,

tough luck, and sad facts.

In Sinclair (1987), collocates are defined as ‘words which co-occur significantly with

headwords’, and regular or significant collocation as ‘lexical items occurring within

five words… of the headword’ with a greater frequency than expected, which ‘was

established only on the basis of corpus evidence’. For the first time in lexicography, a

statistical notion of collocation had been introduced.
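
As a minimal sketch of this span-based definition, the following Python fragment counts the words that fall within a fixed number of tokens on either side of a headword, and compares one observed count with a naive expected count. The function names, the invented toy text (built around Cruden's dry and ground), and the expected-frequency formula (which assumes that words are spread evenly through the text) are illustrative assumptions, not the procedure used for the Cobuild dictionary.

```python
from collections import Counter


def window_collocates(tokens, node, span=5):
    """Count every word found within `span` tokens either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - span)
        end = min(len(tokens), i + span + 1)
        for j in range(start, end):
            if j != i:  # skip the headword itself
                counts[tokens[j]] += 1
    return counts


def expected_count(tokens, node, collocate, span=5):
    """Naive expectation, assuming the collocate is spread evenly through the text."""
    window_size = 2 * span  # positions examined around each headword occurrence
    return tokens.count(node) * tokens.count(collocate) * window_size / len(tokens)


# Invented toy text; a real study would use a large corpus.
tokens = ("the ground was dry and the dry ground cracked while "
          "rain fell on the other ground far away").split()

observed = window_collocates(tokens, "dry", span=2)
expected = expected_count(tokens, "dry", "ground", span=2)
print(observed["ground"], round(expected, 2))  # prints: 2 1.33
```

Here ground occurs near dry more often than the even-spread assumption predicts, which is the sense of 'greater frequency than expected'; on a text this small the comparison is purely illustrative.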

Collocation is used to distinguish senses: ‘Different sets of collocates found with

these different senses pinpoint the fact that they are different senses’; ‘Collocation…

frequently reinforces meaning distinctions’; and lexical sets used in disambiguation

are ‘signalled by coincidence of collocation’ (Sinclair 1987). Collocation can also be

a marker of metaphoricity: the presence of modifiers and qualifiers indicates

metaphorical uses of treadmill and blanket (e.g. …the corporate treadmill; …the

treadmill of office life; a security blanket for new democracies; a blanket of snow).

Collocation is the ‘lexical realisation of the situational context’ (ibid.). In the central

patterns of English, ‘meaning was only created by choosing two or more words

simultaneously’ (ibid.). However, the flexibility of collocation (sometimes crossing

sentence boundaries) caused problems in the wording of definitions: often, ‘no

particular group of collocates occurs in a structured relationship with the word’ and

therefore ‘there is no suitable pattern ready for use as a vehicle of explanation’ (ibid.).

The difficulty of eliciting collocates by intuition is discussed: we tend to think of

semantic sets; feet suggests ‘legs, toes, head’ or ‘shoe, sandals, sock’, or ‘walk, run’,

whereas significant corpus collocates of feet are ‘tall, high, long, and numbers’ (ibid.).

Prompted by hint, we produce ‘subtle, small, clue’; the corpus indicates ‘give, take,

no’. The difference between left-hand and right-hand collocates is exemplified by

open: the most frequent words before open are ‘the, to, an, is, an, wide, was, door,

more, eyes’ and after open are ‘to, and, the, for, up, space, a, it, in, door’ (ibid.).
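
The left-hand versus right-hand distinction can be sketched in Python as two separate frequency counts. This fragment assumes that 'before' and 'after' mean the immediately adjacent word, which the text does not specify; the function name and the toy sentence are invented for illustration.

```python
from collections import Counter


def left_right_collocates(tokens, node):
    """Return (left, right) counts of the words immediately before and after `node`."""
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        if i > 0:
            left[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            right[tokens[i + 1]] += 1
    return left, right


# Invented toy text; real frequency lists come from a large corpus.
tokens = ("she left the door open to the public and kept her eyes "
          "open for trouble with the door open").split()

left, right = left_right_collocates(tokens, "open")
print(left.most_common())   # words seen directly before "open"
print(right.most_common())  # words seen directly after "open"
```

Keeping the two counters apart is what allows a word such as door to be ranked differently on each side of the headword.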

Lexicographers can also use collocations to distinguish between near-synonyms, e.g.

the difference between electric (collocates: specific devices such as guitar, chair,

light, car, motor, windows, oven, all ‘powered by electricity’), and electrical

(collocates: more generic terms such as engineering, equipment, goods, appliances,

power, activity, signals, systems, etc, ‘concerning or involving electricity’).

4. Finding collocations in a corpus

Initially, collocates for dictionary headwords were identified manually by

lexicographers wading through pages of printouts of concordance lines. This was

clearly unsatisfactory, and only impressionistic views were feasible. Right-sorted

concordances obscured left-context collocates and vice versa. The fixed-length

context of printouts prevented the observation of collocates beyond a few words.

Subsequent software developments have enabled the automatic measurement of

statistically significant co-occurrences, within a specifiable and adjustable span or

window of context, using different measures of statistical significance, principally

mutual information (or MI-score) and t-score. MI-score privileges lower-frequency,

high-attraction collocates (e.g. dentist with hygienist, optician, and molar) while

t-score favours higher-frequency collocates (e.g. dentist with chair), including

significant grammatical words (e.g. dentist with a, and your). The software can also

display the collocate’s positional distribution if required, and recursive options are

available to investigate the detailed phraseology of collocating items.
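
The contrast between the two measures can be illustrated with a short Python sketch. The text does not give the formulas; the fragment assumes one commonly used formulation, in which the expected count is E = f(node) × f(collocate) × window / N, MI = log2(O / E), and t = (O − E) / √O, where O is the observed co-occurrence count and N the corpus size. All frequencies below are invented for illustration and are not taken from any corpus.

```python
import math


def expected_count(f_node, f_collocate, corpus_size, window=1):
    """Co-occurrences expected by chance under the assumed even-spread model."""
    return f_node * f_collocate * window / corpus_size


def mi_score(observed, f_node, f_collocate, corpus_size, window=1):
    """Mutual information: log2 of observed over expected co-occurrence."""
    if observed <= 0:
        raise ValueError("observed must be positive")
    expected = expected_count(f_node, f_collocate, corpus_size, window)
    return math.log2(observed / expected)


def t_score(observed, f_node, f_collocate, corpus_size, window=1):
    """t-score: (observed - expected) scaled by the square root of observed."""
    if observed <= 0:
        raise ValueError("observed must be positive")
    expected = expected_count(f_node, f_collocate, corpus_size, window)
    return (observed - expected) / math.sqrt(observed)


# Hypothetical counts: corpus size, frequency of "dentist", and for each
# candidate collocate (its own frequency, its co-occurrences with "dentist").
CORPUS_SIZE = 1_000_000
F_DENTIST = 100
candidates = {"hygienist": (20, 8), "chair": (2000, 30)}

scores = {
    word: (mi_score(obs, F_DENTIST, freq, CORPUS_SIZE),
           t_score(obs, F_DENTIST, freq, CORPUS_SIZE))
    for word, (freq, obs) in candidates.items()
}
for word, (mi, t) in scores.items():
    print(f"{word:10s} MI={mi:5.2f}  t={t:5.2f}")
```

With these invented counts, MI ranks the rare but strongly attracted hygienist above chair, while t-score reverses the order because chair co-occurs more often in absolute terms.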

Software has also become more publicly available, from MicroConcord to

WordSmith Tools and Michael Barlow’s Collocate. Kilgarriff and Tugwell’s

WordSketch (Kilgarriff et al 2004) was used in creating the Macmillan EFL

dictionary, and offers clause-functional information about collocations, e.g. wear +

objects: suit, dress, hat, etc + prepositional phrases (after of: armour, clothing, jeans,

etc; after with: pride, sleeve, collar, etc; after on: sleeve, wrist, finger, etc; after over:

shirt, head, dress, etc); similarly fish is the subject of the verbs swim, catch, fry, etc,

the object of the verbs catch, eat, feed, etc, modified by the adjectives tropical, bony,

oily, etc, and so on.

Lexicographers are in general less concerned about the detailed classification of

collocations, although their judgments affect both the placement and specific

treatment of the combinations. Hornby’s attempts at classification (focusing on verbs)

later used transformations and meaning distinctions as well as surface patterns, and

Hunston and Francis (2000) list the linguistic and lexicological terminology that has

developed subsequently for collocational units: lexical phrases, composites, gambits,

routine formulae, phrasemes, etc, and refer to the work of Moon and Mel’čuk in

discussing degrees of fixity and variation, which does impact on lexicography.

However, one of Firth’s original terms, colligation, used to describe the habitual

co-occurrence of grammatical elements, has not achieved the same widespread usage as

collocation. One manifestation of colligation, phrasal verbs, the combination of verb

and particle (adverb or preposition) to form semantic units, has been highlighted in

EFL dictionaries, and several EFL publishers have produced separate dictionaries of

phrasal verbs.

There have been some dictionaries of collocations, but so far each has had its own

limitations: not wholly corpus-based (e.g. Benson, Benson and Ilson; Hill and Lewis),

based on a small corpus (e.g. Kjellmer), or limited in coverage (the recent Oxford

Collocations Dictionary for Students of English).

5. Collocation in computational linguistics, pedagogy, and translation

Interest in collocation has increased substantially in the past decade, as evidenced by

workshops at lexicographical, linguistic, pedagogical, and translation conferences.

For computational purposes, the relevant features of collocations are that they are

‘arbitrary, domain independent, recurrent, and cohesive lexical clusters’ (Smadja

1993), and ‘of limited semantic compositionality’ (Manning and Schütze 1999).

But the greatest interest has been generated in the pedagogic profession, with

numerous conference and journal papers. Lewis’s book (2000) encapsulates the main

concerns: students do not recognise collocations in their input, and hence fail to

produce them; collocation represents fluency (which precedes accuracy, represented

by grammar); transparent versus ‘arbitrary’ (or idiomatic) combinations, with familiar

words in rarer combinations (a heavy smoker is not a fat person); transformation is

misleading (extremely disappointed but rarely extreme disappointment); students may

generalise more easily from corpus concordance examples than from canonical

versions in dictionaries (exploring versus explaining); collocation as a bridge between

the artificial separation of lexis and grammar; collocation extends knowledge of

familiar words (easier than acquiring new words in isolation); longer chunks are more

useful and easier to store than isolated words.

6. Conclusions and the future

Evidence from many fields suggests that collocation has a great future. The applications of

collocation in language teaching have been one of the notable recent successes. Its

more detailed exploration in large language corpora requires a significant advance in

software. The exact parameters are not fully established, and the statistical measures

can be improved. Research to identify word-senses by the clustering of collocates was

initiated in the 1960s (Sinclair et al 1970), but has still not become sufficiently robust

for automatic processing. The identification of lexical sets by collocation, signalled in

Sinclair (1966, 1970) and Halliday (1966), is yet to be achieved, as is a

corpus-generated thesaurus. The theoretical impetus of collocation has yet to reach the level

of a language-pervasive system, although Hoey’s notion of Lexical Priming heads in

that direction.

Further Reading

Benson, M., Benson, E. & Ilson, R. (1986). The BBI Combinatory Dictionary of

English. New York: John Benjamins

Church, K.W. & Hanks, P. (1989). ‘Word Association Norms, Mutual Information,

and Lexicography’ in Proceedings of the 27th Annual Meeting of the

Association for Computational Linguistics, reprinted in Computational

Linguistics 16:1, 1990.

Church, K.W., Gale, W., Hanks, P., & Hindle, D. (1990). ‘Using Statistics in Lexical

Analysis’, in U. Zernik (ed.) Lexical Acquisition: Using on-line Resources to

Build a Lexicon. Lawrence Erlbaum Associates.

Clear, J. (1993). ‘From Firth Principles: Computational Tools for the Study of

Collocation’, in Baker, M., Francis, G., & Tognini-Bonelli, E. (eds.) Text and

Technology. Amsterdam: John Benjamins.

Cowie, A.P. (1999). English Dictionaries for Foreign Learners - a History. Oxford:

Clarendon Press.

Firth, J.R. (1957). ‘Modes of Meaning’ in Papers in Linguistics 1934-51. London:

Oxford University Press.

Firth, J.R. (1957). ‘A Synopsis of Linguistic Theory 1930-55’ in Studies in Linguistic

Analysis, Philosophical Society, Oxford; reprinted in F. Palmer (ed.) Selected

Papers of J.R. Firth. Harlow: Longman.

Halliday, M.A.K. (1966). ‘Lexis as a linguistic level’ in Bazell, C.E., Catford, J.C.,

Halliday, M.A.K., Robins, R.H. (eds.) In Memory of J.R. Firth. London:

Longman.

Halliday, M.A.K. & Hasan, R. (1976). Cohesion in English. London: Longman.

Hill, J. & Lewis, M. (1997). LTP Dictionary of Selected Collocations. Hove: LTP.

Hoey, M. (2003). ‘Textual colligation – a special kind of lexical priming’ to appear in

K. Aijmer & B. Altenberg (eds.) Proceedings of ICAME 2002, Göteborg.

Kenny, D. (1998). ‘Creatures of Habit? What Translators Usually Do with Words’ in

Meta 43(4), 515-523.

Kilgarriff, A., Rychly, P., Smrz, P. & Tugwell, D. (2004). ‘The Sketch Engine’, in

Williams, G. & Vessier, S. (eds.) Proceedings of Euralex 2004. Lorient,

France: Université de Bretagne Sud.

Kjellmer, G. (1994). A Dictionary of English Collocations. Oxford: Clarendon

Press.

Leech, G. (1974). Semantics. London: Penguin.

Lewis, M. (2000). Teaching Collocation. Hove: Language Teaching Publications.

Louw, B. (1993). ‘Irony in the text or insincerity in the writer? The diagnostic

potential of semantic prosodies’ in M Baker et al (eds) Text and Technology.

Amsterdam: John Benjamins.

Moon, R. (1998). Fixed Expressions and Idioms in English: A Corpus-based

Approach. Oxford: OUP.

Palmer, H.E. (1933). Second Interim Report on English Collocations. Tokyo:

Kaitakusha.

Sinclair, J.M. (1966). ‘Beginning the Study of Lexis’ in Bazell, C.E., Catford, J.C.,

Halliday, M.A.K., Robins, R.H. (eds.) In Memory of J.R. Firth. London:

Longman.

Sinclair, J.M., Jones, S. & Daley, R. (1970). English Lexical Studies, Report to OSTI

on Project C/LP/08. Now published as Krishnamurthy (ed.) (2004). English

collocation studies: the OSTI Report. London: Continuum.

Sinclair, J.M. (1987). Looking Up - An account of the COBUILD Project in lexical

Computing. London: Collins ELT.

Sinclair, J.M. (1987). ‘Introduction’ in the Collins Cobuild English Language

Dictionary. London/Glasgow: Collins.

Sinclair, J.M. (1987). ‘Collocation: a progress report’ in Steele, R. & Threadgold, T.

(eds.) Language Topics. Amsterdam/Philadelphia: Benjamins.

Sinclair, J.M. (1991). Corpus, Concordance, Collocation. Oxford: OUP.

Stubbs, M. (1996). Text and Corpus Analysis. Oxford: Blackwell.

Smadja, F. (1993). ‘Retrieving Collocations from Text: Xtract’, Computational

Linguistics 19(1):143-177.

Smadja, F., McKeown, K. & Hatzivassiloglou, V. (1996). ‘Translating Collocations

for Bilingual Lexicons: A Statistical Approach’. Computational

Linguistics 22(1):1-38.

A brief biography

Ramesh Krishnamurthy was born in Madras, India, and has degrees in French and

German from Cambridge University, and Sanskrit and Indian Religions from London

University. He worked for the COBUILD project at Birmingham University from

1984 to 2003, where he compiled and edited dictionaries, grammars, and other

publications, and contributed to the development of corpora, software, and electronic

products. He has been an Honorary Research Fellow at Birmingham University and

Wolverhampton University, and has taught on undergraduate and postgraduate

courses, and supervised postgraduate research. He has contributed to several

European linguistics projects, and conducted workshops and courses on corpus

linguistics and lexicography in several countries.
