COCA
• However, there are also online corpus processors that come with
built-in mega corpora of millions of words.
• It means that the word ‘jump’ has been repeated 19,993 times in the
corpus.
• Put differently, it means that the word ‘jump’ has occurred in 19,993
contexts in the corpus.
• You will need to use the ‘POS’ option to the right of the search box.
• To view the contexts in which your query phrase is used, you click the
phrase itself.
• The wildcard means anything: anything that comes in the position of the
wildcard.
• How about the expression ‘at first glance’? Is it fixed or flexible? Can
‘first’ be replaced by something else?
• What if we want to know which words are used with the suffix ‘-icity’?
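The wildcard idea can be tried offline on a toy word list. The sketch below is illustrative only: it assumes the wildcard is written `*`, and the helper names (`wildcard_to_regex`, `match_words`, `match_phrases`) and the sample tokens are hypothetical, not part of the COCA interface.

```python
import re
from collections import Counter

# Toy token list standing in for a corpus (hypothetical data, not from COCA).
tokens = ("at first glance the electricity and authenticity "
          "of the city at second glance").split()

def wildcard_to_regex(pattern):
    """Compile a pattern in which '*' stands for any run of characters in one word."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + r"\S*".join(parts) + "$")

def match_words(pattern, tokens):
    """Count the single words that fit a pattern such as '*icity'."""
    rx = wildcard_to_regex(pattern)
    return Counter(tok for tok in tokens if rx.match(tok))

def match_phrases(pattern, tokens):
    """Count the word sequences that fit a pattern such as 'at * glance'.

    Each space-separated slot is matched against one word, so a bare '*'
    accepts any single word in that position.
    """
    slots = [wildcard_to_regex(slot) for slot in pattern.split()]
    size = len(slots)
    hits = Counter()
    for i in range(len(tokens) - size + 1):
        window = tokens[i:i + size]
        if all(rx.match(word) for rx, word in zip(slots, window)):
            hits[" ".join(window)] += 1
    return hits

print(match_words("*icity", tokens))
print(match_phrases("at * glance", tokens))
```

On this toy list, `*icity` picks up ‘electricity’ and ‘authenticity’ but not ‘city’, and `at * glance` finds both ‘at first glance’ and ‘at second glance’, which is the kind of evidence that shows whether an expression is fixed or flexible.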
• We can search for parts of speech using the tags provided in the interface.
• If we want the most frequent common noun in COCA, we can use the
following:
• COCA also includes texts from different periods of time: 1990 – 2015.
• Raw frequency does not always give an accurate idea about which word is more frequent.
normalized frequency = (C(w) / N) × common base
where C(w) is the raw frequency of the given word, N is the total number of
words in the corpus, and the common base ranges from 10 to 1,000,000
depending on the size of the corpus.
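A minimal sketch of this calculation, assuming the usual normalization C(w) / N × base. The function name and the counts below are hypothetical and chosen only to show why raw counts from corpora of different sizes cannot be compared directly.

```python
def normalized_frequency(raw_count, corpus_size, base=1_000_000):
    """Return the frequency of a word per `base` words of corpus.

    raw_count   -- C(w), the raw frequency of the word
    corpus_size -- N, the total number of words in the corpus
    base        -- the common base (1,000,000 here; an assumed default)
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * base / corpus_size

# Hypothetical counts: the word is rarer in raw terms in the small corpus,
# yet more frequent once both counts are put on the same base.
small = normalized_frequency(50, 100_000)      # 500.0 per million
large = normalized_frequency(200, 1_000_000)   # 200.0 per million
print(small, large)
```

The comparison only makes sense when both values use the same base.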
Session 8 - Online Corpus Processors: The Case of COCA 17
COCA: Key Word In Context (KWIC) 1
• The Key Word In Context (KWIC) is the concordance function which
displays up to 1,000 random contexts of the query word.
• Two questions:
• What is the difference between the KWIC and clicking the word frequency to
see the contexts?
• The main difference is that with the KWIC, we get the context with
the parts of speech encoded in colors.
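To make the idea of a concordance concrete, here is a rough sketch of a KWIC display built from a plain token list. It is not how COCA implements the feature: the function name `kwic`, the `width` parameter, and the sample sentence are assumptions, and no part-of-speech coloring is attempted.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit.

    `width` is the number of words kept on each side of the keyword.
    Matching ignores case.
    """
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

# Hypothetical sample text, not taken from COCA.
sample = "the cat will jump over the fence and jump again".split()

for left, word, right in kwic(sample, "jump", width=2):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>12}  {word}  {right}")
```

Lining the keyword up in a single column is what makes recurring patterns on either side of it easy to spot.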
• It also displays the raw frequency of the first word alongside that of the second word.
• The third column is the ratio of word 1 to word 2, and it reads as follows: there
are 436.0 times as many cases of ‘steady pace’ as there are of ‘stable pace’.
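The ratio column is a simple division of the two raw frequencies. The sketch below uses made-up counts, since the raw frequencies behind the 436.0 figure are not given here, and the function name is hypothetical.

```python
def frequency_ratio(count_word1, count_word2):
    """Return how many times more frequent word 1 is than word 2, to one decimal."""
    if count_word2 == 0:
        raise ValueError("word 2 never occurs, so the ratio is undefined")
    return round(count_word1 / count_word2, 1)

# Hypothetical raw frequencies, chosen only so that the ratio comes out at 436.0.
print(frequency_ratio(872, 2))
```

A ratio above 1 means the first word is the more frequent one in that context; a ratio below 1 means the second word is.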
• This is the window size – i.e. the search space in which the engine tries to
find collocations. It is meant to find both adjacent and non-adjacent
collocations.
• Adjacent collocations are the ones that immediately precede or follow your
query word. They are usually inseparable. In this case, you need to set the
window size to ±1. Examples of adjacent collocations are ‘at hand’ and ‘kick
the bucket’.
• Non-adjacent collocations are the ones that can be separated by one or more
words, such as ‘give up’.
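The effect of the window size can be sketched with a small co-occurrence counter. This is an illustration under simple assumptions (a flat token list, raw counts, no association statistics); the function name `collocates` and the sample sentence are hypothetical.

```python
from collections import Counter

def collocates(tokens, node, window=1):
    """Count the words that occur within ±`window` positions of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Hypothetical sample: 'give up' occurs once adjacent and once separated by 'it'.
sample = "do not give up now and never give it up".split()

print(collocates(sample, "give", window=1)["up"])  # adjacent hits only
print(collocates(sample, "give", window=2)["up"])  # also catches 'give it up'
```

With a window of ±1 only the adjacent ‘give up’ is counted; widening the window to ±2 also picks up the separated ‘give it up’.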
COCA: Finding Collocations 4