Corpus Definitions. Last Year
Corpus-based Studies You start from a previous theory or hypothesis and try
to test it against a corpus.
Parallel corpora Means that the texts are direct translations of one
another. One good example is the UN Corpus. It is
multilingual as it represents the six official UN
languages and the texts are exact translations.
Static corpora Most corpora are static: they are not updated
once the researcher has finished compiling them. For
example, COCA stopped in 2017.
A diachronic (or historical) corpus comprises texts from more than one period of
time.
A specialized corpus Is a corpus that is specific to one language form and
one text genre.
Threshold option • This option sets the minimum number of times
two words must co-occur to be considered a
collocation.
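As a rough illustration of what a frequency threshold does, the sketch below counts adjacent word pairs and keeps only those that reach a minimum count. The function name `collocation_candidates`, the parameter `min_freq`, and the sample sentence are assumptions made for this example; this is not how AntConc itself is implemented, and real tools usually combine the threshold with a statistical association measure.

```python
from collections import Counter

def collocation_candidates(tokens, min_freq=2):
    """Return adjacent word pairs that co-occur at least min_freq times."""
    # Count every pair of neighbouring tokens (one word to the right).
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Keep only the pairs that meet the threshold.
    return {pair: n for pair, n in pair_counts.items() if n >= min_freq}

# Hypothetical mini-corpus, used only for illustration.
sample_tokens = "strong tea and strong coffee but strong tea again".split()
print(collocation_candidates(sample_tokens, min_freq=2))
# prints {('strong', 'tea'): 2}
```

Raising `min_freq` removes pairs that appear together only by chance in a small corpus; lowering it to 1 keeps every pair.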
The Wordlist function • gives you a list of word frequencies that can be
sorted in an ascending or a descending order.
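A word list of this kind can be approximated in a few lines. The sketch below is only an illustration: the function `word_list`, its `descending` flag, and the sample sentence are hypothetical, and splitting on whitespace is a simplification of real tokenization.

```python
from collections import Counter

def word_list(tokens, descending=True):
    """Return (word, frequency) pairs sorted by frequency."""
    counts = Counter(tokens)
    # Negate the count for descending order; ties are broken alphabetically.
    sign = -1 if descending else 1
    return sorted(counts.items(), key=lambda item: (sign * item[1], item[0]))

# Hypothetical mini-corpus, used only for illustration.
wl_tokens = "the cat saw the dog and the bird".split()
print(word_list(wl_tokens)[0])                    # prints ('the', 3)
print(word_list(wl_tokens, descending=False)[0])  # prints ('and', 1)
```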
The token count • represents the total number of words
in the corpus, including repeated words.
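The difference between counting every word and counting each distinct word once can be shown with a short sketch. The type count is added here only for contrast, and the function names and sample sentence are assumptions for this example.

```python
def token_count(tokens):
    """Total number of words, repeated words included."""
    return len(tokens)

def type_count(tokens):
    """Number of distinct words (each word counted once)."""
    return len(set(tokens))

# Hypothetical mini-corpus, used only for illustration.
tc_tokens = "the cat saw the dog and the bird".split()
print(token_count(tc_tokens))  # prints 8
print(type_count(tc_tokens))   # prints 6
```

Here "the" occurs three times, so it adds three to the token count but only one to the type count.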
AntConc is a free, offline, and lightweight corpus processor
that we will use for this course.
POS taggers. • They take raw corpora as input and give the
grammatical category of each word as output.
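To make the input and output of a tagger concrete, here is a deliberately tiny lookup tagger. It is a toy under stated assumptions: the lexicon `TOY_LEXICON`, the tag labels, and the default tag are invented for this example, and real taggers use statistical or neural models rather than a fixed dictionary.

```python
# Hypothetical mini-lexicon; a real tagger learns this from annotated data.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "black": "ADJ",
    "dog": "NOUN", "cat": "NOUN",
    "chased": "VERB",
}

def toy_pos_tag(tokens, lexicon=TOY_LEXICON, default="NOUN"):
    """Pair each word with a tag from the lexicon, or a default tag if unknown."""
    return [(word, lexicon.get(word.lower(), default)) for word in tokens]

print(toy_pos_tag("The black dog chased a cat".split()))
# prints [('The', 'DET'), ('black', 'ADJ'), ('dog', 'NOUN'),
#         ('chased', 'VERB'), ('a', 'DET'), ('cat', 'NOUN')]
```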
Shallow Parsers • also work on the syntactic level, but they
analyze phrase structure: they try to identify
noun, verb, adjectival, adverbial, and
prepositional phrases. They work at the phrase
level rather than the word level.
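The idea of working at the phrase level can be sketched with a minimal noun-phrase chunker. It assumes the input is already tagged and applies one hand-written rule (optional determiner, any adjectives, one or more nouns). The function name, the tag labels, and the rule are assumptions for this example; real shallow parsers cover more phrase types and are usually trained on annotated data.

```python
def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into NP chunks; other words stay as single items."""
    chunks = []
    i, n = 0, len(tagged)
    while i < n:
        j = i
        # Optional determiner.
        if tagged[j][1] == "DET":
            j += 1
        # Any number of adjectives.
        while j < n and tagged[j][1] == "ADJ":
            j += 1
        # One or more nouns.
        k = j
        while k < n and tagged[k][1] == "NOUN":
            k += 1
        if k > j:
            # The rule matched: emit the whole span as one noun phrase.
            chunks.append(("NP", [word for word, _ in tagged[i:k]]))
            i = k
        else:
            # No noun found: emit the current word alone under its own tag.
            chunks.append((tagged[i][1], [tagged[i][0]]))
            i += 1
    return chunks

# Hypothetical tagged sentence, used only for illustration.
tagged_sentence = [("The", "DET"), ("black", "ADJ"), ("dog", "NOUN"),
                   ("chased", "VERB"), ("a", "DET"), ("cat", "NOUN")]
print(chunk_noun_phrases(tagged_sentence))
# prints [('NP', ['The', 'black', 'dog']), ('VERB', ['chased']), ('NP', ['a', 'cat'])]
```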
Core Frame elements • Frame elements that are essential to the
meaning of a frame are called "core" FEs (e.g.,
Speaker in frames connected with
communication); expressions of time, place and
manner are generally not core FEs.