
BS ENGLISH

FINAL EXAMINATION

Year, 2020

(8th SEMESTER/EVENING)

CORPUS LINGUISTICS

Student Name : Zainab Ihsan

Roll Number : 42

Program : BS-English

Semester : VIII (Evening)

Course Title : Corpus Linguistics

Course Code : ENG - 410

Session : 2016-2020

Submitted To : Sir Abdul Haseeb Ezmi

Total no. of pages : 12

Bahauddin Zakariya University Multan



CORPUS LINGUISTICS:
QUESTION NUMBER 1:

WHAT DO YOU MEAN BY LINGUISTIC ANNOTATION? GIVE EXAMPLES AND DISCUSS THEM IN DETAIL.

LINGUISTIC ANNOTATION:

“A NOTE OR TAG THAT IS ADDED TO A TEXT OR DIAGRAM FOR ITS EXPLANATION IS CALLED ANNOTATION.”

TERMINOLOGIES:

The process of giving separate tags to the words of a text to show their specific characteristics is known as “TAGGING”.

The codes that are allocated to each of the words in a corpus to turn it into an annotated corpus are known as “TAGS”.

PARTS OF SPEECH TAGGING:

The most basic type of annotation is called “PARTS-OF-SPEECH TAGGING”. This process is also known as “GRAMMATICAL TAGGING” or “MORPHOSYNTACTIC TAGGING”.

PURPOSE:

The purpose of this type of tagging is to assign a 'TAG' to each meaningful word in the text. In this way each lexical unit has a tag of its own, and each word can easily be identified from the tag allocated to it.

FOR EXAMPLE:

The first step in this type of tagging is to understand the difference between homographs. Homographs are words that have the same spelling (and often the same pronunciation) but differ in meaning and in the contexts in which they are used in a text.

We may not be able to distinguish between the senses of “bank” /bæŋk/ by sound alone, but we can distinguish them by meaning, i.e. bank as a noun meaning the land on the side of a river, and bank as a noun meaning a heap of something (a bank of snow). The same spelling can also belong to a different part of speech, as in the verb to bank (money), and a part-of-speech tagger has to resolve this kind of difference from the context.
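A tagger resolves homographs like this from context. The Python sketch below is a toy illustration only. It uses a hypothetical rule that is not taken from any real tagger: “bank” is tagged as a verb (VB) when it follows “to” or a modal verb, and as a noun (NN) otherwise. Real taggers use statistical models trained on annotated corpora rather than a single rule like this.

```python
# Hypothetical cue words: "to" and modal verbs often precede a base-form verb.
VERB_CUES = {"to", "will", "would", "can", "could", "shall", "should",
             "may", "might", "must"}

def tag_bank(tokens):
    """Return (word, tag) pairs; only 'bank' is tagged, other words get None."""
    tagged = []
    for i, word in enumerate(tokens):
        if word.lower() == "bank":
            previous = tokens[i - 1].lower() if i > 0 else ""
            # Toy rule: verb after a cue word, noun everywhere else.
            tagged.append((word, "VB" if previous in VERB_CUES else "NN"))
        else:
            tagged.append((word, None))
    return tagged

print(tag_bank("they sat on the bank".split()))
print(tag_bank("I want to bank the money".split()))
```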

CODES:

Corpus analysts work with a basic set of codes (a tagset). One of these codes is assigned to every word in the text being annotated, according to its part of speech. Some examples of these codes are:

AT: Singular article

IN: Preposition

IO: of

ANNOTATED TEXT (POS TAGGING):

The DT boy NN 's POS name NN was VBD Santiago NNP. Dusk NN was VBD falling VBG as IN the DT
boy NN arrived VBD with IN his PRP$ herd NN at IN an DT abandoned JJ church NN. The DT
roof NN had VBD fallen VBN in RP long RB ago RB, and CC an DT enormous JJ sycamore NN
had VBD grown VBN on IN the DT spot NN where WRB the DT sacristy NN had VBD once RB
stood VBN.
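Tagged text is often stored in a machine-readable form in which each tag is attached to its word, for example as word_TAG. The underscore convention and the short sample below are assumptions made for illustration. The sketch splits such tokens back into (word, tag) pairs and counts how often each tag occurs.

```python
from collections import Counter

def parse_tagged(text, sep="_"):
    """Split 'word_TAG word_TAG ...' into (word, tag) pairs.

    Assumes every token carries a tag. rpartition splits on the last
    separator, so a tag such as PRP$ is kept intact.
    """
    pairs = []
    for token in text.split():
        word, _, tag = token.rpartition(sep)
        pairs.append((word, tag))
    return pairs

# Hypothetical sample using Penn Treebank-style tags.
sample = "The_DT boy_NN arrived_VBD with_IN his_PRP$ herd_NN"
pairs = parse_tagged(sample)
tag_counts = Counter(tag for _, tag in pairs)
print(pairs)
print(tag_counts.most_common(1))
```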

ADVANTAGES OF ANNOTATION:
Converting a plain corpus into an annotated one has several advantages.

1) INCREASED SPAN:

Annotations increase the span of linguistic phenomena that can be retrieved from a plain text or corpus. In a plain text we can only search for the individual word forms that occur in the corpus. In an annotated corpus, however, we can give more specific instructions about the words we want to find.

For example,
we can search for “SURPRISE” only where it occurs in the corpus as a verb.

2) REUSABLE:

Annotations can be used again. If they are first done manually, they should be saved so that the same annotated corpus can be reused. People may argue that it is easy to annotate a text automatically, but manually annotated corpora are more accurate.
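Both advantages can be illustrated with a small Python sketch. A hand-tagged sample is saved once in a word_TAG text file so that it can be reloaded and reused. The reloaded corpus is then searched for “surprise” only where its tag marks it as a verb. The file format, the function names and the tiny sample are all hypothetical; real annotated corpora use their own formats and search tools.

```python
import os
import tempfile

def save_tagged(pairs, path):
    """Store (word, tag) pairs as space-separated word_TAG tokens."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(" ".join(f"{word}_{tag}" for word, tag in pairs) + "\n")

def load_tagged(path):
    """Read word_TAG tokens back into (word, tag) pairs."""
    with open(path, encoding="utf-8") as f:
        return [tuple(tok.rsplit("_", 1)) for line in f for tok in line.split()]

def find_word_with_tag(pairs, word, tag_prefix):
    """Positions where `word` occurs with a tag starting with `tag_prefix`."""
    word = word.lower()
    return [i for i, (w, t) in enumerate(pairs)
            if w.lower() == word and t.startswith(tag_prefix)]

# A tiny hand-tagged sample (hypothetical data, Penn Treebank-style tags).
corpus = [("It", "PRP"), ("was", "VBD"), ("a", "DT"), ("surprise", "NN"),
          ("Nothing", "NN"), ("can", "MD"), ("surprise", "VB"), ("her", "PRP")]

# Save once and reload later: the manual annotation is reused, not redone.
path = os.path.join(tempfile.mkdtemp(), "tagged_corpus.txt")
save_tagged(corpus, path)
reloaded = load_tagged(path)

# "surprise" only where it is tagged as a verb (tags beginning with VB).
verb_hits = find_word_with_tag(reloaded, "surprise", "VB")
print(verb_hits)
```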

QUESTION NUMBER 2:

HOW WILL YOU APPLY CORPUS LINGUISTICS FOR THE STUDY OF CULTURE?
DISCUSS IN DETAIL.

CULTURE:
ACCORDING TO SOCIOLOGY:

“THE LANGUAGE, CUSTOMS, BELIEFS, RULES, KNOWLEDGE AND COLLECTIVE IDENTITIES AND MEMORIES DEVELOPED BY MEMBERS OF ALL SOCIETAL GROUPS THAT MAKE THEIR OWN SOCIAL ENVIRONMENT MEANINGFUL IS KNOWN AS CULTURE.”

OXFORD LEARNER'S DICTIONARY ALSO DESCRIBES CULTURE AS:

“THE CUSTOMS, BELIEFS, ART, WAY OF LIFE AND SOCIAL ORGANISATION OF A PARTICULAR COUNTRY OR GROUP.”

CULTURAL STUDIES:

“CULTURAL STUDIES IS A FIELD OF THEORETICALLY, POLITICALLY AND EMPIRICALLY ENGAGED CULTURAL ANALYSIS THAT CONCENTRATES UPON THE POLITICAL DYNAMICS OF CONTEMPORARY CULTURE, ITS HISTORICAL FOUNDATIONS, DEFINING TRAITS, CONFLICTS AND CONTINGENCIES.”

COMPARISON OF CORPORA:

When the Lancaster-Oslo/Bergen Corpus (LOB) of British English was completed, Hofland and Johansson (1982) compared it with the “parallel” American Brown Corpus. The comparison was basically of the “vocabulary” of the two corpora. It showed differences that were not merely linguistic ones, such as spelling (humor/humour) or morphology (got/gotten). The more revealing differences lay in the contexts in which words were used and in how frequently each word occurred in the two corpora.
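Corpora differ in size, so frequency comparisons like this are usually made on normalised counts, for example occurrences per million words. The sketch below shows that normalisation. The word counts and the second corpus size are invented for illustration; they are not figures from the Brown or LOB corpora.

```python
def per_million(count, corpus_size):
    """Normalise a raw frequency to occurrences per million words."""
    return count * 1_000_000 / corpus_size

def compare(freq_a, size_a, freq_b, size_b):
    """Return {word: (rate in A, rate in B)} for every word in either list."""
    words = sorted(set(freq_a) | set(freq_b))
    return {w: (per_million(freq_a.get(w, 0), size_a),
                per_million(freq_b.get(w, 0), size_b)) for w in words}

# Invented counts for illustration only.
american = {"highway": 50, "tea": 20}
british = {"highway": 10, "tea": 60}
rates = compare(american, 1_000_000, british, 2_000_000)
for word, (am, br) in rates.items():
    print(f"{word:10} AmE {am:6.1f}  BrE {br:6.1f} per million")
```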

STUDY OF CULTURE IN CORPUS LINGUISTICS:

Fallon and Geoffrey Leech took up the idea of studying culture through corpora.

INITIAL DATA:

LEECH AND FALLON (1992) used these British and American frequency comparisons as their initial data. To check the senses in which words were being used in the two corpora, they used KWIC concordances (Key Words In Context). They sorted the differences into fifteen broad categories. The concepts in these categories revealed not only differences in language between the two countries, i.e. AMERICA AND BRITAIN, but also cultural differences.

FOR EXAMPLE:
Words related to “travelling” were more frequent in American English than in British English, perhaps because the United States is much larger than Britain.

Words suggestive of the military or crime were also more frequent in the American English corpus than in the British English corpus. This may reflect the “gun culture” that is more prevalent in America. The American corpus also showed more violence.

COMPARISON:

Generally speaking, at the time the corpora represent (1961, the year from which the texts of both the Brown and LOB corpora were sampled), culture was more strongly represented on the AMERICAN SIDE.

The study of culture is still in its initial stages in corpus linguistics, and its methodology needs refinement. Nevertheless, it is a fascinating kind of study and can be very helpful both in language studies and in the cultural studies of different nations.

QUESTION NUMBER 3:

DEFINE THE TERM 'CONCORDANCE' AND DISCUSS AT LEAST TWO CONCORDANCERS WHICH YOU WILL USE FOR THE PREPARATION OF CONCORDANCES.

CONCORDANCES:

“A CONCORDANCE IS AN ALPHABETICAL LIST OF WORDS SHOWING THEIR CONTEXTS AND HOW OFTEN THEY WERE USED.”

Concordances show every instance of a given word together with the text surrounding it. They can be generated for English as well as for other languages.
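The core of a concordance of this kind is the KWIC (Key Word In Context) display. Every occurrence of a search word is listed with a few words of context on each side. A minimal Python sketch follows. The tokenisation (lower-cased runs of letters and digits) and the default of four context words per side are simplifying assumptions.

```python
import re

def kwic(text, node, width=4):
    """One KWIC line per occurrence of `node`, with `width` words on each side."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the node words line up.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

lines = kwic("The boy sat on the bank. Later he went to the bank to pay.",
             "bank", width=2)
for line in lines:
    print(line)
```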

CONCORDANCERS:

“A concordancer is a computer program that automatically constructs a concordance.”

Concordancers are used in corpus linguistics to extract ordered lists of data from the corpus being analysed, and the linguist then analyses this data.

EARLY CONCORDANCERS:

Concordances were available even before computers were invented. At that time they were compiled only for the most important works, those that were considered precious.

These important works included the works of Shakespeare and holy books such as the QURAN and the BIBLE. Since there were no computers, and therefore no software, all these concordances had to be compiled by hand.

ADVANCED CONCORDANCERS:

Nowadays texts are available in electronic form, and software has been developed into which a text can be loaded so that searches can be run on it.

PARALLEL CONCORDANCERS:

A parallel concordancer allows the linguist to search for a word in a text that exists in multiple languages and to see the corresponding passages in the translations. Such software is used in the study of translated texts.

ANTCONC:

AntConc is a freely available concordancer that runs on several platforms (Windows, macOS and Linux). It allows the analyst to load files and then search those files. It also gives information about the size of the text: the software tells us how many characters or words are represented in a corpus.

• Concordance plot:
AntConc also allows the results to be sorted, and it offers several other ways of displaying them. One of these displays is the “CONCORDANCE PLOT”. It takes the form of a barcode and shows how the occurrences of a word are distributed throughout the text.

• File view:
File view is another way of showing the results. It displays the search hits within the original file, with no restriction on the length of the text shown.

• Cluster/N-Gram:
This option is used to find words that occur together in combination (multi-word sequences) throughout the text.

• Word List:
This option shows the frequency of each word that occurs in the text.
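The Concordance Plot, Word List and Cluster/N-Gram displays described above all rest on simple counts over the tokens of a text. The Python sketch below approximates them in plain text. It is not AntConc's own code. The function names, the '|' barcode rendering and the sample sentence are assumptions made for illustration.

```python
from collections import Counter

def concordance_plot(tokens, node, width=40):
    """Text 'barcode': one cell per slice of the text, '|' where `node` occurs."""
    bar = ["-"] * width
    for i, tok in enumerate(tokens):
        if tok == node:
            bar[i * width // len(tokens)] = "|"
    return "".join(bar)

def word_list(tokens):
    """Word types with their frequencies, most frequent first."""
    return Counter(tokens).most_common()

def ngrams(tokens, n=2):
    """Counts of every contiguous sequence of n words (clusters / n-grams)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "the boy saw the boy and the dog".split()
print(concordance_plot(tokens, "boy", width=8))
print(word_list(tokens))
print(ngrams(tokens).most_common(1))
```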

PARACONC:

ParaConc is a parallel concordancer that can work with a maximum of four languages. To use this software, the text must first be ALIGNED. By alignment we mean that the sentences are arranged so that we can determine where each sentence starts and ends, and which sentence in one language corresponds to which sentence in another.

The easiest way of aligning the sentences is to put each sentence on a separate line. The analyst then knows which sentence is on which line, and the translation of each sentence can be found on the corresponding line of the other files. Alignment can also be done at the level of paragraphs, but this makes the search more complicated.
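The one-sentence-per-line convention makes alignment easy to process: line i of the source file is paired with line i of the translation. The Python sketch below pairs two such texts and retrieves the aligned translation of every sentence that contains a search word. It is not ParaConc's own implementation; the function names are hypothetical and the Spanish/English lines are a toy example.

```python
def align_lines(source_text, target_text):
    """Pair line i of the source with line i of the translation."""
    src = [line.strip() for line in source_text.splitlines() if line.strip()]
    tgt = [line.strip() for line in target_text.splitlines() if line.strip()]
    if len(src) != len(tgt):
        raise ValueError("texts are not aligned: different numbers of lines")
    return list(zip(src, tgt))

def parallel_search(pairs, word):
    """Return the aligned pairs whose source sentence contains `word`."""
    word = word.lower()
    # Simple whitespace tokenisation; punctuation is not stripped here.
    return [(s, t) for s, t in pairs if word in s.lower().split()]

spanish = "Amo a mi mama\nMi mama canta\n"
english = "I love my mom\nMy mom sings\n"
pairs = align_lines(spanish, english)
for source, target in parallel_search(pairs, "mama"):
    print(source, "|", target)
```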

The software is not restricted to particular languages, so texts in any language can be used with this concordancer.

QUESTION NUMBER 4:

HOW CAN CORPUS LINGUISTICS PLAY ITS ROLE IN IDENTIFYING THE CO-OCCURRENCE PERCENT OF WORDS AS FAR AS COLLOCATIONS ARE CONCERNED? WHICH FORMULA WILL YOU APPLY FOR THIS TYPE OF ANALYSIS? ALSO DISCUSS THE FORMULA WHICH CAN BE APPLIED FOR THE ANALYSIS OF A BILINGUAL PARALLEL CORPUS.

COLLOCATIONS:

“COLLOCATIONS CAN BE DEFINED AS THE CHARACTERISTIC CO-OCCURRENCE PATTERNS OF WORDS.”

This idea is very important in many areas of linguistics. Kjellmer (1991) argued that our mental lexicon is made up not only of single words but also of larger phraseological units, both fixed and more variable.

LEXICOGRAPHY:

Co-occurrence patterns are especially important in dictionary writing, which is also called LEXICOGRAPHY. A dictionary has to tell people about the uses and different meanings of the words they look up. Dictionary writers therefore need to know the co-occurrence patterns and contexts of words.

KJELLMER’S PHRASEOLOGICAL UNITS:


In Kjellmer's view of phraseological units, the company that individual words keep helps to define their use and sense. This information is important for natural language processing and for language teaching. But in connected discourse every word occurs in the company of other words, so how is it possible to identify which co-occurrences are significant collocations?

FORMULAS:
In a corpus, some words occur together frequently because they belong together, while others co-occur merely by chance. But how can we tell which is which? A corpus consists of thousands of words, and it is not humanly possible to check every co-occurrence manually. Two formulas are therefore often used to measure the relationship between words and to decide whether they genuinely belong together or co-occur by chance. These two formulas are:

1) MUTUAL INFORMATION
2) Z-SCORE

1) MUTUAL INFORMATION:

• ORIGIN OF FORMULA:

It is a formula borrowed from the area of theoretical computer science known as INFORMATION THEORY.

• USE IN CORPUS ANALYSIS:

The mutual information score between any two words, or any other categories of items occurring in the same text, is measured by comparing how often they actually occur together in the text with how often they would be expected to occur together by chance.

FOR EXAMPLE,
“RAIN” and “COAT” are two words that belong together, although each can also occur on its own, whereas other pairs of words in the sentence above are merely juxtaposed.

If two words belong together and occur together more often than chance would predict, the mutual information score is high. If they occur together only about as often as chance would predict, the score is close to zero. If they occur together less often than chance would predict, the score is negative.
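One standard formulation for corpus work compares the probability of seeing two words x and y together with the probability expected if they were independent: MI(x, y) = log2( P(x, y) / (P(x) × P(y)) ). Each probability is estimated from corpus frequencies as f/N, so this simplifies to log2( f(x, y) × N / (f(x) × f(y)) ). How the joint frequency f(x, y) is counted (adjacent pairs, or pairs within a window) is a choice left to the analyst. The Python sketch below uses invented frequencies for a one-million-word corpus.

```python
import math

def mutual_information(f_xy, f_x, f_y, n):
    """Pointwise mutual information in bits: log2(f(x,y) * N / (f(x) * f(y)))."""
    if f_xy == 0:
        raise ValueError("MI is undefined when the pair never co-occurs")
    return math.log2(f_xy * n / (f_x * f_y))

N = 1_000_000  # invented corpus size
# Invented counts: "rain" 200, "coat" 100, "rain coat" 40 times.
print(round(mutual_information(40, 200, 100, N), 2))
# Invented counts for a chance pair: co-occurrence equals what chance predicts.
print(mutual_information(30, 60_000, 500, N))
```

With these invented counts the first pair scores about 11 and the chance pair scores exactly 0, which matches the pattern described above.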

2) Z-SCORE:
The z-score provides the same kind of information as mutual information: a measure of how strongly words tend to occur together.

• NODE:
A node is a word selected in a corpus so that its frequency with the other words occurring around it can be examined.

• CONTEXT WINDOW:
A context window is a term used specifically with the z-score, and it marks the main difference between the z-score and mutual information. Mutual information is based on the overall frequency with which words occur together. The z-score, by contrast, starts from a word chosen by the analyst (the NODE) and analyses its co-occurrence with the words appearing to its left and right, as shown by the program. The number of words considered is set in the program. For example, three words to the left and three words to the right of the node may be chosen, and the frequency is then gauged on the basis of the words occurring within that span. This particular span of words is called the CONTEXT WINDOW.

The higher the z-score of a collocate, the higher its degree of collocability with the node.

• TACT CONCORDANCE:
TACT is a concordance package that uses only the z-score for calculating the collocations of words in a corpus.
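One widely used formulation of the window-based z-score works as follows; the exact computation inside TACT may differ. Let N be the number of tokens in the corpus, f(n) the frequency of the node, f(c) the frequency of the candidate collocate, S the total window size (for example 3 + 3 = 6), and O the number of times c actually appears inside the node's windows. The probability of c at a non-node position is p = f(c) / (N - f(n)), and the expected count inside all windows is E = p × f(n) × S. The z-score is then z = (O - E) / sqrt(E × (1 - p)). The Python sketch below implements this on a toy token list; the names and data are illustrative assumptions.

```python
import math

def window_cooccurrences(tokens, node, collocate, left=3, right=3):
    """Count occurrences of `collocate` within the window around each `node`."""
    observed = 0
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            observed += window.count(collocate)
    return observed

def z_score(tokens, node, collocate, left=3, right=3):
    """z = (O - E) / sqrt(E * (1 - p)) for a collocate inside the context window."""
    n = len(tokens)
    f_node = tokens.count(node)
    f_coll = tokens.count(collocate)
    p = f_coll / (n - f_node)  # chance of the collocate at a non-node slot
    expected = p * f_node * (left + right)
    if expected == 0:
        raise ValueError("collocate or node does not occur; z is undefined")
    observed = window_cooccurrences(tokens, node, collocate, left, right)
    return (observed - expected) / math.sqrt(expected * (1 - p))

# Toy corpus: "rain" is always followed by "coat".
tokens = ("rain coat a b c d e f g h " * 5).split()
print(window_cooccurrences(tokens, "rain", "coat"))  # observed count O
print(round(z_score(tokens, "rain", "coat"), 2))
```

On such a small toy corpus the score is only modest. Z-scores are also unreliable when the expected count E is very small.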

BILINGUAL PARALLEL CORPUS:

The formula used for calculating co-occurrence patterns in a bilingual parallel corpus is MUTUAL INFORMATION. It can be used to examine the relationship between two corpora that are aligned in parallel form.

• LEVEL OF ALIGNMENT:
These two corpora must be aligned at sentence level. Once they are arranged in aligned sentences, we may be interested in their translation, that is, in knowing which words are translated as which.

FOR EXAMPLE,
In these two sentences

Amo a mi mama

I love my mom

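Amo may be the translation of I love (the Spanish verb form already includes the subject). a has no direct English equivalent here, since it is the 'personal a' used before a human object. mi is the translation of my, and mama of mom.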

The main thing we consider is which words co-occur significantly across the aligned sentences, rather than merely accidentally.
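One simple way to apply mutual information to a sentence-aligned corpus is to treat each aligned sentence pair as one observation. Let f(s) be the number of pairs whose source sentence contains the source word s, f(t) the number whose target sentence contains the target word t, f(s, t) the number containing both, and N the number of pairs. MI is then log2( f(s, t) × N / (f(s) × f(t)) ). The Python sketch below assumes this counting scheme; the extra sentence pairs are invented for illustration.

```python
import math

def sentence_mi(pairs, src_word, tgt_word):
    """MI between a source and a target word, counting aligned sentence pairs."""
    n = len(pairs)
    src_sets = [set(s.lower().split()) for s, _ in pairs]
    tgt_sets = [set(t.lower().split()) for _, t in pairs]
    f_s = sum(src_word in s for s in src_sets)
    f_t = sum(tgt_word in t for t in tgt_sets)
    f_st = sum(src_word in s and tgt_word in t
               for s, t in zip(src_sets, tgt_sets))
    if f_st == 0:
        return float("-inf")  # the two words never appear in the same pair
    return math.log2(f_st * n / (f_s * f_t))

pairs = [("Amo a mi mama", "I love my mom"),
         ("Amo el mar", "I love the sea"),
         ("Mi mama canta", "My mom sings"),
         ("El perro duerme", "The dog sleeps")]
print(sentence_mi(pairs, "amo", "love"))
print(sentence_mi(pairs, "amo", "mom"))
```

Here “amo” scores 1.0 with “love” but 0.0 with “mom”, so “love” comes out as the stronger translation candidate.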

USES OF MUTUAL INFORMATION AND Z-SCORE:

1) DICTIONARIES:
Both the z-score and mutual information are used in compiling dictionaries, when the entries for words are being set.

2) Multi-word units:
Both techniques can extract from a corpus not only single words but also phrases, that is, multi-word units. For example, “human digestive system” is a multi-word unit.

This is used not only in traditional lexicography but also in modern techniques for extracting the words used in translated studies or books.
