

Name: David Elkharis Larosa
Class: B
Subject: Discourse Analysis
Corpus Approaches to Discourse Analysis
A. What is a corpus?

It is generally assumed that a corpus is a collection of spoken or written authentic texts
that is representative of a particular area of language use, by virtue of its size and composition. It
is not always the case, however, that the corpus is representative of language use in general, or
even of a specific language variety, as the data set may be very specialized (such as material
collected from the internet) and it may not always be based on samples of complete texts. The
data may also be only of the spoken or written discourse of a single person, such as a single
author’s written work. It is important, then, to be aware of the specific nature and source of
corpus data so that appropriate claims can be made from the analyses that are based on it
(Kennedy 1998, Tognini-Bonelli 2004).
A corpus is usually computer-readable and able to be accessed with tools such as
concordancers, which are able to find and sort language patterns. The corpus has usually
(although not always) been designed for the purpose of the analysis, and the texts have been
selected to provide a sample of specific text-types, or genres, or a broad and balanced sample of
spoken and/or written discourse.
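The way a concordancer finds and sorts patterns can be illustrated with a minimal keyword-in-context (KWIC) sketch. This is only an illustration under simple assumptions: the function name `kwic`, the regular-expression tokenizer and the sample sentences are all hypothetical, and established concordancing tools offer far more than this.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, keyword, right) tuples for every hit of `keyword`.

    `width` is the number of context tokens kept on each side (an assumed default).
    """
    # Very rough tokenizer: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    # Sort the hits alphabetically by their right-hand context,
    # so that recurring patterns after the keyword line up.
    lines.sort(key=lambda line: line[2])
    return lines

# Invented sample text, for illustration only.
sample = ("A corpus is a collection of texts. The corpus has usually been "
          "designed for analysis. Each corpus is different.")

for left, kw, right in kwic(sample, "corpus"):
    print(f"{left:>25} | {kw} | {right}")
```

Sorting on the right-hand context is one design choice among several; sorting on the left-hand context instead would bring out what typically precedes the search word.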
B. Kinds of corpora
1. General corpora

Corpora may be general or they may be specialized. A general corpus, also known as a
reference corpus, provides sample data from which we can make generalizations about spoken
and written discourse as a whole, and about the frequencies of occurrence and co-occurrence of
particular aspects of language in the discourse. It will not, however, tell us about the language
and discourse of particular genres or domains of use (unless the corpus can be broken down into
separate genres or areas of use in some way). For this, we need a specialized corpus.
2. Specialized corpora
Specialized corpora are required when the research question relates to the use of spoken
or written discourse in particular kinds of texts or in particular situations. A specialized corpus
might be used, for example, to examine the use of hedges in casual conversation or the ways in
which people signal a change in topic in an academic presentation.

C. Design and construction of corpora

Existing corpora contain data that can be used for asking very many questions about the use of
spoken and written discourse both in general and in specific areas of use, such as academic
writing or speaking. If, however, your interest is in what happens in a particular genre, or in a
particular genre in a setting for which there is no available data, then you will have to build
your own corpus for your study.
Hyland’s (2002a) study of the use of personal pronouns such as I, me, we and us in
Hong Kong students’ academic writing is an example of a corpus that was designed to answer a
question about the use of discourse in a particular genre, in a particular setting. The specific aim
of his study was to examine the extent to which student writers use self-mention in their texts ‘to
strengthen their arguments and gain personal recognition for their claims’ in their written
discourse.
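A question of this kind is usually answered by counting the target forms and normalizing the count by text length, so that texts of different sizes can be compared. The sketch below is not Hyland's procedure. It is a minimal illustration that assumes a simple tokenizer, a per-1,000-words rate and an invented example sentence; `self_mention_rate` and `SELF_MENTION` are hypothetical names.

```python
import re
from collections import Counter

# The four forms named in the text; a fuller study would likely add more.
SELF_MENTION = {"i", "me", "we", "us"}

def self_mention_rate(text, per=1000):
    """Count self-mention pronouns and return (counts, rate per `per` words)."""
    # Lower-casing means the abbreviation "US" would be miscounted as "us";
    # a real study would need to check such hits by hand.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in SELF_MENTION)
    total = sum(counts.values())
    rate = total / len(tokens) * per if tokens else 0.0
    return counts, rate

# Invented sentence, for illustration only.
essay = "In this essay I argue that we need more data. Let us begin."
counts, rate = self_mention_rate(essay)
print(counts, round(rate, 1))
```

Raw counts alone would favour longer texts, which is why the rate is reported alongside them.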
D. Issues to consider in constructing a corpus

There are a number of issues that need to be considered when constructing a corpus.
The first of these is what to include in the corpus; that is, the variety or dialect of the language,
the genre(s) to be included, whether the texts should be spoken, written or both and whether the
texts should be monologic, dialogic or multi-party. The next issue is the size of the corpus and of
the individual texts, as well as the number of texts to include in each category. The sources and
subject matter of the texts may also be an issue that needs to be considered. Other issues include
sociolinguistic and demographic considerations such as the nationality, gender, age, occupation,
education level, native language or dialect and the relationship between participants in the texts.
Key issues to consider in constructing a corpus include:
1. Authenticity, representativeness and validity of the corpus
2. Kinds of texts to include in the corpus
3. Size of the texts in the corpus
4. Sampling and representativeness of the corpus
E. The Longman Spoken and Written English Corpus
The Longman Spoken and Written English (LSWE) Corpus is an important example of a
corpus study. The LSWE was used as the basis for the Longman Grammar of Spoken and
Written English. The LSWE corpus is made up of 40 million words, representing four major
discourse types: conversation, fiction, news and academic prose, with two additional categories:
non-conversational speech (such as lectures and public meetings) and general written non-fiction
prose. The LSWE corpus aimed to provide a representative sampling of texts across the
discourse types it contained. The conversational data in the corpus was collected in real-life
settings and is many times larger than most other collections of conversational data.
F. Discourse characteristics of conversational English
1. Non-clausal units in conversational discourse
A key observation made in the Longman grammar is that conversational discourse makes
wide use of non-clausal units; that is, utterances which do not contain an explicit subject or
verb. These units are independent or self-standing in that they have no grammatical connection
with what immediately precedes or follows them. The use of these units in conversational
discourse is very different from written discourse where they rarely occur.
2. Personal pronouns and ellipsis in conversation
Conversational discourse also makes wide use of personal pronouns and ellipsis. This is
largely because of the shared context in which conversation occurs. The meaning of these items
and what has been left out of the conversation can usually be derived from the context in which
the conversation is taking place.
3. Situational ellipsis in conversation
Speakers often use situational ellipsis in conversation, leaving out words of low
information value where the meaning of the missing item or items can be derived from the
immediate context, rather than from elsewhere in the text.
4. Non-clausal units as elliptic replies in conversation
Non-clausal units as elliptic replies often occur in conversational discourse, as in the
example below where Marie simply says ‘Why (do you have to get Paul to come over)?’ In the
shared social situation in which the conversation is taking place, both speakers know what she is
asking about:
Ryan: I’m gonna have to get Paul to come over, too.
Marie: Why?
5. Repetition in conversation
Conversation also uses repetition much more than written discourse. This might be done,
for example, to give added emphasis to a point being made in a conversation. One way speakers
may do this is by echoing each other.
6. Lexical bundles in conversational discourse
Conversational discourse also makes frequent use of lexical bundles; that is, formulaic
multiword sequences such as ‘It’s going to be’, ‘If you want to’ and ‘or something like that’ (Biber,
Conrad and Cortes 2004). Research has shown that lexical bundles occur much more frequently
in spoken discourse than they do in written discourse. Speakers may, for example, use them to
give themselves time to think what they will say next. They do this as conversation occurs in real
time and speakers often take and hold on to the floor at the same time as they are planning what
to say next.
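Lexical bundles are typically identified by counting recurring word sequences of a fixed length and keeping those above a frequency threshold. The following is a minimal sketch of that idea, not the procedure used in the cited research: the four-word length, the threshold of two occurrences, the function name `lexical_bundles` and the sample utterances are all assumptions made for illustration.

```python
import re
from collections import Counter

def lexical_bundles(utterances, n=4, min_count=2):
    """Return (sequence, count) pairs for n-word sequences seen at least `min_count` times."""
    counts = Counter()
    for utt in utterances:
        tokens = re.findall(r"[a-z']+", utt.lower())
        # Slide an n-word window over the utterance; sequences never
        # cross utterance boundaries.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return [(seq, c) for seq, c in counts.most_common() if c >= min_count]

# Invented utterances, for illustration only.
talk = [
    "it's going to be fine",
    "I think it's going to be late",
    "do you want a coffee or something like that",
    "a film or something like that",
]

for seq, count in lexical_bundles(talk):
    print(count, seq)
```

With a real corpus the threshold would normally be expressed relative to corpus size rather than as a raw count, and a sequence would usually have to recur across several texts to qualify.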
G. Performance phenomena of conversational discourse
Speakers need to plan what they are going to say while they are speaking, which means that
their speech contains pauses, hesitations and repetitions. The performance phenomena of
conversational discourse include:
1. Silent and filled pauses in conversation
2. Utterance launchers and filled pauses
3. Attention signals in conversation
4. Response elicitors in conversation
5. Non-clausal items as response forms
6. Extended coordination of clauses
H. Constructional principles of conversational discourse
The principle of ‘keep talking’ refers to the need to keep a conversation going while
planning for the conversation is going on. The principle of ‘limited planning ahead’ refers to
human memory limitations on planning ahead; that is, restrictions on the amount of syntactic
information that can be stored in memory while the planning is taking place. The principle of
‘qualification of what has been said’ refers to the need to qualify what has been said ‘after the
event’ and to add things which otherwise would have already been said in the conversation.
1. Prefaces in conversation
Prefaces may include fronting of clausal units, noun phrase discourse markers and other
expressions such as interjections, response forms, stance adverbs, linking adverbs, overtures,
utterance launchers and the non-initial use of discourse markers.
2. Tags in conversation
Speakers add tags in many ways as an afterthought to a grammatical unit in
conversational discourse. They can do this by use of a question tag at the end of a sentence. The
effect of this is to turn a statement into a question. A tag can also be added to the end of a
statement to reinforce what has just been said. This can be done by repeating a noun phrase, by
paraphrasing what has been said or by adding a clausal or non-clausal unit retrospectively to
what has just been said.
I. Corpus studies of the social nature of discourse
Swales (2003) found (as did Biber et al. 2002 in the TOEFL study, although the
framework for their analysis was quite different) that, in the area of academic speaking (in
contrast to academic writing), there were fewer differences between disciplines than he had
expected and that many spoken academic interactions had a lot in common with general
conversational English. He found academic speaking across the university tended to be informal
and conversational, guarded rather than evaluative and deferential rather than confrontational. He
found spoken discourse to be unpretentious in terms of vocabulary choice. It also generally
avoided name-dropping and the use of obscure references.
J. Collocation and corpus studies
Corpus studies have also been used to examine collocations in spoken and written
discourse. Hyland and Tse’s (2004) study of dissertation acknowledgements, for example, found
the collocation ‘special thanks’ was the most common way in which dissertation writers
expressed gratitude in the acknowledgements section of their dissertations. This was followed by
‘sincere thanks’ and ‘deep thanks’. They found this by searching their corpus to see how the
writers typically expressed gratitude, and then what items typically occurred to the left of the item
‘thanks’.
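Looking at what occurs to the left of a search word can be sketched as a simple count of immediately preceding words. The example below is an illustration only, not Hyland and Tse's method: the function name `left_collocates` is hypothetical, the sentences are invented, and a real study would use a much larger corpus and possibly a wider window.

```python
import re
from collections import Counter

def left_collocates(texts, node="thanks"):
    """Count the word immediately to the left of each occurrence of `node`."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Pair every token with the one that follows it.
        for prev, tok in zip(tokens, tokens[1:]):
            if tok == node:
                counts[prev] += 1
    return counts.most_common()

# Invented acknowledgement sentences, for illustration only.
acks = [
    "Special thanks go to my supervisor.",
    "My sincere thanks to the participants.",
    "Special thanks also to my family.",
]

print(left_collocates(acks))
```

Ranking the preceding words by frequency is what makes a pattern such as ‘special thanks’ stand out from less common alternatives.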
K. Corpus studies and academic writing
Corpora have been extremely useful for academic writing teachers in that they are able to
show how language is used in particular academic genres. Hyland’s (2002a) study of the use of
personal pronouns in Hong Kong students’ academic writing is an example of this kind, as are
his (2008a, 2008b) analyses of word clusters in published research articles and graduate student
writing.
L. Criticisms of corpus studies
Flowerdew (2005) and Handford (2010) provide a summary of, and response to, some
criticisms of corpus studies. One criticism is that the computer-based orientation of corpus
studies leads to a bottom-up investigation of language use. A further criticism is that corpora
are so large that they do not allow for a consideration of contextual aspects of texts
(Widdowson 1995, 2000, Virtanen 2009). Tribble (2002) counters this view by providing a
detailed discussion of contextual features that can be considered in corpus studies to help
address this issue, such as the social context of the text, the communicative purpose of the
text, the roles of readers and writers of the text, the shared cultural values required of readers
and writers of the text and knowledge of other texts.
One way of gaining contextual information for an analysis is through interviews and focus
group discussions with users of the genre, considering the textual information revealed in
the corpus study in relation to this information, as Hyland (2004c) did in his Disciplinary
Discourses. The analysis can also be combined with other contextual information available on
the data, such as the information on the speech event and speaker attributes that accompanies
the MICASE and BAWE corpora. Each of these strategies can help offset the argument that
corpus studies are, necessarily, decontextualized and only of interest at the item, rather than
the discourse, level (see Handford 2010 for further discussion of this).
