
1. CORPUS (CORPORA)
A corpus (plural: corpora) is a collection of texts (written and spoken) stored on a computer.

A corpus may be considered under four main headings:


(1) Sampling and representativeness
(2) Finite size
(3) Machine-readable form
(4) A standard reference

(1) Sampling and representativeness


- Sampling ensures that the texts in the corpus represent the text population.
- Representative sampling takes the following into consideration:
+ The length of each sample text
+ The number of texts included in the corpus

Example:
If we want to build a corpus of Vietnamese literary works, we need to ensure that the texts in
the corpus represent the diversity of Vietnamese literature.
To do this, we need to select text samples from many different sources, such as books and
newspapers.

In addition, we need to consider the length of each text and the number of texts in the
corpus so that it remains diverse and representative of Vietnamese literary works.
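The two considerations above (sample length and number of texts) can be sketched in code. This is only an illustrative sketch: the function name `sample_corpus`, its parameters, and the toy data are assumptions made for the example, not part of any established tool.

```python
import random

def sample_corpus(texts_by_source, texts_per_source, words_per_sample, seed=0):
    """Draw the same number of fixed-length samples from each source category."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    corpus = []
    for source, texts in sorted(texts_by_source.items()):
        # Decide how many texts from this source go into the corpus.
        chosen = rng.sample(texts, min(texts_per_source, len(texts)))
        for text in chosen:
            words = text.split()
            # Limit the length of each sample text.
            corpus.append((source, " ".join(words[:words_per_sample])))
    return corpus

# Hypothetical toy data standing in for real sources.
texts_by_source = {
    "books": ["one two three four five six", "seven eight nine ten"],
    "newspapers": ["alpha beta gamma delta", "epsilon zeta eta theta iota"],
}
corpus = sample_corpus(texts_by_source, texts_per_source=1, words_per_sample=3)
```

Sampling the same number of texts from every source is one simple way to keep any single source from dominating; a real project might instead weight sources by their share of the text population.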

(2) Finite size


When creating a corpus for a specific research project, the finite size of the corpus is an
important consideration.

Example:
If a researcher is studying the use of slang in online communication, they may
decide to limit the size of the corpus to 1 million words. This finite size allows the researcher
to manage the data effectively and analyze it within a reasonable timeframe. By setting a
finite size for the corpus, the researcher can ensure that the corpus is focused and relevant
to the research question at hand.
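A size limit like this can be enforced while the corpus is being compiled. The sketch below is a minimal illustration, assuming whitespace-separated words and a hypothetical helper named `build_finite_corpus`; the tiny word budget is chosen only to keep the example readable.

```python
def build_finite_corpus(texts, max_words):
    """Add whole texts in order until the next one would exceed max_words."""
    corpus, total = [], 0
    for text in texts:
        n = len(text.split())
        if total + n > max_words:
            break  # the corpus has reached its planned size
        corpus.append(text)
        total += n
    return corpus, total

# Hypothetical online posts; a real study might use max_words=1_000_000.
posts = ["lol that was lit", "brb", "no cap this slaps fr fr"]
corpus, total = build_finite_corpus(posts, max_words=5)
```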

(3) Machine-readable form


When compiling a corpus in a machine-readable form, the texts are encoded in a format that
can be easily processed and analyzed by computer programs.

Example:
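A familiar example of machine-readable text outside corpus linguistics is the magnetic ink
character recognition (MICR) line on pre-printed checks, which enables high-speed check
sorting and deposit processing. A corpus is encoded for the same reason: so that software can
process it quickly.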
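For a corpus itself, machine-readable form usually means plain text plus some metadata in a format a program can parse. The following sketch assumes JSON as the storage format and a deliberately simple tokenizer; the record fields (`id`, `mode`, `text`) are hypothetical.

```python
import json
import re

# A hypothetical two-text corpus serialized as JSON.
raw = (
    '[{"id": "t1", "mode": "written", "text": "The cat sat."},'
    ' {"id": "t2", "mode": "spoken", "text": "uh the cat yeah"}]'
)

records = json.loads(raw)

def tokenize(text):
    """Lowercase a text and split it into word tokens (a very simple rule)."""
    return re.findall(r"\w+", text.lower())

# Once the texts are machine-readable, every text can be processed the same way.
tokens_by_id = {record["id"]: tokenize(record["text"]) for record in records}
```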

(4) A standard reference
A corpus is often understood to be a standard reference for the language variety it represents.
Because it is widely available to other researchers, studies based on the same corpus can be
compared directly.

Examples:
the Brown Corpus, the LOB corpus and the London-Lund corpus.

Translation corpora
A translation corpus (often also called a parallel corpus) holds the same text in its original
language (what we will call L1) and in translation (what we will call L2). It is often used in
contrastive linguistics.

Parallel corpora
In this terminology, parallel corpora hold comparable original texts in different languages.
They allow people to compare, for example, L1 English texts in one genre with L1
French texts in the same genre. (McEnery & Wilson, 2001)
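A corpus that pairs originals with their translations can be pictured as a list of aligned sentence pairs. The sketch below is illustrative only: the sentence pairs are invented and `translations_of` is a hypothetical helper, not a function from any corpus tool.

```python
# Hypothetical sentence-aligned English (L1) / French (L2) pairs.
aligned = [
    ("The house is big.", "La maison est grande."),
    ("I like the house.", "J'aime la maison."),
    ("The car is red.", "La voiture est rouge."),
]

def translations_of(word, pairs):
    """Return the L2 sentences whose L1 counterpart contains the given word."""
    word = word.lower()
    return [l2 for l1, l2 in pairs if word in l1.lower().rstrip(".").split()]

hits = translations_of("house", aligned)
```

Looking up every aligned sentence for one word in this way is the starting point for a contrastive comparison of how that word is rendered in the other language.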

2. Corpus-based research
Corpus-based research uses a corpus of written or spoken texts as the basis for the research.

Example: “A Study on Semantic and Syntactic Features of Words and Phrases in Quotation
Marks Used in English and Vietnamese Economic Magazines”

- Research: + Semantic + Syntactic

- Features: + Words + Phrases

3. QUALITATIVE VERSUS QUANTITATIVE ANALYSIS IN CORPUS-BASED RESEARCH

Definition of qualitative and quantitative research

Qualitative Research: Qualitative analysis in corpus-based research involves the detailed
examination of the content and context of language use within a corpus.

Quantitative Research: Quantitative analysis in corpus-based research involves the
systematic counting and statistical analysis of linguistic features within a text corpus.

The difference between qualitative and quantitative corpus analysis is that in quantitative
research we classify features, count them, and may construct more complex statistical models in an
attempt to explain what is observed, whereas qualitative research describes aspects of
usage in the language and provides “real-life” examples of particular phenomena.

Qualitative analysis can provide greater richness and precision, whereas quantitative
analysis can provide statistically reliable and generalizable results.

=> It is a good idea to combine the qualitative and quantitative perspectives on the same
linguistic feature.
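The two perspectives can be applied to the same feature in a few lines of code. This is a minimal sketch under stated assumptions: the sample sentence is invented, the tokenizer is a simple regular expression, and `kwic` is a hypothetical helper that lists each occurrence in its context.

```python
import re
from collections import Counter

text = "The market fell. Analysts said the market would recover as the economy grows."
tokens = re.findall(r"\w+", text.lower())

# Quantitative view: count the feature and normalize by corpus size.
counts = Counter(tokens)
per_thousand = counts["market"] / len(tokens) * 1000

# Qualitative view: read each occurrence in its context.
def kwic(tokens, keyword, window=2):
    """Return keyword-in-context lines with `window` words on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines

contexts = kwic(tokens, "market")
```

The count says how often the word occurs; the context lines are what a researcher would read to describe how it is used.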

4. GUIDELINES ON CONDUCTING CORPUS-BASED RESEARCH

The general process for conducting corpus-based research has four steps:

a. Determine the problem statement/research questions

Identify the specific area of linguistics or language study you want to investigate. This could
involve aspects such as syntax, semantics, discourse analysis, etc. Develop clear research
questions or hypotheses that guide your investigation.

b. Construct a corpus based on the purpose of the study

Decide on the scope and size of your corpus. This involves selecting texts or spoken language
samples that are relevant to your research questions. The corpus should be representative of
the language variety or genre you're studying.

c. Analyze the corpus to find the linguistic feature under investigation

Use computational tools and methods to analyze the corpus. This could include concordance
analysis, collocation analysis, frequency counts, and more sophisticated statistical
techniques. The goal is to identify patterns or relationships related to the linguistic
feature you're investigating.

d. Report on the findings

Summarize and interpret the results of your analysis. Discuss how your findings relate to your
research questions or hypotheses, and consider their implications for theory and/or practical
applications. Present the findings clearly in a well-structured report or paper (for example a
research paper, a conference presentation, or a journal article), with adequate background
information, methodological details, and a thorough discussion of your results.
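As an illustration of step c, collocation analysis can be approximated by counting the words that occur near a chosen node word. The sketch below is a simplified version under stated assumptions: it uses raw co-occurrence counts rather than an association measure, and `collocates` is a hypothetical helper written for this example.

```python
import re
from collections import Counter

def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample sentence standing in for a real corpus.
text = "interest rates rose as the bank raised interest rates again"
tokens = re.findall(r"\w+", text.lower())
top = collocates(tokens, "interest").most_common(1)
```

Dedicated concordancers typically go further and rank collocates with statistical measures, but the counting step underneath is the same idea.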

5. SOME LINGUISTIC AREAS TO INVESTIGATE USING A CORPUS


Morphemes/Words
Phrase Structures/Syntax
Pragmatics
English for specific purposes (ESP)
Mistakes in learner language (in written and spoken form)
Linguistic features in newspapers, literature and conversations

Example: The following is an excerpt from the Research Methodology of a corpus-based study titled "A Study on Semantic and
Syntactic Features of Words and Phrases in Quotation Marks Used in English and Vietnamese Economic Magazines".

METHOD AND PROCEDURES

This chapter presents the methods used in the study, which support each other in investigating the data and
obtaining the results. It also describes the procedures by which the problems of the study are solved.

It then presents how the data are collected and analyzed.

DESIGN

This thesis combines qualitative and quantitative approaches. The quantitative approach focuses on the
numbers and frequencies of words and phrases used in quotation marks in English and Vietnamese economic magazines.
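The quantitative side of such a design could be sketched as follows. This is not the thesis's actual procedure: the sample sentence is invented, and the pattern only handles straight double quotes, so curly quotes or other quotation styles would need a different pattern.

```python
import re
from collections import Counter

text = (
    'The minister called it a "soft landing", while critics spoke of '
    'a "bubble" and a "lost decade".'
)

# Extract everything between pairs of straight double quotes.
quoted = re.findall(r'"([^"]+)"', text)

# Classify each quoted item as a single word or a multi-word phrase, then count.
kinds = Counter("word" if len(item.split()) == 1 else "phrase" for item in quoted)
```

The resulting counts give the frequencies; the extracted items themselves are what the qualitative analysis would then describe semantically and syntactically.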

RESEARCH METHODS

Research methods are specific procedures for collecting and analyzing data. Developing your research methods is an integral
part of your research design.
- We should combine several methods at the same time.
- At least three basic methods need to be used in corpus-based research:
+ The descriptive method
+ The analytic method
+ The comparative method

DATA COLLECTION

Data is the information that you collect for the purposes of answering your research question. The type of data you need
depends on the aims of your research.
