Bai Nhom (Group Assignment)
CORPUS (CORPORA)
A corpus (plural: corpora) is a collection of written and spoken texts stored on a computer.
Example:
If we want to build a corpus of Vietnamese literary works, we need to ensure that the texts in
the corpus represent the full diversity of Vietnamese literature.
To do this, we need to select text samples from many different sources, such as books and
newspapers.
In addition, we need to consider the length of each text and the number of texts in the
corpus so that the corpus remains diverse and representative of Vietnamese literary works.
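One way to picture sampling from many sources is a stratified draw that takes up to a fixed number of texts from each source. The sketch below is only an illustration: the function name, the source labels, the "magazines" category, and the placeholder text IDs are all assumptions, not part of the original notes.

```python
import random

def stratified_sample(texts_by_source, per_source, seed=0):
    """Pick up to `per_source` texts from every source so no source dominates."""
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    sample = []
    for source, texts in sorted(texts_by_source.items()):
        k = min(per_source, len(texts))  # a small source contributes all it has
        sample.extend((source, text) for text in rng.sample(texts, k))
    return sample

# Hypothetical pool of text IDs grouped by source.
pool = {
    "books": ["b1", "b2", "b3"],
    "newspapers": ["n1", "n2"],
    "magazines": ["m1"],
}
picked = stratified_sample(pool, per_source=2)
print(len(picked), sorted({source for source, _ in picked}))
```

A real sampling frame would also balance text length and genre, as the paragraph above notes.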
Example:
If a researcher is studying the use of slang in online communication, they may
decide to limit the size of the corpus to 1 million words. This finite size allows the researcher
to manage the data effectively and analyze it within a reasonable timeframe. By setting a
finite size for the corpus, the researcher can ensure that the corpus is focused and relevant
to the research question at hand.
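A finite size can be enforced mechanically while the corpus is being assembled. The following is a minimal sketch, assuming whitespace-separated words are an acceptable unit of counting; the function name and the sample posts are hypothetical, and the cap is set to 8 words here only to keep the example small.

```python
def build_capped_corpus(texts, max_words):
    """Add whole texts in order, skipping any text that would exceed the cap."""
    corpus, total = [], 0
    for text in texts:
        n_words = len(text.split())  # crude word count: split on whitespace
        if total + n_words > max_words:
            continue  # this text would push the corpus past the word budget
        corpus.append(text)
        total += n_words
    return corpus, total

# Hypothetical online posts; a real study might set max_words=1_000_000.
samples = [
    "lol that was lit",
    "brb gotta go",
    "this one is a much longer post than the others",
]
corpus, total = build_capped_corpus(samples, max_words=8)
print(len(corpus), total)
```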
Example:
Machine-readable characters on pre-printed checks, known as magnetic ink character recognition
(MICR), enable high-speed check sorting and deposit processing.
4) A standard reference
Reference standards are used to determine quantitative data (such as assay and impurity),
qualitative data (such as identification tests), and calibration (such as melting point standards).
Example:
The Brown Corpus, the LOB Corpus, and the London-Lund Corpus.
Translation corpora
A parallel corpus represents the same text in its original language (what we will call L1) and
in translation (what we will call L2). It is often used for contrastive linguistics.
Parallel corpora
These corpora allow people to compare, for example, L1 English texts in one genre with L1
French texts in the same genre. (McEnery & Wilson, 2001)
2. Corpus-based research
Corpus-based research uses a corpus of written or spoken texts as the basis for the investigation.
Example: “A Study on Semantic and Syntactic Features of Words and Phrases in Quotation
Marks Used in English and Vietnamese Economic Magazines”
The difference between qualitative and quantitative corpus analysis is that in quantitative
research we classify features, count them, and even construct more complex statistical models
in an attempt to explain what is observed, whereas qualitative research describes aspects of
usage in the language and provides “real-life” examples of particular phenomena.
Qualitative analysis can provide greater richness and precision, whereas quantitative
analysis can provide statistically reliable and generalizable results.
=> It is a good idea to combine the qualitative and quantitative perspectives on the same
linguistic feature.
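The two perspectives can be illustrated on the same toy text: a frequency count gives the quantitative view, and a keyword-in-context (KWIC) listing gives real examples for qualitative reading. This is only a sketch; the tokenizer, the function names, and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, window=2):
    """Return each occurrence of `keyword` with `window` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Hypothetical sample text.
text = ("The market fell sharply. Analysts said the market would recover, "
        "but the market stayed weak.")
tokens = tokenize(text)

freq = Counter(tokens)                          # quantitative: how often?
concordance = kwic(tokens, "market", window=2)  # qualitative: in what contexts?

print(freq["market"])
for line in concordance:
    print(line)
```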
When reporting corpus-based research, present the findings clearly in a well-structured report
or paper, with adequate background information, methodological details, and a thorough
discussion of the results.
04 GUIDELINES
The general process for conducting corpus-based research can be broken down into the following steps:
a. Determine the problem statement/research questions: Identify the specific area of linguistics or language study you want to
investigate. This could involve aspects such as syntax, semantics, discourse analysis, etc. Develop clear research questions or
hypotheses that guide your investigation.
b. Construct a corpus based on the purpose of the study: Decide on the scope and size of your corpus. This involves selecting
texts or spoken language samples that are relevant to your research questions. The corpus should be representative of the
language variety or genre you're studying.
c. Analyze the corpus for the linguistic feature under investigation: Use computational tools and methods to analyze the
corpus. This could include concordance analysis, collocation analysis, frequency counts, and more sophisticated statistical
techniques. The goal is to identify patterns or relationships related to the linguistic feature you're investigating.
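As a rough illustration of the collocation analysis mentioned in the analysis step, the sketch below counts which words occur within a fixed window around a node word. It uses raw co-occurrence counts only; real studies typically add an association measure on top. The function name and the sample sentence are hypothetical.

```python
from collections import Counter

def collocates(tokens, node, window=1):
    """Count the tokens found within `window` positions of each `node` occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(left + right)
    return counts

# Hypothetical sample; a real corpus would be tokenized and far larger.
tokens = "she drinks strong tea and he drinks strong tea too".split()
result = collocates(tokens, "tea", window=1)
print(result.most_common(1))
```

Widening the window captures looser associations at the cost of more noise, so the window size is a design choice worth reporting alongside the results.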
Example: The following is an excerpt from the Research Methodology chapter of a corpus-based study titled "A Study on Semantic and
Syntactic Features of Words and Phrases in Quotation Marks Used in English and Vietnamese Economic Magazines".
This chapter consists of the presentation of the methods used in the study that support each other in investigating data and
finding the result. It also mentions the procedures in which the problems of the study are solved.
DESIGN
This thesis combines qualitative and quantitative approaches; the quantitative approach focuses on the
numbers and frequencies of words and phrases used in quotation marks in English and Vietnamese economic magazines.
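To make the quantitative side of such a design concrete, the sketch below pulls items in quotation marks out of a text and counts how many are single words versus multi-word phrases. It is not the thesis's actual procedure: the regular expression, the function names, and the sample sentence are assumptions for illustration, and only straight and curly double quotes are handled.

```python
import re
from collections import Counter

# Matches "straight" or “curly” double-quoted spans; one group per quote style.
QUOTED = re.compile(r'"([^"]+)"|“([^”]+)”')

def quoted_items(text):
    """Return the contents of every double-quoted span in the text."""
    return [straight or curly for straight, curly in QUOTED.findall(text)]

def count_by_kind(items):
    """Count quoted items as single words or multi-word phrases."""
    return Counter("word" if len(item.split()) == 1 else "phrase" for item in items)

# Hypothetical sentence in the style of an economic magazine.
sample = ('The bank called the loan "toxic" and warned of a “hard landing” '
          'for the "bubble" economy.')
items = quoted_items(sample)
kinds = count_by_kind(items)
print(items, dict(kinds))
```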
RESEARCH METHODS
Research methods are specific procedures for collecting and analyzing data. Developing your research methods is an integral
part of your research design.
We should combine several methods at the same time.
DATA COLLECTION
Data is the information that you collect for the purposes of answering your research question. The type of data you need
depends on the aims of your research.