Chapter 28
Language teachers use lists of important vocabulary to plan courses and create teaching
materials. They rely heavily on corpus data, which are collections of written and spoken texts, to
develop these lists. This practice dates back to the late 1800s when Kaeding conducted manual
counts of word frequencies in an eleven-million-word corpus to identify crucial words for
stenographers. Since then, similar methods have been used by teachers.
The idea is to focus on words that appear most often because learning these helps understand a
lot of language. For example, West's General Service List from 1953 includes about 2,000 word
families covering 80% of general English texts. Learning these common words is especially
helpful for beginners.
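Coverage figures like the one above are straightforward to compute: count word frequencies in a text, then ask what share of all running words the top n types account for. Below is a minimal sketch using only the Python standard library; the toy sentence and the cut-off of three words are illustrative, not drawn from any real corpus, and a real frequency list would count word families rather than surface forms.

```python
from collections import Counter

def coverage(text, top_n):
    """Share of all running words accounted for by the top_n most frequent types."""
    tokens = [w.lower() for w in text.split() if w.isalpha()]
    counts = Counter(tokens)
    top = counts.most_common(top_n)
    return sum(freq for _, freq in top) / len(tokens)

# Invented toy text: the three most frequent words cover half of it.
sample = ("the cat sat on the mat and the dog sat on the rug "
          "because the cat and the dog like the mat")
print(round(coverage(sample, 3), 2))  # → 0.5
```

On a realistic corpus the same calculation, run over the top 2,000 word families, reproduces figures of the kind reported for the General Service List.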
The use of computerized corpus analysis has made it much easier to compile word-frequency
statistics than in the past. This has sparked new research focused on teaching methods (e.g.,
Nation and Waring 1997; Biber et al. 1999; Coxhead 2000). Now, with corpora widely available
and automated word counting tools, teachers can create customized vocabulary lists for their
students. However, it's crucial to note that corpus software alone cannot effectively generate
teaching lists without significant human input. Teachers must carefully consider various factors
and understand their implications when creating these lists.
The first important decision is choosing or creating a suitable corpus. Nowadays, there are many
corpora available, and it's relatively easy to make one specifically for a project. It's crucial to
analyze needs carefully to ensure the corpus matches the target language well. Section 3 of this
chapter provides more details on designing a corpus.
A second decision, which has received less research attention, is how to define what counts as a
'word' (Gardner 2008). For example, teachers must decide whether related forms like 'run' and
'running' should be treated separately or together (lemmatized). Similar decisions arise for words
with multiple meanings (polysemous), such as 'run' meaning both a race and managing a
company. A list that combines various forms, meanings, and phrases into one abstract 'word'
description can save space but might lose important details. Clearly, some forms, meanings, and
combinations of high-frequency words like 'run' are more crucial to learn than others, while
many lower-frequency words may be less essential. Without separate frequency counts, however,
a list won't indicate which to prioritize for learning.
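The effect of the lemmatization decision is easy to see in code. The sketch below counts a toy token stream twice, once form-by-form and once with inflected forms pooled under a lemma; the tiny hand-written lemma map stands in for a real lemmatizer (e.g. the ones in spaCy or NLTK), and the example sentence is invented.

```python
from collections import Counter

# Hand-written lemma map for illustration only; a real project would use a
# proper lemmatizer (e.g. spaCy or NLTK's WordNetLemmatizer).
LEMMAS = {"runs": "run", "running": "run", "ran": "run"}

tokens = "he runs daily she ran yesterday they are running now so run fast".split()

form_counts = Counter(tokens)                              # every form counted separately
lemma_counts = Counter(LEMMAS.get(t, t) for t in tokens)   # forms pooled under a lemma

print(form_counts["run"], form_counts["running"])  # → 1 1
print(lemma_counts["run"])                         # → 4
</imports> ```

The pooled count promotes "run" up the frequency list, but, as noted above, it also discards the information that "runs" and "running" each occurred only once.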
Some researchers advocate for grouping related words together when teaching
vocabulary. They argue that once learners grasp one form of a word, such as "run,"
understanding its related forms like "running" becomes easier. This approach suggests that
learning these related forms doesn't require much additional effort once the base word is
understood.
Furthermore, these researchers suggest using abstract meanings to encompass various senses of
words that have multiple meanings. For example, the word "run" can mean both participating in
a race and managing an organization. By teaching an abstract concept that encompasses both
meanings, learners may find it easier to understand and use the word in different contexts.
The decisions about how to organize a vocabulary list depend on what it will be used for.
For example, if the list is meant to help learners understand texts better (comprehension),
then making distinctions between different forms and meanings of words might not be as
crucial. Learners can often figure out related forms and meanings based on the context of
what they're reading or hearing.
On the other hand, if the goal is for learners to use words accurately and actively in
speaking or writing (active control), then a more detailed approach is needed. This means
the list should include specific information about different forms and meanings of words.
Time is another important consideration. Automated tools that analyze large amounts of text
(corpora) are still not perfect at distinguishing between the different meanings of words.
They also struggle to accurately identify how words commonly appear together
(collocations). This means that creating a comprehensive vocabulary list often requires
extensive manual analysis.
The General Service List, developed by West and his colleagues, considered several criteria
beyond word frequency alone:
Less common words were included if they were important for expressing key ideas.
Words for which adequate synonyms were available were excluded.
Words needed to be stylistically neutral.
Highly emotional words, used mainly for emphasis (much as exclamation marks are), were excluded.
Other suggested criteria include:
How difficult a word is for learners (Ghadessy 1979).
How familiar and mentally available words are (Richards 1974).
Teachers also need to consider grouping words that naturally go together, like days of the week,
which vary widely in how often they are used (O'Keeffe et al. 2007). Simply focusing on word
frequency may not be enough to create an effective list for teaching purposes.
In recent decades, corpus linguistics has revealed an important insight about how proficient
language users operate. They not only rely on individual words but also use fixed combinations
of words known as 'formulaic sequences'. These sequences can include various types:
Collocations and Colligations: These are pairs or groups of words that commonly occur
together, such as "hard luck", "tectonic plates", "black coffee", and "by the way".
Pragmatically Specialised Expressions: These are phrases used in specific social or
situational contexts, like "Happy Birthday", "Pleased to meet you", and "Come in".
Idioms: These are expressions where the meaning is not directly deducible from the
individual words used, such as "the last straw", "fall on your sword", and "part and
parcel".
Lexicalised Sentence Stems: These are structured patterns often used in sentences, like
"what’s X doing Y" (e.g., "what’s this fly doing in my soup") or "X BE-TENSE sorry to
keep-TENSE you waiting" (e.g., "Mr Jones is sorry to have kept you waiting").
What all these formulaic sequences have in common is that they are stored in memory as
complete units and are used without needing to be created or analyzed using grammar rules each
time they are used. This makes language use more efficient and natural for speakers and learners
alike.
In this context, formulaic sequences are considered an essential part of a language's vocabulary
that learners should acquire, and many experts highlight the importance of mastering them.
Some researchers even argue that learning grammar itself involves abstracting patterns from a
base of memorized formulas, though this view is controversial (Nattinger and DeCarrico 1992;
Lewis 1993).
Wray (2002: Ch. 2) outlines four main ways to identify formulaic sequences. However, each
method has issues because all of them are indirect: defining formulaic sequences as items
recalled from memory appeals to an internal, unobservable trait, which makes confirmation
challenging.
Every person has a unique language exposure history, so a corpus (a collection of texts
used for linguistic analysis) cannot perfectly represent everyone's experience.
Just because a phrase is common in a corpus doesn't mean it's common in every speaker's
experience, especially since formulaic sequences often depend on specific contexts.
The relationship between how often a phrase appears (frequency) and whether it's
formulaic isn't straightforward: some highly formulaic expressions, like idioms, are
rarely used, while some common word combinations aren't truly formulaic.
There's debate about assuming that frequently occurring word sequences are stored in the
mind; they could be frequent due to natural associations in the world or language.
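Frequency-based identification, whatever its limits, usually starts from recurrent n-grams. The sketch below extracts trigram counts from a toy text and surfaces the most frequent one; as the discussion above notes, a high count makes a string a *candidate* formulaic sequence, not proof of one. The example text is invented.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield all contiguous n-word sequences from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

# Invented toy text in which one trigram recurs.
text = ("by the way I think the last straw was by the way he said it "
        "by the way works as an aside")
tokens = text.lower().split()

trigram_counts = Counter(" ".join(g) for g in ngrams(tokens, 3))
print(trigram_counts.most_common(1))  # → [('by the way', 3)]
```

Real studies filter such candidate lists further, by dispersion across texts and by manual judgement, precisely because raw frequency and formulaicity do not coincide.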
After identifying formulaic sequences, developers must decide which ones learners should
prioritize. A common approach is to favour the most frequent formulae, but this has been
criticized for the pedagogical drawbacks discussed earlier. Methods for identifying and
teaching formulaic sequences are still developing: recent efforts have combined
frequency-based analysis with insights from psychology and teacher feedback to create lists
of useful phrases for specific language learning purposes, such as academic English. Further
research is needed to refine these methods and create effective teaching materials.
Several corpora focus on academic written English, since it is crucial for EAP students to
learn to read academic texts effectively and to write clearly and accurately according to the
standards of their academic disciplines. One prominent example is the British Academic
Written English Corpus (BAWE), which contains 6.5 million words of student writing from
undergraduate and postgraduate programs across arts, humanities, social sciences, life sciences,
and physical sciences. This corpus is valuable for researchers, though access for EAP teachers
may be limited.
Another significant corpus is the Michigan Corpus of Upper-Level Student Papers (MICUSP),
which also focuses on academic written English and will be made accessible for academic use.
Smaller EAP corpora, including PhD theses, have been compiled by individuals but aren't always
readily available.
The Academic Word List (AWL) is widely used for academic vocabulary learning. It's derived
from a corpus of about 3.5 million words across arts, commerce, law, and science disciplines.
This list is popular among researchers, materials writers, EAP practitioners, and students alike,
aiming to teach essential academic vocabulary. However, some scholars argue that its
applicability varies across different disciplines.
Constructing a corpus for academic vocabulary learning involves careful needs analysis.
According to McEnery and Wilson (1996), it's crucial to sample texts that accurately represent
the target language's characteristics and proportions. Sampling methods like random sampling,
stratified sampling, and purposive sampling are used to ensure the corpus is representative.
Corpus size also matters. A small corpus may suffice for exploring grammar, but a larger one is
necessary for generating frequency lists that accurately reflect the language patterns being
studied, especially for less common words and phrases.
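Stratified sampling, one of the methods mentioned above, can be sketched in a few lines: candidate texts are grouped by a stratum variable (here, discipline) and an equal number is drawn from each group. The catalogue of text IDs below is entirely hypothetical.

```python
import random

# Hypothetical catalogue of candidate texts, tagged by discipline (the stratum).
catalogue = (
    [{"id": f"bio{i}", "discipline": "biology"} for i in range(50)]
    + [{"id": f"eng{i}", "discipline": "engineering"} for i in range(30)]
    + [{"id": f"law{i}", "discipline": "law"} for i in range(20)]
)

def stratified_sample(texts, per_stratum, seed=0):
    """Draw the same number of texts from each stratum."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_stratum = {}
    for t in texts:
        by_stratum.setdefault(t["discipline"], []).append(t)
    return {d: rng.sample(ts, per_stratum) for d, ts in by_stratum.items()}

picked = stratified_sample(catalogue, per_stratum=10)
print({d: len(ts) for d, ts in picked.items()})
# → {'biology': 10, 'engineering': 10, 'law': 10}
```

Drawing equal numbers per stratum is only one design choice; a compiler might instead sample in proportion to how much of the target language each discipline represents.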
In Section 5 of the text, the compilation of a corpus focused on science and engineering research
articles is described. It discusses various approaches to designing vocabulary teaching materials
based on this corpus, providing examples of such materials.
This overview highlights how different corpora serve the needs of academic vocabulary learning
and the importance of their construction and accessibility in educational contexts.
4. What vocabulary input do my teaching materials provide?
Table 28.1 compares keywords from in-house EAP materials with non-academic texts
from the British National Corpus. It shows that while general academic words like
"academic" and "project" are common, discipline-specific vocabulary is lacking in the in-
house materials.
The comparison indicates that while in-house EAP materials cover general academic
vocabulary, they often miss the specialized terms needed for comprehensive academic
language proficiency. Materials derived from specialized corpora tailored to specific
academic disciplines would be more effective in teaching this specialized vocabulary.
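Keyword comparisons like the one reported in Table 28.1 are typically computed with a keyness statistic such as Dunning's log-likelihood. The sketch below implements the standard two-corpus formula; the counts for "project" are invented for illustration and are not the figures behind Table 28.1.

```python
import math

def log_likelihood(a, b, c, d):
    """Dunning log-likelihood keyness.
    a, b: frequency of the word in the study / reference corpus.
    c, d: total tokens in the study / reference corpus."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

# Invented counts: 'project' 120 times in a 50,000-token materials corpus
# vs 200 times in a 1,000,000-token reference corpus.
print(round(log_likelihood(120, 200, 50_000, 1_000_000), 1))  # → 326.8
```

Words are ranked by this score, and those far more frequent in the study corpus than the reference corpus surface as its keywords.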
1. Frequency of Use:
o Objective: Determine if "average" is more often a noun or an adjective.
o Explanation: Students count instances of "average" as a noun vs. an adjective in
concordances to understand its common grammatical role in scientific texts.
2. Identifying Examples:
o Objective: Find specific examples of "average" used in different contexts.
o Examples:
The result of a sum divided by the number of items.
Referring to a varying but stable number or size.
Describing a typical person or thing.
Indicating a normal amount or quality for a group.
Describing something as neither very good nor bad.
o Explanation: Students locate and highlight these usages to understand different
meanings of "average."
3. Article Usage:
o Objective: Analyze the common article usage (definite "the" vs. indefinite "a/an")
with "average" as a noun.
o Explanation: By examining contexts where "average" appears as a noun,
students identify if it usually follows "the" or "a/an," understanding its
grammatical patterns.
4. Prepositional Phrases:
o Objective: Identify instances where "average" is part of a prepositional phrase.
o Explanation: Students find phrases where "average" is near prepositions like
"of," "in," or "for" to learn its syntactic structure in academic writing.
5. Section in Journal Articles:
o Objective: Predict sections of journal articles likely to contain "average."
o Explanation: Students speculate on where "average" appears in journal articles
(e.g., results, discussion) based on writing conventions.
6. Progressive Challenge:
o Objective: Prepare for complex tasks by mastering the analysis of "average."
o Explanation: Completing this task builds foundational skills for analyzing more
complex language patterns in their academic fields.
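Concordance tasks like those above depend on keyword-in-context (KWIC) displays. A minimal KWIC generator can be sketched as follows; the sample sentences are invented, and real concordancers add sorting, sentence-aware tokenization, and wider context.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: `width` words either side of each hit."""
    tokens = text.split()
    lines = []
    for i, tok in enumerate(tokens):
        if re.fullmatch(keyword, tok, flags=re.IGNORECASE):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# Invented sample sentences (punctuation pre-tokenized for simplicity).
sample = ("The average was computed from ten trials . "
          "On average the sensor drifted slightly . "
          "Results were about average for this method .")
for line in kwic(sample, "average"):
    print(line)
```

Students working through the tasks above would scan exactly this kind of output, tallying part of speech, article choice, and neighbouring prepositions for each hit.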
Conclusion
This task enhances vocabulary acquisition and analytical skills in corpus-based learning.
Students explore academic vocabulary like "average," preparing them to use specialized
language effectively in their academic and professional work.