Chapter 28
Language teachers use lists of important vocabulary to plan courses and create teaching
materials. They rely heavily on corpus data, which are collections of written and spoken texts, to
develop these lists. This practice dates back to the late 1800s when Kaeding conducted manual
counts of word frequencies in an eleven-million-word corpus to identify crucial words for
stenographers. Since then, similar methods have been used by teachers.
The idea is to focus on words that appear most often because learning these helps understand a
lot of language. For example, West's General Service List from 1953 includes about 2,000 word
families covering 80% of general English texts. Learning these common words is especially
helpful for beginners.
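Coverage figures like the one above are straightforward to compute: count word frequencies in a text, then ask what share of all running words the top n types account for. Below is a minimal sketch using only the Python standard library; the toy sentence and the cut-off of three words are illustrative, not drawn from any real corpus, and a real frequency list would count word families rather than surface forms.

```python
from collections import Counter

def coverage(text, top_n):
    """Share of all running words accounted for by the top_n most frequent types."""
    tokens = [w.lower() for w in text.split() if w.isalpha()]
    counts = Counter(tokens)
    top = counts.most_common(top_n)
    return sum(freq for _, freq in top) / len(tokens)

# Invented toy text: the three most frequent words cover half of it.
sample = ("the cat sat on the mat and the dog sat on the rug "
          "because the cat and the dog like the mat")
print(round(coverage(sample, 3), 2))  # → 0.5
```

On a realistic corpus the same calculation, run over the top 2,000 word families, reproduces figures of the kind reported for the General Service List.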
The use of computerized corpus analysis has made it much easier to compile word-frequency
statistics than in the past. This has sparked new research focused on teaching methods (e.g.,
Nation and Waring 1997; Biber et al. 1999; Coxhead 2000). Now, with corpora widely available
and automated word counting tools, teachers can create customized vocabulary lists for their
students. However, it's crucial to note that corpus software alone cannot effectively generate
teaching lists without significant human input. Teachers must carefully consider various factors
and understand their implications when creating these lists.
The first important decision is choosing or creating a suitable corpus. Nowadays, there are many
corpora available, and it's relatively easy to make one specifically for a project. It's crucial to
analyze needs carefully to ensure the corpus matches the target language well. Section 3 of this
chapter provides more details on designing a corpus.
A second decision, which has received less research attention, is how to define what counts as a
'word' (Gardner 2008). For example, teachers must decide whether related forms like 'run' and
'running' should be treated separately or together (lemmatized). Similar decisions arise for words
with multiple meanings (polysemous), such as 'run' meaning both a race and managing a
company. A list that combines various forms, meanings, and phrases into one abstract 'word'
description can save space but might lose important details. Clearly, some forms, meanings, and
combinations of high-frequency words like 'run' are more crucial to learn than others, while
many lower-frequency words may be less essential. Without separate frequency counts, however,
a list won't indicate which to prioritize for learning.
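The effect of the lemmatization decision is easy to see in code. The sketch below counts a toy token stream twice, once form-by-form and once with inflected forms pooled under a lemma; the tiny hand-written lemma map stands in for a real lemmatizer (e.g. the ones in spaCy or NLTK), and the example sentence is invented.

```python
from collections import Counter

# Hand-written lemma map for illustration only; a real project would use a
# proper lemmatizer (e.g. spaCy or NLTK's WordNetLemmatizer).
LEMMAS = {"runs": "run", "running": "run", "ran": "run"}

tokens = "he runs daily she ran yesterday they are running now so run fast".split()

form_counts = Counter(tokens)                              # every form counted separately
lemma_counts = Counter(LEMMAS.get(t, t) for t in tokens)   # forms pooled under a lemma

print(form_counts["run"], form_counts["running"])  # → 1 1
print(lemma_counts["run"])                         # → 4
</imports> ```

The pooled count promotes "run" up the frequency list, but, as noted above, it also discards the information that "runs" and "running" each occurred only once.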
Some researchers advocate for grouping related words together when teaching
vocabulary. They argue that once learners grasp one form of a word, such as "run,"
understanding its related forms like "running" becomes easier. This approach suggests that
learning these related forms doesn't require much additional effort once the base word is
understood.
Furthermore, these researchers suggest using abstract meanings to encompass various senses of
words that have multiple meanings. For example, the word "run" can mean both participating in
a race and managing an organization. By teaching an abstract concept that encompasses both
meanings, learners may find it easier to understand and use the word in different contexts.
The decisions about how to organize a vocabulary list depend on what it will be used for.
For example, if the list is meant to help learners understand texts better (comprehension),
then making distinctions between different forms and meanings of words might not be as
crucial. Learners can often figure out related forms and meanings based on the context of
what they're reading or hearing.
On the other hand, if the goal is for learners to use words accurately and actively in
speaking or writing (active control), then a more detailed approach is needed. This means
the list should include specific information about different forms and meanings of words.
Time is another important consideration. Automated tools that analyze large amounts of text
(corpora) are still not perfect at distinguishing between the different meanings of words.
They also struggle to accurately identify how words commonly appear together
(collocations). This means that creating a comprehensive vocabulary list often requires
extensive manual analysis.
The General Service List, developed by West and his colleagues, considered several criteria
beyond word frequency alone:
Less common words were included if they were important for expressing key ideas.
Words for which adequate synonyms were available were excluded.
Words needed to be stylistically neutral.
Highly emotional words, used mainly for emphasis (much as exclamation marks are), were excluded.
Other suggested criteria include:
How difficult a word is for learners (Ghadessy 1979).
How familiar and mentally available words are (Richards 1974).
Teachers also need to consider grouping words that naturally go together, like days of the week,
which vary widely in how often they are used (O'Keeffe et al. 2007). Simply focusing on word
frequency may not be enough to create an effective list for teaching purposes.
In recent decades, corpus linguistics has revealed an important insight about how proficient
language users operate. They not only rely on individual words but also use fixed combinations
of words known as 'formulaic sequences'. These sequences can include various types:
Collocations and Colligations: These are pairs or groups of words that commonly occur
together, such as "hard luck", "tectonic plates", "black coffee", and "by the way".
Pragmatically Specialised Expressions: These are phrases used in specific social or
situational contexts, like "Happy Birthday", "Pleased to meet you", and "Come in".
Idioms: These are expressions where the meaning is not directly deducible from the
individual words used, such as "the last straw", "fall on your sword", and "part and
parcel".
Lexicalised Sentence Stems: These are structured patterns often used in sentences, like
"what’s X doing Y" (e.g., "what’s this fly doing in my soup") or "X BE-TENSE sorry to
keep-TENSE you waiting" (e.g., "Mr Jones is sorry to have kept you waiting").
What all these formulaic sequences have in common is that they are stored in memory as
complete units and are used without needing to be created or analyzed using grammar rules each
time they are used. This makes language use more efficient and natural for speakers and learners
alike.
In this context, formulaic sequences are considered an essential part of a language's vocabulary
that learners should acquire, and many experts highlight the importance of mastering them.
Some researchers even argue that learning grammar itself involves abstracting patterns from a
base of memorized formulas, though this view is controversial (Nattinger and DeCarrico 1992;
Lewis 1993).
Wray (2002: Ch. 2) outlines four main ways to identify formulaic sequences. However, each
method has issues because all of them are indirect: defining formulaic sequences as items
recalled from memory appeals to an internal, unobservable trait, which makes confirmation
challenging.
Every person has a unique language exposure history, so a corpus (a collection of texts
used for linguistic analysis) cannot perfectly represent everyone's experience.
Just because a phrase is common in a corpus doesn't mean it's common in every speaker's
experience, especially since formulaic sequences often depend on specific contexts.
The relationship between how often a phrase appears (frequency) and whether it's
formulaic isn't straightforward: some highly formulaic expressions, like idioms, are
rarely used, while some common word combinations aren't truly formulaic.
There's debate about assuming that frequently occurring word sequences are stored in the
mind; they could be frequent due to natural associations in the world or language.
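Frequency-based identification, whatever its limits, usually starts from recurrent n-grams. The sketch below extracts trigram counts from a toy text and surfaces the most frequent one; as the discussion above notes, a high count makes a string a *candidate* formulaic sequence, not proof of one. The example text is invented.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield all contiguous n-word sequences from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

# Invented toy text in which one trigram recurs.
text = ("by the way I think the last straw was by the way he said it "
        "by the way works as an aside")
tokens = text.lower().split()

trigram_counts = Counter(" ".join(g) for g in ngrams(tokens, 3))
print(trigram_counts.most_common(1))  # → [('by the way', 3)]
```

Real studies filter such candidate lists further, by dispersion across texts and by manual judgement, precisely because raw frequency and formulaicity do not coincide.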
After identifying formulaic sequences, developers must decide which ones learners should
prioritize. A common approach is to favour the most frequent formulae, but this has been
criticized for the pedagogical drawbacks discussed earlier. Methods for identifying and
teaching formulaic sequences are still developing: recent efforts have combined
frequency-based analysis with insights from psychology and teacher feedback to create lists
of useful phrases for specific language learning purposes, such as academic English. Further
research is needed to refine these methods and create effective teaching materials.
Several corpora focus on academic written English, since it is crucial for EAP students to
learn to read academic texts effectively and to write clearly and accurately according to the
standards of their academic disciplines. One prominent example is the British Academic
Written English Corpus (BAWE), which contains 6.5 million words of student writing from
undergraduate and postgraduate programs across arts, humanities, social sciences, life sciences,
and physical sciences. This corpus is valuable for researchers, though access for EAP teachers
may be limited.
Another significant corpus is the Michigan Corpus of Upper-Level Student Papers (MICUSP),
which also focuses on academic written English and will be made accessible for academic use.
Smaller EAP corpora, including PhD theses, have been compiled by individuals but aren't always
readily available.
The Academic Word List (AWL) is widely used for academic vocabulary learning. It's derived
from a corpus of about 3.5 million words across arts, commerce, law, and science disciplines.
This list is popular among researchers, materials writers, EAP practitioners, and students alike,
aiming to teach essential academic vocabulary. However, some scholars argue that its
applicability varies across different disciplines.
Constructing a corpus for academic vocabulary learning involves careful needs analysis.
According to McEnery and Wilson (1996), it's crucial to sample texts that accurately represent
the target language's characteristics and proportions. Sampling methods like random sampling,
stratified sampling, and purposive sampling are used to ensure the corpus is representative.
Corpus size also matters. A small corpus may suffice for exploring grammar, but a larger one is
necessary for generating frequency lists that accurately reflect the language patterns being
studied, especially for less common words and phrases.
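Stratified sampling, one of the methods mentioned above, can be sketched in a few lines: candidate texts are grouped by a stratum variable (here, discipline) and an equal number is drawn from each group. The catalogue of text IDs below is entirely hypothetical.

```python
import random

# Hypothetical catalogue of candidate texts, tagged by discipline (the stratum).
catalogue = (
    [{"id": f"bio{i}", "discipline": "biology"} for i in range(50)]
    + [{"id": f"eng{i}", "discipline": "engineering"} for i in range(30)]
    + [{"id": f"law{i}", "discipline": "law"} for i in range(20)]
)

def stratified_sample(texts, per_stratum, seed=0):
    """Draw the same number of texts from each stratum."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_stratum = {}
    for t in texts:
        by_stratum.setdefault(t["discipline"], []).append(t)
    return {d: rng.sample(ts, per_stratum) for d, ts in by_stratum.items()}

picked = stratified_sample(catalogue, per_stratum=10)
print({d: len(ts) for d, ts in picked.items()})
# → {'biology': 10, 'engineering': 10, 'law': 10}
```

Drawing equal numbers per stratum is only one design choice; a compiler might instead sample in proportion to how much of the target language each discipline represents.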
In Section 5 of the text, the compilation of a corpus focused on science and engineering research
articles is described. It discusses various approaches to designing vocabulary teaching materials
based on this corpus, providing examples of such materials.
This overview highlights how different corpora serve the needs of academic vocabulary learning
and the importance of their construction and accessibility in educational contexts.
4. What vocabulary input do my teaching materials provide?
Table 28.1 compares keywords from in-house EAP materials with non-academic texts
from the British National Corpus. It shows that while general academic words like
"academic" and "project" are common, discipline-specific vocabulary is lacking in the in-
house materials.
The comparison indicates that while in-house EAP materials cover general academic
vocabulary, they often miss the specialized terms needed for comprehensive academic
language proficiency. Materials derived from specialized corpora tailored to specific
academic disciplines would be more effective in teaching this specialized vocabulary.
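Keyword comparisons like the one reported in Table 28.1 are typically computed with a keyness statistic such as Dunning's log-likelihood. The sketch below implements the standard two-corpus formula; the counts for "project" are invented for illustration and are not the figures behind Table 28.1.

```python
import math

def log_likelihood(a, b, c, d):
    """Dunning log-likelihood keyness.
    a, b: frequency of the word in the study / reference corpus.
    c, d: total tokens in the study / reference corpus."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll

# Invented counts: 'project' 120 times in a 50,000-token materials corpus
# vs 200 times in a 1,000,000-token reference corpus.
print(round(log_likelihood(120, 200, 50_000, 1_000_000), 1))  # → 326.8
```

Words are ranked by this score, and those far more frequent in the study corpus than the reference corpus surface as its keywords.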
1. Frequency of Use:
o Objective: Determine if "average" is more often a noun or an adjective.
o Explanation: Students count instances of "average" as a noun vs. an adjective in
concordances to understand its common grammatical role in scientific texts.
2. Identifying Examples:
o Objective: Find specific examples of "average" used in different contexts.
o Examples:
The result of a sum divided by the number of items.
Referring to a varying but stable number or size.
Describing a typical person or thing.
Indicating a normal amount or quality for a group.
Describing something as neither very good nor bad.
o Explanation: Students locate and highlight these usages to understand different
meanings of "average."
3. Article Usage:
o Objective: Analyze the common article usage (definite "the" vs. indefinite "a/an")
with "average" as a noun.
o Explanation: By examining contexts where "average" appears as a noun,
students identify if it usually follows "the" or "a/an," understanding its
grammatical patterns.
4. Prepositional Phrases:
o Objective: Identify instances where "average" is part of a prepositional phrase.
o Explanation: Students find phrases where "average" is near prepositions like
"of," "in," or "for" to learn its syntactic structure in academic writing.
5. Section in Journal Articles:
o Objective: Predict sections of journal articles likely to contain "average."
o Explanation: Students speculate on where "average" appears in journal articles
(e.g., results, discussion) based on writing conventions.
6. Progressive Challenge:
o Objective: Prepare for complex tasks by mastering the analysis of "average."
o Explanation: Completing this task builds foundational skills for analyzing more
complex language patterns in their academic fields.
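Concordance tasks like those above depend on keyword-in-context (KWIC) displays. A minimal KWIC generator can be sketched as follows; the sample sentences are invented, and real concordancers add sorting, sentence-aware tokenization, and wider context.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: `width` words either side of each hit."""
    tokens = text.split()
    lines = []
    for i, tok in enumerate(tokens):
        if re.fullmatch(keyword, tok, flags=re.IGNORECASE):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# Invented sample sentences (punctuation pre-tokenized for simplicity).
sample = ("The average was computed from ten trials . "
          "On average the sensor drifted slightly . "
          "Results were about average for this method .")
for line in kwic(sample, "average"):
    print(line)
```

Students working through the tasks above would scan exactly this kind of output, tallying part of speech, article choice, and neighbouring prepositions for each hit.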
Conclusion
This task enhances vocabulary acquisition and analytical skills in corpus-based learning.
Students explore academic vocabulary like "average," preparing them to use specialized
language effectively in their academic and professional work.