Teaching and Assessment of Grammar Handout
The examples above are specifically YES/NO INTERROGATIVES because they elicit a response that is either
yes or no.
WH- INTERROGATIVES, on the other hand, are introduced by a wh-word, and they elicit an open-ended
response:
What happened?
Where do you work?
Who won the Cup Final in 1997?
The four sentence types exhibit different syntactic forms. For now, it is worth pointing out that there is
not necessarily a one-to-one relationship between the form of a sentence and its discourse function.
For instance, a sentence with declarative form may nevertheless function in discourse as a question or a
directive. Conversely, rhetorical questions have the form of an interrogative, but they function as statements.
Coordination joins two independent clauses that contain related ideas of equal importance.
Original sentence: I spent my entire paycheck last week. I am staying home this weekend.
In their current form, these sentences contain two separate ideas that may or may not be related. Am I staying
home this week because I spent my paycheck, or is there another reason for my lack of enthusiasm to leave
the house? To indicate a relationship between the two ideas, we can use the coordinating conjunction so:
Revised sentence: I spent my entire paycheck last week, so I am staying home this weekend.
To help you remember the seven coordinating conjunctions, think of the acronym FANBOYS: for, and, nor,
but, or, yet, so. Remember that when you use a coordinating conjunction to join two independent clauses, a
comma should precede it.
Subordination joins two sentences with related ideas by merging them into the main clause (a complete
sentence) and a dependent clause (a construction that relies on the main clause to complete its meaning).
Coordination allows a writer to give equal weight to the two ideas that are being combined, and subordination
enables a writer to emphasize one idea over the other. Take a look at the following sentences:
Original sentences: Tracy stopped to help the injured man. She would be late for work.
To illustrate that these two ideas are related, we can rewrite them as a single sentence using the subordinating
conjunction even though.
Revised sentence: Even though Tracy would be late for work, she stopped to help the injured man.
Negative forms
Negation is the process that turns an affirmative statement into its opposite denial.
Verbs in English are negated by placing the word not after an auxiliary or modal.
Examples:
Long negative forms    Short negative forms
do not                 don't
are not                aren't
is not                 isn't
did not                didn't
have not               haven't
had not                hadn't
should not             shouldn't
would not              wouldn't
will not               won't
1. Functional grammar puts together the patterns of the language and the things you can do with them.
Therefore, it is based on the relation between the structure of a language and the various functions that
the language performs. Functional grammar is all about language use. It is about communicative
grammar that learners can use in the typical situations that they find themselves in as they go about
their daily lives. Moreover, it's an approach in which grammar is not seen as a set of rules, but rather
as a communicative resource.
Grammar in the ESL Curriculum; Methods of Teaching Grammar; and Error Correction
Speech and writing are distinctly different manifestations of language. Language as speech is intuitive,
natural, dynamic, evanescent, and situated. Speech can be altered mid‐utterance to account for understanding
and processing by others. Language as writing is de‐situated, requiring time for planning, organizing,
composing, editing, and revising.
An example
Though/Although - both of these are correct grammatically, in the spoken form, though is six times
more frequent than although. Why is this? It's not enough to say 'it's shorter and easier to say' or 'it's only one
syllable'. We need to analyze it a little more.
Though is the 175th most commonly used word in British English and the 190th in American English.
(It should be noted that if a word is ranked 1 to 2000, it is very important - we can't do without it.)
There are two parts of spoken grammar:
● form (syntax)
● function (there will be functions of spoken grammar that aren't necessary for writing)
Though has two meanings - therefore, it is used more. It can be used at the very end of a sentence. So,
its greater frequency may be due to its flexibility. Using though at the end of a sentence is very rare in written
English. (Form)
Though is also much more commonly used to resume a conversation that has been interrupted.
(Function)
We don't notice what we say in the same way as we do when we write. Spoken grammar is flexible in
its word order. This is good news for language learners. Spoken grammar is much less strict than
written.
Let's look at the word know as an example. Know is the 14th most commonly used word in spoken
British English and the 22nd in American English. Know is a transitive verb and most of its uses in
writing have an object. Conversely, most of its uses in speech have no object. Its most common use is
in the expression, 'You know'. A similar situation arises with the verbs, 'see' and 'mean'.
In spoken language, we have common knowledge - gauging what the other person understands,
sharing a common view. So, we constantly use checking phrases like 'Do you see?' or 'You know
what I mean'.
Another illustration is absolutely which appears four times more frequently in spoken than in written
English. In spoken English, absolutely is used as 'yes-plus' - as a stand-alone sentence. It is used as
an engaged yes or an interested yes - it turns you into an active listener rather than a robotic one. It
can also be used in the negative - absolutely not - increasing its frequency considerably.
Spoken grammar also has 'response tokens' not used in written grammar - wonderful, certainly,
great, definitely, etc. These are very important to effective oral communication.
In conversation, people have no difficulty understanding such things as: 'His cousin in London, her
boyfriend, his parents, bought him a car for his birthday.' Such constructions with multiple subjects and
lots of different noun phrases are not found in writing. When we write them down, they look strange,
but in speaking, they sound fine.
Diagramming Sentences
● Grammatical structures were very carefully sequenced from basic to more complex (based on
linguistic description) and vocabulary was strictly limited in the early stages of learning.
● Consonant with the then-current behavioral school of psychology, audiolingual proponents assumed
that language learning was habit formation and overlearning; thus mimicry of forms and
memorization of certain sentence patterns were used extensively to present rules inductively.
● A variety of manipulative drill types was practiced to minimize (or prevent altogether) learners' errors,
which were viewed as bad habits that would be hard to break if they became established.
● Teachers were told that they should correct all errors that they were not able to prevent. The focus of
instruction rarely moved beyond the sentence level.
The cognitive code approach (Jakobovits, 1968, 1970), largely a reaction to the behaviorist features of
audiolingualism, was influenced by the works of linguists like Chomsky (1959) and psycholinguists like Miller
(1973).
● Language learning was viewed as hypothesis formation and rule acquisition, rather than habit
formation.
● Grammar was considered important, and rules were presented either deductively or inductively
depending on the preferences of the learners.
● Errors were viewed as inevitable by-products of language learning and as something that the
teacher and learner could use constructively in the learning process.
● Error analysis and correction were seen as appropriate classroom activities with the teacher facilitating
peer and self-correction as much as possible.
● The source of errors was seen not only as a transfer from the first language but also as normal
language development.
● The focus was still largely sentence-oriented, and material writers often drew on Chomsky's early
work in generative grammar (1957, 1965).
The comprehension approach (Winitz, 1981) represents attempts by many language methodologists working
in the U.S. during the 1970s and 1980s to recreate the first language acquisition experience for the second/
foreign language learner.
● The idea that comprehension is primary and should thus precede any production epitomizes this
approach; a pedagogical offshoot is the view that comprehension can best be taught initially by delaying
production in the target language while encouraging the learner to use meaningful nonverbal responses
to demonstrate comprehension.
● Some practitioners of this approach carefully sequence grammatical structures and lexical items in
their instructional programs (Asher, 1977; Winitz, no date); they thus present grammar inductively.
● Others propose that a semantically based syllabus be followed instead and that all grammar
instruction be excluded from the classroom since they feel that it does not facilitate language
acquisition; at best it merely helps learners to monitor or become aware of the forms they use.
● Proponents of this approach believe that error correction is unnecessary, perhaps even
counterproductive, since they feel that errors will gradually self-correct as learners are exposed to
ever more complex, rich, and meaningful input in the target language.
The communicative approach, which came to the fore in the mid-1970s, originates in the work of
anthropological linguists in the U.S. (Hymes, 1972) and functional linguists in Britain (Halliday, 1973), all of
whom view language as an instrument of communication.
● Those who have applied this philosophy of language teaching claim that communication is the
goal of second or foreign language instruction and that the syllabus of a language course should be
organized not around grammar but around subject matter, tasks/projects, or semantic notions and/or
pragmatic functions. In other words, language instruction should be content-based, meaningful,
contextualized, and discourse-based (rather than sentence-based).
The teacher's role is primarily to facilitate language use and communication; it is only secondarily to
provide feedback and correct learner errors.
The inductive approach represents a different style of teaching where the new grammatical structures
or rules are presented to the students in a real language context (Goner, Phillips, and Walters 135).
The students learn the use of the structure through the practice of the language in context, and later
realize the rules from the practical examples. For example, if the structure to be presented is the
comparative form, the teacher would begin the lesson by drawing a figure on the board and saying,
"This is Jim. He is tall." Then, the teacher would draw another taller figure next to the first saying, "This
is Bill. He is taller than Jim." The teacher would then provide many examples using students and items
from the classroom, famous people, or anything within the normal daily life of the students, to create an
understanding of the use of the structure. The students repeat after the teacher, after each of the
different examples, and eventually practice the structures meaningfully in groups or pairs. (Goner,
Phillips, and Walters 135-136) With this approach, the teacher's role is to provide meaningful contexts
to encourage demonstration of the rule, while the students evolve the rules from the examples of its use
and continued practice (Rivers and Temperley 110).
This approach starts with some examples from which a rule is inferred. In grammar teaching, teachers
present the examples first and then generalize rules from the given samples. The inductive
approach is often correlated with the Direct Method and the Natural Approach in English teaching. In both
methods, grammar is presented in such a way that learners experience it. "In the Direct method,
therefore, the rules of the language are supposedly acquired out of the experience of the understanding
and repeating examples which have been systematically graded for difficulty and put into a clear
context" (Thornbury, 2002, p. 16). Here is an example of teaching the present continuous tense by using
the Direct Method.
The Use of Scripted Dialogues, Authentic Texts, Dictoglosses, and Genre Analysis
to Teach L2 Grammar
Sources of Texts
There are at least two implications to a text-level view of language. The first is that if learners are going to
be able to make sense of grammar, they will need to be exposed to it in its contexts of use, and at the very
least, this means in texts.
Secondly, if learners are to achieve a functional command of a second language, they will need to be able
to understand and produce not just isolated sentences, but whole texts in that language. But a text-based
approach to grammar is not without its problems. These problems relate principally to the choice of texts.
There are at least four possible sources of texts: the course book; authentic sources such as newspapers,
songs, literary texts, the Internet, etc; the teacher; and the students themselves.
A. Scripted Dialogues
The first rule in using a text (scripted dialogue, to be specific) for the introduction of a new grammatical
form is that the students understand the text. In most cases, the teacher chooses a text that she estimates is
within the students' range. At low levels, this will usually mean a scripted text, i.e. one that has been specially
written with learners in mind. Teachers should also choose a text with a high frequency of instances of the
targeted grammar item. This will help learners notice the new item and may lead them to work out the rules by
induction (inductively).
But simply giving the students the chosen text is no guarantee that they will understand it. The teacher
needs to apply a series of steps. Steps 1 to 3 are the checking stage, during which the teacher guides the
learners to a clearer understanding of the general gist of the text through a carefully staged series of tasks.
From Step 4 onwards, she prepares students to home in on the target language: the instances in the text
where the grammatical items are presented numerous times. From Steps 1 to 5, each successive listening to
the conversation requires learners to attend more and more closely to form. As a rule of thumb, listening
tasks should generally move from a meaning-focus to a form-focus.
Having isolated and highlighted the structure in Steps 5 and 6, she then sets tasks that require learners to
demonstrate their understanding of both the form and the meaning of the new item. At this production stage,
the progression is from form-focus to meaning-focus. It is as if, having taken the language item out of its
natural habitat (its context), the sooner it gets put back into a context, the better.
B. Authentic Texts
A teacher should choose a text which is both authentic and rich in examples of the grammatical
item that needs to be taught. Because it is authentic rather than simplified, the teacher has to work a little
harder to make it comprehensible, but, for the sake of presenting language in its context of use, this is an effort
that is arguably worth making. As pointed out earlier, authentic texts offer learners examples of real language
use, undistorted by the heavy hand of the grammarian.
In Steps 1 and 2, the teacher aims to achieve a minimum level of understanding, without which any
discussion of the targeted language would be pointless. As in the example for scripted dialogues, the shift of
focus is from meaning to form, and it is in Step 3 that the shift is engineered. But even while the focus is on the
form, the teacher is quick to remind students how and why it is used. To consolidate this relationship between
form and use, he directs them back to the text (Step 4), which they use as a resource to expand their
understanding of the passive. Note that there are one or two slippery examples in the text: is, for example, the
wounds had become infected an example of the passive? Strictly speaking, it is passive in meaning but not in
form. Is Jessica is self-employed passive? This looks like a passive, but here self-employed is being used as
an adjective. It is often the case that authentic materials throw up examples that resist neat categorization. The
teacher's choices here include: a) removing these examples from the text, or rephrasing them; b) explaining why
there are exceptions; c) enlisting a more general rule that covers all these uses. Most experienced teachers would
probably opt for plan b, in this instance.
Step 5 tests the ability of learners to produce the appropriate forms in context. The teacher has chosen a
writing task rather than a speaking one, partly because the passive is not used in spoken English to the extent
that it is in written English, but also because a writing exercise allows learners more thinking time, important
when meeting relatively complex structures such as the passive. They then have a chance to personalize the
theme through a speaking and writing activity (Step 6): the writing also serves as a way of testing whether the
lesson's linguistic aim has been achieved.
C. Dictogloss
Dictogloss was formulated by Wajnryb in 1990 to emphasize grammar. It involves students in listening
to a short text read at normal speed and then reconstructing, as well as paraphrasing or interpreting (the
'gloss' part), the text. According to Wajnryb (1990), the task focuses not only on learning in a whole-class
setting (on learner output) but also on learner interaction. In implementing the Dictogloss technique, teachers
can easily and creatively fit the stages of Dictogloss tasks to students' needs. In the different stages of Dictogloss,
learners may be involved in listening, remembering, and/or writing. In this research, the writer explores the
reconstructing stages of the task.
Wajnryb (1990) has stated that Dictogloss is a recent technique in language teaching which takes a
step beyond the dictation technique (hence part of its name): it consists of asking learners to reconstruct a
dictated text and to capture as much as possible of the information content accurately and in an acceptable
linguistic form. Wajnryb (1990) has added that with this technique, students get a more precise understanding
of the grammar items than with any other technique, and, compared to other traditional approaches, this
technique uses both the negotiation of meaning and of form. There are co-operative strategies in the technique
that lead the learners to stay active and engaged in the learning process. Small (2003, p. 57) defines
Dictogloss as an activity in which short pieces of language are read out at normal speed to students. Similarly,
Cross (2002, p. 17) has declared that Dictogloss is known as grammar dictation or as a task-based,
communicative teaching procedure.
Swain and Lapkin (1998) in extensive research on learning outcomes in a French immersion program
found that Dictogloss was effective in helping students internalize their linguistic knowledge by making them
aware of language form and function. As others have said, Dictogloss encourages beneficial interaction during
collaborative tasks by providing explicit information about grammatical forms before learners carry out the
tasks, training learners to notice and repair their language errors, and modeling how learners interact with each
other.
D. Genre Analysis
Language is context-sensitive. To understand language, we need to have some knowledge of its context.
Context can also determine the kind of language that is used. For example, a request for a loan will be worded
differently if it is made to a friend rather than to a bank manager. The study of ways in which social
contexts impact language choices is called genre analysis. A genre is a type of text whose overall
structure and whose grammatical and lexical features have been determined by the contexts in which it is
used, and which over time have become institutionalized.
An example of using genre analysis in grammar instruction is through the Internet news bulletin – to teach
ways in which news is reported. A genre is a text type whose features have become conventionalized over
time. A sports commentary, an e-mail message, a political speech, and an Internet news bulletin are all
examples of different genres. Instances of a genre share common characteristics, such as their overall
organization, degree of formality, and grammatical features. These characteristics distinguish them from other
genres. A genre analysis approach not only respects the integrity of the whole text but regards the feature of a
text as being directly influenced by its communicative function and its context of use. Thus, the way the text is
organized, and the way choices are made at the level of grammar and vocabulary, will be determined by such
factors as the relationship between speaker and listener (or reader and writer), the topic, the medium (e.g.
spoken or written), and the purpose of the exchange. For example, the 'monkey' text would take a different
form if it were a phone conversation between the owners and the police, reporting the incident.
The genre analysis approach implies that grammatical choices are not arbitrary but are dependent on
higher-order decisions, e.g. the kind of text, the audience, the topic. Whereas traditionally it has been the
custom to teach grammar independently of its contexts of use, a genre analysis approach sees grammar as
subservient to the text and its social function. It is, therefore, best taught and practiced as just another feature
of that kind of text.
There are three broad types of form-focused instruction as shown in the table below:
Alternatively, focus on form can be incidental, where attention to form in the context of a communicative
activity is not predetermined but rather occurs in accordance with the participants' linguistic needs as the
activity proceeds. In this approach, it is likely that attention will be given to a wide variety of grammatical
structures during any one task and thus will be extensive. Focus on form implies no separate grammar lessons
but rather grammar teaching integrated into a curriculum consisting of communicative tasks.
There is considerable theoretical disagreement regarding which of these types of instruction is most
effective in developing implicit knowledge of grammar. Long (1988) and Doughty (2001) have argued strongly
that focus on form is best equipped to promote interlanguage development because the acquisition of implicit
knowledge occurs as a result of learners attending to linguistic form at the same time they are engaged with
understanding and producing meaningful messages.
In short, focus on form instruction is a type of instruction that, on the one hand, holds up the importance of
communicative language teaching principles such as authentic communication and student-centeredness, and,
on the other hand, maintains the value of the occasional and overt study of problematic L2 grammatical forms,
which is more reminiscent of non-communicative teaching (Long, 1991). Furthermore, Long and Robinson
(1998) argue that the responsibility of helping learners attend to and understand problematic L2 grammatical
forms falls not only on their teachers but also on their peers. In other words, Long (1991) and Long and
Robinson (1998) claim that formal L2 instruction should give most of its attention to exposing students to oral
and written discourse that mirrors real-life, such as doing job interviews, writing a letter to friends, and
engaging in classroom debates; nonetheless, when it is observed that learners are experiencing difficulties in
the comprehension and/or production of certain L2 grammatical forms, teachers and their peers are obligated
to assist them to notice their erroneous use and/or comprehension of these forms and supply them with the
proper explanations and models of them. Moreover, teachers can help their students and learners can help
their peers notice the forms that they currently lack, yet should know to further their overall L2 grammatical
development.
In this section, only three out of the eleven teaching techniques of focus on form will be introduced. Figure
3 below indicates the degree of obtrusiveness of each technique (Doughty and Williams 258). Obtrusiveness,
in this case, means that grammar structures are presented explicitly by using metalinguistic terms (see Fig. 3).
Figure 3 shows that the most implicit technique is the Input flood, whereas the most explicit technique is the
Garden path.
Fig. 3. Degree of obtrusiveness (from most unobtrusive to most obtrusive):
Input flood
Task-essential language
Input enhancement
Negotiation
Recast
Output enhancement
Interaction enhancement
Dictogloss
Consciousness-raising tasks
Input processing
Garden path
Technique 2: Recast
Recast is similar to a child's L1 acquisition wherein the mother frequently recasts her child's incorrect
utterance and presents a correct one (Long and Robinson 25). Ellis defines recast as "reformulations of
deviant learner utterances" ("Investigating Form-Focused" 10). In other words, a teacher reformulates a
learner's incorrect form into a correct form indirectly, without saying that the utterance was wrong. Through this
recast, learners are more likely to notice the gap between their incomplete interlanguage and fluent L2 (Long
and Robinson). Catherine Doughty and Elizabeth Varela also explain that recast is "potentially effective since
the aim is to add attention to form to a primarily communicative task rather than to depart from an already
communicative goal to discuss a linguistic feature".
Example:
STUDENT: And they found out the one woman run away.
TEACHER: OK, the woman was running away. [Recast]
STUDENT: Running away
Exception: beautiful → the most beautiful, not *the beautifulest (point out the error here).
TOPIC 6: Practicing Grammar via Drills, Information Gap Activities, and Personalization Tasks
"Grammar analysis [i.e., teaching labels for grammar concepts, dissecting sentences] and detecting errors
for isolated sentences do not seem to be beneficial" (Eisenberg, 2007).
How Do I Do This?
3. Combining Sentences
Another approach is to provide students with two or more sentences and prompt them to create a single,
longer sentence (Strong, 1986). There are two types:
Cued Combining: The teacher underlines components to be combined and/or gives students cue words to use
(e.g., conjunctions).
Example: I sometimes wonder SOMETHING. Superheroes do exist. (WHETHER) –> I sometimes wonder
whether superheroes do exist.
Open Combining: The teacher doesn't give specific instructions and allows the student to creatively
combine the sentences.
Example: I like to eat cereal. I watch TV. –> I like to eat cereal before I watch TV.
Students can also be prompted to expand sentences (Gould, 2001). The therapist gives the student a simple
sentence to start with and has the student build the sentence by increasing the length and complexity.
Example: I saw a monkey. –> I saw a silly monkey eating bananas at the zoo.
I. Teaching Grammar: Information Gap Activities
Teachers are often searching for activities to make their classroom more
interactive; language teachers in particular are also looking for activities that
promote target language use. Info Gap activities are excellent activities as
they force the students to ask each other questions; these activities help
make the language classroom experience more meaningful and authentic.
This section will explain in more detail what Info Gap activities are and why
they are useful; it will also give some examples of Info Gap activities for any
language classroom.
1. 20 questions
One student thinks of an item or object. The other students must ask questions to figure out what item the
student is thinking of. The questions should be "yes" or "no" questions. If the students can't guess the item
within 20 questions, the student who's thinking of the item wins the game.
If you do this as a whole class, make sure you keep track of how many questions have been asked. In small
groups or pairs, the student thinking of the item should keep a tally of how many questions are asked.
Example:
Student A: Okay. Go!
Student B: Is it alive?
Student A: No.
Student C: Is it bigger than my desk?
Student A: Yes.
Student D: Is it…?
2. Draw This
4. Job Interview
Accuracy
To achieve accuracy, the learner needs to devote some attention to form, i.e. to 'getting it right'.
Attention is a limited commodity, and speaking in a second language is a very demanding skill.
Learners have only limited attentional resources, and it is often difficult for them to focus on form and
meaning at the same time. There is inevitably some trade-off between the two. So, for learners to be
able to devote time and attention to form, it helps if they are not worrying too much about meaning.
That suggests that practice activities focused on accuracy might work best if learners are already
familiar with the meanings they are expressing. This, in turn, suggests that expecting learners to be
accurate with newly presented grammar is a tall order. It may be the case that accuracy practice should
come later in the process, when learners have been thoroughly familiarized with the new material
through, for example, reading and listening tasks.
As we said, accuracy requires attention. Attention needs time. Research suggests that learners are
more accurate the more time they have available. They can use this time to plan, monitor, and fine-tune
their output. Therefore, rushing students through accuracy practice activities may be counterproductive.
Classroom activities traditionally associated with accuracy, such as drilling, may not help accuracy that
much, especially where learners are being drilled in newly presented material.
Finally, learners need to value accuracy. That is, they need to see that without it, they risk being
unintelligible. This means that they need unambiguous feedback when they make mistakes that
threaten intelligibility. By correcting learners' errors, teachers not only provide this feedback, but they
convey the message that accuracy is important. Knowing they are being carefully monitored often helps
learners pay more attention to form.
To summarize, then, a practice activity that is good for improving accuracy will have these characteristics:
Attention to form: the practice activity should motivate learners to want to be accurate, and they
should not be so focused on what they are saying that they have no left-over attention to allocate to
how they are saying it.
Familiarity: learners need to be familiar with the language that they are trying to get right.
Thinking time: monitoring for accuracy is easier and therefore more successful if there is sufficient
time available to think and reflect.
Feedback: learners need unambiguous messages as to how accurate they are -- this traditionally takes
the form of correction.
Fluency
Fluency is a skill: it is the ability to process language speedily and easily. Fluency develops as the
learner learns to automize knowledge. One way they do this is to use pre-assembled chunks of
language. Chunks may be picked up as single units, in much the same way as individual words are
learned. Common expressions like What's the matter? and D'you know what I mean? are typically
learned as chunks. Chunks may also be acquired when utterances are first assembled according to
grammar rules, and then later automized. Fluency activities are aimed at this process of automatization.
Too much attention to form may jeopardize fluency. So practice activities aimed at developing fluency
need to divert attention away from form. One way of doing this is to design practice tasks where the
focus is primarily on meaning. By requiring learners to focus on what they are saying, less attention is
available to dwell on how they are saying it. In this way, the conditions for automatization are created.
One way of engineering a focus on meaning is through the use of information gap tasks. Real
communication is motivated by the need to bridge gaps: I need to know something -- you have the
information -- I ask you and you tell me. In information gap tasks the production of language is
motivated by a communicative purpose, rather than by the need to display grammar knowledge for its
own sake. A communicative purpose might be: to find something out or to get someone to do
something, or to offer to do something. It follows that the exchange is a reciprocal one -- there is as
much a need to listen as there is to speak. This, in turn, means that speakers have to be mutually
intelligible (not always a condition in drill-type activities). Furthermore, there is an element of the
unpredictable involved -- what if you don't have the answer I am looking for, or you refuse my request,
or you reject my offer?
To summarize: where fluency is the goal, practice activities should have these characteristics:
Attention to meaning: the practice activity should encourage learners to pay attention less to the form
of what they are saying (which may slow them down) and more to the meaning.
Authenticity: the activity should attempt to simulate the psychological conditions of real-life language
use. That is, the learners should be producing and interpreting language under real-time constraints,
and with a measure of unpredictability.
Communicative purpose: to help meet these last two conditions, the activity should have a
communicative purpose. That is, there should be a built-in need to interact.
Chunking: at least some of the language the learners are practicing should be in the form of short
memorizable chunks which can be automatized.
Repetition: for automatization to occur, the practice activity should have an element of built-in
repetition, so that learners should produce a high volume of the targeted forms.
Restructuring
Restructuring involves integrating new information into old. Traditionally, restructuring was meant to
happen at the presentation stage. That is, learners were expected to learn a new rule and straightaway
incorporate it into their 'mental grammar'. More recently there has been some skepticism as to whether this
happens. There is a growing belief that restructuring can occur during practice activities. One school of thought
argues that communicative activities (such as information gap tasks) provide a fertile site for restructuring. This
is because such activities problematize learning: what if you don't understand my question, or I don't
understand your answer? This communication breakdown forces the learner to take stock and re-think. In turn,
it offers the potential for negotiation. Negotiation of meaning -- the collaborative work done to make the
message comprehensible -- is thought to trigger restructuring. Some early proponents of the communicative
approach considered that this was all that was necessary for language acquisition to take place. Restructuring
is sometimes experienced by learners as a kind of flash of understanding, but more often, and less
dramatically, it is the dawning realization that they have moved up another notch in terms of their command of
the language.
Problematizing: having to deal with a problem often seems to trigger restructuring. For example, when
learners are put in a situation where the message they are trying to convey is misinterpreted, they may
be forced to reassess their grasp of a rule. Moreover, the input they get as they negotiate the meaning
of what they are trying to express may also help reorganize the state of their mental grammar.
Push: the activity should push learners to 'out-perform their competence' – that is, to produce or
understand language that is a notch more complex than they would normally produce or understand.
Scaffolding: there should be sufficient support (or scaffolding) to provide the security to take risks with
the language. This means the practice activity should try to balance the new with the familiar.
Scaffolding could, for example, take the form of telling a familiar story but from a different perspective.
Teachers often provide students with scaffolding in the way they interact with them, repeating,
rephrasing, or expanding what they are saying to carry on a conversation.
Language learners make mistakes. This seems to happen regardless of the teacher's skills and
perseverance. It seems to be an inevitable part of learning a language. Most teachers believe that
ignoring these mistakes might put at risk the learner's linguistic development. Current research tends to
support this view. Not to ignore mistakes, however, often means having to make several on-the-spot
decisions. These can be summed up in the form of the 'in-flight' questions a teacher might ask when
faced with a student's possible error. Consider, for example, this student text:
The Sunday night past, the doorbell rangs, I opened the door and I had a big surprise, my brother was
stopping in the door. He was changing a lot of. He was having a long hair but him looking was very
interesting. Now, he's twenty five years, and he's lower. We speaked all night and we remembered a lot
of thinks. At last when I went to the bed was the four o'clock.
While it is clear that the text is non-standard (by native-speaker standards), it is not always an easy task to
identify the individual errors themselves. Take, for example, I had a big surprise. At first sight, there
seems to be nothing wrong with this. It is a grammatically well-formed sentence--that is, the words are
in the right order, the tense is correct, and the subject and verb agree. Moreover, the meaning is clear and
unambiguous. But would a native speaker ever say it? According to corpus evidence (that is,
databases of spoken and written texts), something can be a big surprise, a person can be in for a big
surprise, you can have a big surprise for someone, but instances of I had a big surprise simply do not
exist. Should we conclude, therefore, that it is wrong? The answer is yes if we imagine a scale of
'wrongness' ranging from 'completely wrong' to 'this is OK, but a native speaker would never say it'.
However, no corpus is big enough to include all possible sentences, and, at the same time, new ways
of saying things are being constantly invented. This is a case, therefore, when the teacher has to use
considerable discretion.
Once an error has been identified, the next step is to classify it. Learners may make mistakes at the
level of individual words, in the way they put sentences together, or at the level of whole texts. At the
word level, learners make mistakes either because they have chosen the wrong word for the meaning
they want to express (My brother was stopping in the door instead of standing), or they have chosen
the wrong form of the word (lower instead of lawyer, thinks instead of things). These are lexical errors.
Lexical errors also include mistakes in the way words are combined: the Sunday night past instead of
last Sunday night. Grammar errors, on the other hand, cover such things as mistakes in verb form and
tense (the doorbell rangs, we speaked), and in sentence structure: was the four o'clock, where the
subject of the clause (it) has been left out. There is also a category of errors called discourse errors
which relate to the way sentences are organized and linked to make whole texts. For example, in the
student extract above, at last suggests that what follows is the solution to a problem: eventually would
have been better in this context.
Responding to errors
What options has the teacher got when faced with a student's error? Let's imagine that, in the course of a
classroom activity, a student has been describing a person's appearance and said: He has a long hair.
Here are some possible responses that the teacher might consider:
1. No. This is negative feedback, but it offers the student no clue as to what was wrong. The teacher
may be assuming that the student has simply made a slip under pressure and will, therefore, be able to
self-correct. There are, of course, other ways of signaling that a mistake has been made
without having to say No. A facial expression, a shake of the head, etc., might work just as well.
Some teachers try to soften the negative force of no by, for example, making an mmmm noise to
indicate: Well, that's not entirely correct but thanks anyway. Unfortunately, this may leave the student
wondering Have I made a mistake or haven't I?
2. He has long hair. This is a correction in the strictest sense of the word. The teacher simply repairs the
student's utterance--perhaps in the interest of maintaining the flow of the talk, but at the same time,
reminding the learner not to focus only on meaning at the expense of form.
3. No article. The teacher's move is directed at pinpointing the kind of error the student has made to
prompt self-correction, or, if that fails, peer-correction-- when learners correct each other. This is
where metalanguage (the use of grammatical terminology) comes in handy: words like article,
preposition, verb, tense, etc. provide an economical means of giving feedback -- assuming, of course,
that students are already familiar with these terms.
4. No. Anyone? An unambiguous feedback signal plus an invitation for peer correction. By excluding
the option of self-correction, however, the teacher risks humiliating the original student: perhaps the
teacher knows the student well enough to rule out self-correction for this error.
5. He has...? In other words, the teacher is replaying the student's utterance up to the point where the
error occurred, intending to isolate the error as a clue for self-correction. This technique can be
reinforced by finger-coding, where the teacher marks out each word on her fingers, indicating the
part of the phrase or sentence that needs repair.
6. He has a long hair? Another common teacher strategy is to echo the mistake but with a quizzical
intonation. This is perhaps less threatening than saying No, but often learners fail to interpret this as
an invitation to self-correct, and think that the teacher is simply questioning the truth of what they have
just said. They might then respond Yes, he has a very long hair. Down to here.
7. I'm sorry, I didn't understand? Variations on this response include Sorry? He what? Excuse me? Etc.
These are known as clarification requests and, of course, occur frequently in real conversation. As
a correction device, they signal to the student that the meaning of their message is unclear, suggesting
that it may have been distorted due to some problem of form. It is therefore a friendlier way of signaling
a mistake. Research suggests that when learners re-cast their message after receiving a clarification
request, it usually tends to improve, despite their not having been told explicitly that a mistake has been
made, much less what kind of mistake it was. This suggests that the policy of 'acting a bit thick' (on the
part of the teacher) might have positive dividends in terms of self-correction.
8. Just one? Like this? (draws bald man with one long hair) Ha ha... The teacher has pretended to
interpret the student's utterance literally, to show the student the unintended effect of the error, on the
principle that, once the student appreciates the difference between he has long hair and he has a long
hair, he will be less likely to make the same mistake again. This is possible only with those mistakes
which do make a difference in meaning -- such as he's lower in the text we started with. There is, of
course, the danger of humiliating the student, but, if handled sensitively, this kind of feedback can be
extremely effective.
9. A long hair is just one single hair, like you find in your soup. For the hair on your head, you wouldn't use
an article: He has long hair. The teacher uses the error to make an impromptu teaching point. This is an
example of reactive teaching, where instruction is in response to students' errors rather than trying to
pre-empt them. Of course, if the teacher were to do this at every mistake, the classes would not only
become very teacher-centered, but the students might become reluctant to open their mouths.
10. Oh, he has long hair, has he? This technique (sometimes called reformulation) is an example of
covert feedback, disguised as a conversational aside. The hope is that the student will take the veiled
correction on board but will not be inhibited from continuing the flow of talk. Typically, this is the way
parents seem to correct their children -- by offering an expanded version of the child's utterance:
Child: Teddy hat.
Mother: Yes, Teddy's got a hat on, hasn't he?
Some theorists argue that these expansions and reformulations help provide a temporary scaffold for
the child's developing language competence. The problem is that learners may simply not recognize
the intention nor notice the difference between their utterance and the teacher's reformulation.
11. Good. Strange as this seems, it is a very common way that teachers provide feedback on student
production, especially in activities where the focus is more on meaning than on form. For example, it
is not difficult to imagine a sequence like this:
Teacher: What does Mick Jagger look like?
Student: He has a long hair.
Teacher: Good. Anything else?
Student: He has a big lips.
Teacher: Good.
Etc.
The intention behind good (or any of its alternatives, such as OK) is to acknowledge the students'
contribution, irrespective of either its accuracy or even its meaning. But, if construed as positive
feedback, it may lull learners into a false sense of security and, worse, initiate the process of
fossilization.
To sum up, then, the following categories of errors have been identified:
lexical errors
grammar errors
discourse errors
and, in the case of spoken language: pronunciation errors
It is not always the case that errors fall neatly into the above categories, and there is often considerable
overlap between these categories.
Identifying the cause of an error can be equally problematic. Speakers of Spanish may recognize, in the
above text, the influence of the writer's first language (his L1) on his second language (his L2). For
example, the lack of the indefinite article in he's lower (for he's a lawyer) suggests that the learner has
borrowed the Spanish construction (es abogado) in which the indefinite article (un) is not used. Such
instances of L1 influence on L2 production are examples of transfer. They do not necessarily result in
errors -- there is such a thing as positive transfer. He's lower is an example of negative transfer, or
what was once called L1 interference.
The case of rangs, however, cannot be accounted for by reference to the learner's L1. Nor can
speaked. Both errors derive from over-applying (or overgeneralizing) an L2 rule. In the case of rangs,
the learner has overgeneralized the third person -s rule in the present (he rings) and applied it to the
past. In the case of speaked he has overgeneralized the past tense -ed ending. Such errors seem to be
influenced not by factors external to the second language such as the learner's first language but by the
nature of the second language itself. They suggest that the learner is working according to L2 rules and
this is evidence that a process of hypothesis formation and testing is underway. These developmental
errors are not dissimilar to the kinds of errors children make when they are learning their mother
tongue:
He go to sleep.
Are dogs can wiggle their tails?
Daddy broked it.
These two kinds of errors -- transfer and developmental -- account for the bulk of errors learners make.
Such errors can range from the fairly hit-and-miss (him looking was very interesting) to errors that seem
to show evidence of a rule being fairly systematically (but not yet accurately) applied. Thus: my brother
was stopping, he was changing, he was having a long hair. These are all examples of a verb form (past
continuous) being overused, but in a systematic way. It is as if the learner had formed a rule to the
effect that, 'when talking about past states--as opposed to events--use was + -ing'.
It is probably these systematic errors, rather than the random ones, that best respond to correction.
Correction can provide the feedback the learner needs to help confirm or reject a hypothesis, or to
tighten the application of a rule that is being applied fairly loosely. Of course, it is not always clear
whether an error is the product of random processes or the product of a developing but inexact system.
Nor is it always clear how inexact this system is. For example, it may be the case that the learner
knows the right rule but, in the heat of the moment, has failed to apply it. One way of testing this is to
see whether the learner can self-correct: could the writer of the text above change speaked to spoke,
for example, if told that speaked was wrong? If so, this suggests that the rule is both systematic and
correctly formulated in the learner's mind, but it hasn't yet become automatic.
The next issue to address is the question of priorities. Which errors matter, and which don't? This is
rather subjective: some errors are likely to distract or even irritate the reader or listener while others go
largely unnoticed. For example, speakers of languages in which nouns are distinguished by gender
(e.g. une banane, une pomme) frequently say they are irritated by gender mistakes such as un banane.
A fairer, but still fairly subjective, criterion might be the one of intelligibility: to what extent does the error
interfere with, or distort, the speaker's (or writer's) message? In the text above it is difficult, if not
impossible, to recover the meaning of lower (for lawyer) from the context. On the other hand, the
doorbell rangs is fairly unproblematic. It may cause a momentary hiccup in communication, but it is not
severe enough to threaten it.
It should be apparent by now that there are many complex decisions that teachers have to make when
monitoring learner production. It is not surprising that the way they respond to error tends to be more
often intuitive than consciously considered. But before addressing the question as to how to
respond, it may pay to look briefly at teachers' and students' attitudes to error and correction.
Few people like being wrong and yet there seems to be no way of learning a language without being
wrong a lot of the time. Not many people like being corrected either, yet to leave mistakes uncorrected
flies in the face of the intuitions and expectations of teachers and students alike. This accounts for
some of the problems associated with error and correction.
Attitudes to error run deep and lie at the heart of teachers' intuitions about language learning. Many
people still believe that errors are contagious and that learners are at risk of catching the errors other
learners make. It is often this fear of error infection that underlies many students' dislike of pair and
group work. On the other hand, many teachers believe that correcting errors is a form of interference,
especially in fluency activities. Some teachers go further and argue that correction of any sort creates a
judgmental--and therefore stressful--classroom atmosphere, and should be avoided altogether.
These different attitudes find an echo in the shifts of thinking that have taken place amongst
researchers and materials writers. Recent thinking sees errors as being evidence of developmental
processes rather than the result of bad habit formation. This sea change in attitudes is well captured in
the introductions to ELT coursebooks. Here is a selection:
'The student should be trained to learn by making as few mistakes as possible... He must be trained to
adopt correct learning habits right from the start.' (from First Things First by L. Alexander)
'Provided students communicate effectively, they should not be given a sense of failure because they
make mistakes.'
(from The Cambridge English Course, 1, Teacher's Book by Swan and Walter)
'Don't expect learners to go straight from ignorance to knowledge. Learning takes time and is not
achieved in one go. Be prepared to accept partial learning as an important stage on the way to full learning.'
(from Project English 2, Teacher's Book by Hutchinson)
'Making mistakes is an important and positive part of learning a language. Only by experimenting with
the language and receiving feedback can students begin to work out how the language works.'
(from Blueprint Intermediate, Teacher's Book by Abbs and Freebairn)
Certainly, the current methodology is much more tolerant of error. But the tide may be turning yet again.
Studies of learners whose language development has fossilized -- that is, has stopped at a point well
short of the target -- suggest that the lack of negative feedback may have been a factor. Negative
feedback is simply indicating No, you can't say that when a learner makes an error. Positive feedback,
on the other hand, is when learners are told when they are right. If the only messages learners get are
positive, it may be the case that there is no incentive to restructure their mental grammar. The
restructuring mechanisms close down. Hence it is now generally accepted that a focus on form (not
just on meaning) is necessary to guard against fossilization. A focus on form includes giving
learners clear messages about their errors.
TOPIC 8: Integrating Grammar in the Classroom in Theory and Practice
How does grammar fit into the overall context of a language lesson? Once upon a time, the grammar
lesson was the language lesson, and so the question wouldn't have been asked. Typically, lessons followed
the pattern: grammar explanation followed by exercises. Or, what came to be known as presentation and
practice. The practice stage was aimed at achieving accuracy. When it was recognized that accuracy alone is
not enough to achieve mastery of a second language, a third element was added -- production, the aim of
which was fluency (as we discussed in our previous lesson). The standard model for the language lesson
became presentation, practice, production (PPP).
This kind of organization is typical of many published English language teaching courses. It has a logic
that is appealing both to teachers and learners, and it reflects the way that other skills – such as playing tennis
or using a computer – are learned. That is, knowledge becomes skills through successive stages of practice.
Moreover, this model allows the teacher to control the content and pace of the lesson, which for a new teacher,
in particular, helps them cope with the unpredictability of classroom life. It provides a convenient template onto
which any number of lessons can be mapped.
Nevertheless, the PPP model has been criticized because of some of the assumptions it makes about
language and language learning. It assumes, for example, that language is best learned in incremental steps,
one 'bit of grammar' at a time, and that the teacher, by choosing what bit of grammar to focus on, can influence
the process. Research suggests, however, that language acquisition is more complex, less linear, and less
amenable to teacher intervention. The PPP model also assumes that accuracy precedes fluency. However, all
learners go through a long stage of making mistakes. Meanwhile, they may be perfectly capable of conveying
their intended meanings fluently. As in first language learning, accuracy seems to be acquired relatively
late -- a kind of fine-tuning of a system that is already up and running. Delaying communication until
accuracy is achieved may therefore be counterproductive. Rather than developing as preparation for
communication, it seems that it is by means of communication that the learner's language system
establishes itself and develops.
An Alternative Model
As we have seen, PPP represents an accuracy-to-fluency model of instruction. An alternative model
stands this progression on its head and adopts a fluency-to-accuracy sequence. Put simply, the learning cycle
begins with the meanings that the learners want to convey. They try to express these meanings using their
available resources. They are then given guidance as to how to do this better. This guidance may include
explicit grammar instruction. Through successive stages of trial, error, and feedback the learner's output is
fine-tuned for accuracy.
Proponents of the communicative approach proposed a fluency-first model of instruction that is called
task-based: first, the learners perform a communicative task that the teacher has set them; the teacher then
uses this to identify language features learners could have used to communicate their intentions more
effectively. These features are taught and practiced before students re-perform the original (or a similar) task.
In this kind of lesson, the language items that are selected for special attention arise solely out of an
assessment of the learners' communicative difficulties, rather than having been predetermined by a grammar
syllabus.
But if the grammar is not pre-programmed, how is teaching organized? One approach is to organize the
syllabus around the tasks. Thus, the syllabus objectives are expressed in terms that relate to real language
use (telling a story, booking a hotel room, etc.) rather than in grammar terms (present perfect, adverbs of
frequency, etc.)
Task-based learning is not without its problems, however. For a start, what criteria determine the
selection of tasks, the ordering of tasks, and the evaluation of tasks? More problematic still are the
management problems associated with setting and monitoring tasks. It is partly due to these problems that
task-based teaching has had a mixed reception. Nevertheless, many teachers are finding ways of marrying
elements of a task-based approach with the traditional grammar syllabus.
TOPIC 9: Nature of Grammar, Standard/ Non-standard English, History of Prescriptive Grammar
NATURE OF GRAMMAR
The term grammar is used to refer to the rules or principles by which a language works, that is, its
system or structure; the systematic study and description of a language; a set of rules and examples
dealing with the syntax and word structures (morphology) of a language.
Grammar is more than just order and hierarchy; it is a way of expressing complex multidimensional
schemas in one dimension. The need to communicate these schemas is the concern of language, but how
they are communicated is the concern of grammar. Because grammar does not necessarily rely on the
preexistence of language, the elements of grammar can be prototyped as features of other mental
systems before language appears. These elements can then be exacted as needed for language. So the
genesis of language and the genesis of grammar do not necessarily need to be considered as a single
process.
There are some fallacies concerning the nature of grammar:
One fallacy is that some languages do not have grammar, or have little grammar. Since grammar
defines the main operations of a language, every language has its own grammar, and that grammar is
adequate.
Another fallacy is that change in grammar results in deterioration (or, alternatively, evolution).
A final fallacy involves equating the grammar of the spoken language with that of the written
language.
These intellectual tendencies were seen in the approach to standardize, refine, and fix English. People first
began to consider the grammar of English in this period. It wasn't fixed to rule (like that of Latin and other
'dead' languages). There was a large degree of language variation 'even among educated speakers', and this
was seen as a bad thing (well, it still is in many circles). There was a desire to 'ascertain' the language: to
reduce it to rule, settle disputed usage questions, and fix it permanently in this 'perfect' form.
18th-century England--Latin was still considered the language of educated people, but the British Empire had
become quite powerful, and London was the most important city in England. This forced the London dialect
into 'important world language' status. To make English 'better', people often tried to make English more like
Latin.
In the absence of an academy, many individuals attempted to right the wrongs of English and establish a
standard. Now for the first time, an effort was made to engage the general public in discussion of such matters.
At this point, English still had no dictionary and no descriptive grammar.
Grammarians:
● 1755--Samuel Johnson published the first English dictionary. It was far from ideal by today's
standards, but a major achievement at the time.
● 1761--Joseph Priestley published The Rudiments of English Grammar.
● 1762--Robert Lowth published A Short Introduction to English Grammar.
● 1763--John Ash, Grammatical Institutes
● 1784--Noah Webster, A Grammatical Institute of the English Language, Part II, in America.
These were the first English grammars not written for foreigners or to teach Latin. Grammarians hoped to
codify the principles of language and reduce it to rule, settle disputed points, and point out common errors.
They essentially tried to make absolute what was common but not universal in the speech of the time.
The idea began to circulate that usage was the most important standard for considering grammar. That is,
what people say is the best indicator of what is right. Joseph Priestley was the strongest advocate of this
position in the 18th century. Some might call him radical even today. George Campbell also argued this point:
"For what is the grammar of any language? It is no other than a collection of general observations methodically
digested." Most early grammarians, however, failed to recognize the importance of usage, did not understand
the processes of linguistic change, and, because of this, approached their task in the wrong way--logic is not the
way to determine what is right, and forcing people to use one linguistic form over another is never successful.
Prescriptive rules of English were first set out in the latter half of the 18th century, e.g.:
● lie - intransitive / lay - transitive
● differentiation of between and among
● use of comparative when only two things are being compared
● condemnation of this here and that there
● condemnation of double negative--Lowth first stated the rule that two negatives make a positive
(NOT!)
We all know that a language has different dialects and pronunciations in various areas where it is spoken
depending on the culture and ways of people. But are you aware of the terms standard and non-standard
language? Or more specifically Standard and non-Standard English? Apart from the differences in spellings,
dialects, and pronunciations, a language differs in its form depending on the audience it is used for.
The formal type of English that is mostly spoken and written in government agencies and similar
environments is called Standard English. Apart from government institutions, Standard English is also
used in media conversations, school announcements, and international communications.
Standard English is not an alien language but is very similar to the ordinary English that we use in our
daily life. The main difference between the two is that Standard English makes use of complicated terms
which are otherwise not very common in our everyday communications. This makes the language very
formal and well suited to settings like government authorities, the media, and international dealings. In
other words, you can say that Standard English is the language that is used by educated speakers in their
speeches, research, interviews, or any other kind of public discourse.
In contrast, non-Standard English is the opposite of Standard English. It is used in everyday life by
anyone from a little kid to a 70-year-old person, with basic words that are common and easily
understandable by the local community. Non-Standard English does not make use of complex terms and
is sometimes missing the proper punctuation as well.
One major difference between the two is that Standard English does not vary depending on the area or
community it is spoken in and is used in the same way throughout the world, whereas non-Standard
English has word preferences depending on the area and the locals it is spoken by.
TOPIC 10: Theoretical Views on L2 Grammar Acquisition: The Noticing Hypothesis, Its Application in
Computer Assisted Language Learning
The noticing hypothesis is a concept in second-language acquisition proposed by Richard Schmidt in 1990.
He stated that learners cannot learn the grammatical features of a language unless they notice them. Noticing
alone does not mean that learners automatically acquire language; rather, the hypothesis states that noticing is
the essential starting point for acquisition. There is debate over whether learners must consciously notice
something, or whether the noticing can be subconscious to some degree.
Application of Computer Assisted Language Learning
Classroom discussion is a relevant application of computers in the language classroom. Kelm (as cited
in Chapelle, 2001) provides an example in which students were given a short story to serve as the topic
for a computer-assisted classroom discussion aimed at acquiring Portuguese. Students received questions
through the computer to test their reading comprehension and to open and maintain classroom discussion
in the target language. Students also shared their ideas and feelings about the story with their
classmates and teachers via computer, which allowed them to work at their own pace. This kind of
classroom discussion provides students with authentic tasks that foster conditions favorable to SLA.
Likewise, the fact that the activity was intended to give students the opportunity to use the target
language without teacher-frontedness is seen as positive for language acquisition.
Now, what evidence suggests that learners have acquired the target language through CALL activities?
Doughty (as cited in Chapelle, 2001) compared the effects and results of input. A group of students
received input via CALL activities. Attention was drawn to relative clauses and grammatical structures
through highlighting on the computer screen. The other group received input through other means. The
group working with CALL instruction performed better on grammatical tests than the other group.
One can conclude that "these results provide evidence for the argument that CALL materials with
carefully selected and highlighted target forms can offer superior language learning potential than those
in which learner's attention is not directed to form" (Chapelle, 2001, p. 69).
Finally, Beatty (2003) offers some principles for teaching CALL. These principles are, indeed,
significant for language teachers intending to use CALL in their lessons and institutions and promote
SLA. These are:
(1) Evaluate the appropriateness of the software program or computer-based resource. One could say
that this responsibility rests with instructors. Elements like cost, feedback, pedagogical approach,
authenticity, and objectives, among others, are to be taken into consideration. Students can certainly
participate and share this responsibility so that teachers and administrators have a clear idea of
learners' reactions and motivation towards specific technologies.
(2) Create an environment in which CALL is supported. Beatty (2003) suggests arranging "the CALL
classroom to maximize interactions" (Beatty, 2003, p. 253). The CALL classroom should be organized
(in semi-circles, stations, or other arrangements) so that students have the opportunity to interact freely,
share computer screens, and create changes for "scaffolded learning" (Beatty, 2003, p. 253).
(3) Monitor learner participation in CALL programs and encourage autonomy. Computers offer a great
opportunity for monitoring and providing feedback to learners electronically. In addition, there is
usually not enough class time to learn a language comprehensively, so students can use CALL to
work outside the classroom at their own pace.
(4) Encourage the use of CALL programs for collaboration and learners' interaction. Providing tasks in
which learners have to interact simultaneously with computers and other students becomes
significant. The internet is also a place where collaboration can occur via electronic mail, chat,
blogs, and threaded discussions (synchronous and asynchronous modes of communication). These
are relevant aspects of the implications and applications of CALL on SLA.
TOPIC 11: Strategies for Classroom-based Evaluation
Classroom-Based Evaluation
Teachers are actively and continuously involved in second language evaluation – sometimes as the
person making the actual decisions; sometimes in collecting relevant information for others who will
make the decisions; or sometimes in helping others make decisions by offering interpretations of
students' performance. Even when teachers are not the actual decision-makers, they are affected. For
example, someone else may be responsible for the placement of students in second language classes,
but teachers are responsible for teaching the students who are placed in their classes.
Parents, other teachers, noninstructional educational professionals (such as counselors and remedial
specialists), and students themselves are also important participants in an evaluation. Teachers often
consult or collaborate with other such people as part of the evaluation process: they consult with
parents to decide whether more or different kinds of homework are called for to assist a student who is
doing poorly; they consult with education specialists to decide whether second language learners
should receive additional or special instruction, and they may consult with school principals or
administrators responsible for placing students in different school programs. Teachers also plan
activities that help students assess their progress – for example, student conferences.
Second language evaluation relies on many different kinds of information. Although information about
student achievement is certainly relevant, it is not the only, or necessarily the most important,
information for making all decisions. Other factors can also be important – student behavior in class,
their attitudes toward school or themselves, their goals and needs concerning the outcomes of second
language learning, and their work habits, learning styles, and strategies.
● Purpose of evaluating
● Collecting information
● Interpreting information
Purpose of evaluating
The purpose is a critical aspect of classroom-based evaluation, along with information, interpretation,
and decision making, see Fig. 1.
Figure 2 above shows the notions of instructional purposes, plans, and practices; instruction
– whether we consider the instruction of a course, a unit, or a lesson – consists of these three
components. The purposes identify the objectives of instruction – the "why." The plans describe the means of
attaining those objectives – the "how." And practices are what takes place in the classroom – the "what." We
also discuss other factors that, strictly speaking, are not part of classroom instruction itself but that,
nevertheless, can have a significant effect on second language teaching and learning. For example,
community values and attitudes toward second language learning as well as incoming students' current levels
of proficiency in the target language can determine the appropriateness of a particular second language
course. Our current theories about teaching and learning may influence the effectiveness of the instructional
approach of a second language course. We refer to these additional factors as "input factors." These four
aspects of instruction (purposes, plans, practices, and input factors) are summarized in Fig. 2. The sources of
influence listed in Fig. 2 are not necessarily the only pertinent ones. You may want to modify this list
or add sources of influence found in your own classrooms and communities.
Paper or Project Prospectus - A prospectus is a brief, structured first-draft plan for a term paper or
term project. The Paper Prospectus prompts students to think through elements of the assignment,
such as the topic, purpose, intended audience, major questions to be answered, basic organization,
and time and resources required. The Project Prospectus focuses on tasks to be accomplished, skills
to be improved, and products to be developed.
Sub-classifications of validity
a. Concurrent validity
A test is said to have concurrent validity if the scores it gives correlate highly with a
recognized external criterion that measures the same area of knowledge or ability.
b. Construct validity
A test is said to have construct validity if scores can be shown to reflect a theory about the
nature of a construct or its relation to other constructs. It could be predicted, for example, that
two valid tests of listening comprehension would rank learners in the same way, but each would
have a weaker relationship with scores on a test of grammatical competence.
c. Content validity
A test is said to have content validity if the items or tasks of which it is made up constitute a
representative sample of items or tasks for the area of knowledge or ability to be tested. These
are often related to a syllabus or course.
d. Convergent validity
A test is said to have convergent validity when there is a high correlation between scores
achieved in it and those achieved in a different test measuring the same construct (irrespective
of method). This can be considered an aspect of construct validity.
e. Criterion-related validity
A test is said to have criterion-related validity if a relationship can be demonstrated between
test scores and some external criterion which is believed to be a measure of the same ability.
Information on criterion-relatedness is also used in determining how well a test predicts future
behavior.
f. Discriminant validity
A test is said to have discriminant validity if the correlation it has with tests of a different trait
is lower than the correlation with tests of the same trait, irrespective of the testing method. This
can be considered an aspect of construct validity.
g. Face validity
The extent to which a test appears to candidates, or those choosing it on behalf of
candidates, to be an acceptable measure of the ability they wish to measure. This is a subjective
judgment rather than one based on any objective analyses of the test, and face validity is often
considered not to be a true form of validity. It is sometimes referred to as 'test appeal'.
h. Predictive validity
An indication of how well a test predicts future performance in a relevant skill.
6. Reliability: Refers to the consistency and stability with which a test measures performance
Some variables influence test reliability:
a. Specificity – questions should not be open to different interpretations
b. Differentiation – the test discriminates between good and poor students
c. Difficulty – the test has an adequate level of difficulty
d. Length – the test contains enough items. In multiple-choice tests, at least 40 items are
required
e. Time – students should have sufficient time to perform a test/task
f. Item construction – a well-constructed question is better than a poor one
In recent years, the focus of test construction has shifted from reliability to validity, and more
specifically to construct validity. Additionally, tests are increasingly considered part of educational
practice.
The more reliable a test is, the less random error it contains. A test that contains systematic error, e.g.
bias against a certain group, may be reliable but not valid.
2. Correlations
Correlations are illustrated by scatter plots, which are similar to line graphs in that they use
horizontal and vertical axes to plot data points but serve a very specific purpose: scatter plots show
how much one variable is affected by another. The relationship between two variables is called
their correlation.
A correlation indicates the strength and direction of a linear relationship between two random
variables. Correlations are always situated on the -1 to 1 spectrum.
The closer a correlation is to either end of the spectrum, the stronger the relationship.
A relationship is statistically significant if Sig. ≤ 0.05.
Scatter plots usually consist of a large body of data. The closer the data points come when
plotted to make a straight line, the higher the correlation between the two variables, or the stronger
the relationship. See Figures below.
A perfect positive correlation is given the value of 1. A perfect negative correlation is given the
value of -1. If there is no correlation present, the value given is 0. The closer the number is to 1 or -1,
the stronger the correlation, or the stronger the relationship between the variables. The closer the
number is to 0, the weaker the correlation.
If the data points make a straight line going from the origin out to high x- or y- values, then the
variables are said to have a positive correlation. If the line goes from a high value on the y-axis
down to a high value on the x-axis, the variables have a negative correlation.
In language tests, correlations can serve only as an indicator of reliability, but very low
correlations usually mean that something is wrong.
If the correlations are generally significant at the highest level (99%) except for, say, the
listening test, this may mean that the listening test does not differentiate between the most able
and the least able test takers.
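The ideas above can be sketched in a few lines of Python. The sketch computes a Pearson correlation coefficient for two invented sets of test scores; the data and variable names are hypothetical.

```python
# Illustrative sketch (invented scores): Pearson correlation between two
# tests. Values near 1 mean a strong positive relationship, near -1 a
# strong negative one, and near 0 no linear relationship.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

grammar   = [55, 62, 70, 78, 85, 91]   # hypothetical grammar test scores
listening = [50, 58, 66, 72, 80, 88]   # same students' listening scores
print(round(pearson(grammar, listening), 3))   # → 0.997, near-perfect positive
```

Plotted as a scatter plot, these points would lie almost on a straight line rising from low to high values, which is exactly what a correlation close to 1 looks like.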
3. Item reliability
An item reliability analysis indicates the discriminatory potential of a test item (i.e. does it
differentiate between the most able and the least able test takers?)
As in standard correlations, a very reliable item (with a highly discriminatory capacity) would
score close to −1 or 1. Items are considered unreliable if they score between −.3 and .3.
Qualitative Evaluation
A. Purpose
Not all topics in the language (grammar) can be measured statistically. Viewpoints, actions, and
characteristics can't always be represented numerically and so need a qualitative approach. In test
analysis, interviews, usability tests, and close reading are very useful tools for identifying the strengths
and weaknesses of a task.
Because of its approach, qualitative evaluation may reveal data that would not emerge from
quantitative evaluation. There are various ways in which tests can be analyzed qualitatively.
B. Categories of Approaches
1. Reflection
This approach is aimed at gaining an insight into the thinking processes and opinions of the test
taker.
2. Verbal reports
Verbal reports or verbal protocols are a way of collecting qualitative data. They offer an insight
into the thought processes of informants.
Several variables can be distinguished:
Talk aloud: informants voice their thoughts
Think aloud: informants voice their thoughts as well as other information, e.g. physical
movements
Concurrent: the verbal report is given in real-time
Retrospective: the verbal report is given afterward
Mediated: the researcher occasionally intervenes
Non-mediated: the researcher does not intervene
Some pointers when using verbal reports in test analysis:
Before:
Consider language: will the informant be able to voice his/her thoughts in an L2?
Is a concurrent verbal report more suitable than a retrospective one or vice versa?
During:
Tell the informants what a verbal report is
Give the informants the opportunity to practice before the real report starts
Give feedback after the tryouts
After:
Try to process the data as quickly as possible
In any case, make notes for future reference
Some pitfalls to look out for:
Verbal reports may obstruct the test simulation
Vague reports might be of no value
Certain variables might go unnoticed during the verbal report
Get the informants' contact details so you can ask for additional information
Too much data is hard to process. Try organizing it by working with initial hypotheses
3. Diary studies
Informants keep a diary which allows researchers to get an insight into their thoughts. Diaries
are not often used in test validation research but they have proven their worth in research into
learning processes.
There are several varieties:
Unstructured: the informant is free to write what he/she wants in whatever format
Guided: the researcher gives the informant guidelines
Structured: the researcher offers the informant a diary form with closed and open-form
questions
When using diaries in test analysis, note that the data can be hard to process. It can be very
hard to draw conclusions, especially from unstructured diaries. Very structured diaries, on the
other hand, offer only the information you specifically ask for.
Self-assessment promotes students' abilities to assume more responsibility to identify where they
believe they have been successful and where they believe they require assistance. Discussing
students' self-assessments with them allows the teacher to see how they value their work and to ask
questions that encourage students to reflect upon their experiences and set goals for new learning.
Peer assessment allows students to collaborate and learn from others. Through discussions with
peers, high school students can verbalize their concerns and ideas in a way that helps them clarify their
thoughts and decide in which direction to proceed.
2. Discourse analysis
DA is the analysis of "text and talk as social practices" and is mainly concerned with power
relations, gender inequalities, etc. In DA, the transcript of an interaction is analyzed for adjacency
pairs, turn-taking, and repair. Special attention is paid to:
The effect of examiner behavior on test-taker performance
The effect of test-taker characteristics on performance
The effect of task type on performance; and
Comparing test takers' language ability outside the test to their test performance
The potential pitfalls are similar to those of CA.
4. Task characteristics
This type of validation research helps to examine the test tasks and to determine to what extent
they correspond to the test goal.
The analysis is performed by several expert judges who determine the task quality.
Bachman & Palmer (1996) suggest a framework of analysis which considers the following:
The setting (physical setting, participants, time, etc)
The rubric (language, complexity, and clarity of evaluation procedure)
The test input (length and characteristics of language)
The expected response
Relationship between expected and actual response
When analyzing a test for its task characteristics, consider the following:
The framework should be adjusted to each different test
The judges have to be competent and experienced test designers
E. Feedback Methods
1. Questionnaires
Questionnaires gather data such as opinions and views that can also be gathered through
interviews. The main advantage of questionnaires is the possibility to use a very large informant
population.
There are roughly two kinds of questionnaires:
Closed – the informant replies to the questions by ticking boxes or by marking a scale
Open – the informant replies in his / her own words
The question type is one of the main decisions to ponder when designing a questionnaire. Keep
in mind that closed questions are easy to process but the information from them is limited to the
questions that are explicitly asked. Open questions on the other hand allow researchers to probe
for answers but are harder to process.
Before administering the actual questionnaire, it is useful to run it through the following process:
Consider all possible issues that your questionnaire should cover.
Write a draft.
Eliminate questions that do not address the questionnaire purpose.
Group the questions thematically to spot overlaps.
Format the questionnaire and administer it to a small group of target respondents for
feedback
Rewrite the questionnaire.
Always avoid:
Double-barreled questions
Unclear instructions
Questions that do not apply to the respondent
Questions that rely on memory
Hypothetical questions
Biased options
Checklists
Checklists are a way of determining whether all procedures have been gone through, whether
all necessary features are present, etc. A checklist can be a list of boxes to be checked. Checklists
can be used in test design to make sure nothing has been forgotten. They have proven to be useful
when comparing the test outcomes to the developers' predictions. Much like questionnaires,
checklists should be piloted before using them.
2. Interviews
They are a flexible way of gathering data. There are various kinds of interviews, depending on
the structure and the number of informants interviewed at the same time.
a. Unstructured – there is no fixed interview schedule, but rather several themes that are to
be addressed
b. Semi-structured – the researcher follows a preset schedule but it is possible to deviate from
this when interesting issues arise
c. Structured – the interviewer goes through a fixed series of written questions without
deviating. This type of interview closely resembles a questionnaire
d. One on one – this kind of interview allows the researchers to zoom in on the views of
individual respondents
e. Group – the advantage of interviewing larger numbers at once is that group interactions
might spark observations that would have gone unnoticed
Think about the following before the interview:
a. Interviewers should get the chance to practice their interview skills before the data collection
b. Ideally, the pilot settings resemble the actual conditions as accurately as possible
c. During the interview, it's useful to take note of the interview situation
d. The success of an interview largely depends on the interviewer-respondent interaction
e. Interviews are time-consuming
F. Referencing
Referencing is assigning students a position in a rank order based on their score on a test. There are
various kinds of referencing:
1. Norm-referencing is the placement of learners in rank order, their assessment, and ranking in
relation to their peers.
2. Criterion-referencing is a reaction against norm-referencing in which the learner is assessed
purely in terms of his/her ability in the subject, irrespective of the ability of his/her peers.
3. The mastery criterion-referencing approach is one in which a single 'minimum competence
standard' or 'cut-off point' is set to divide learners into 'masters' and 'non-masters', with no
degrees of quality in the achievement of the objective being recognized.
4. A continuum criterion-referencing approach is an approach in which an individual ability is
referenced to a defined continuum of all relevant degrees of ability in the area in question.
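The contrast between norm-referencing and mastery criterion-referencing can be sketched in a few lines of Python; the student names, scores, and cut-off point below are invented for illustration.

```python
# Illustrative sketch (invented data): the same scores referenced two ways.

scores = {"Ana": 72, "Ben": 58, "Carla": 85, "Dan": 64}
CUT_OFF = 65   # hypothetical 'minimum competence standard'

# Norm-referencing: each learner is ranked relative to peers, best first.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)     # → ['Carla', 'Ana', 'Dan', 'Ben']

# Mastery criterion-referencing: each learner is compared only to the
# cut-off point; how the peers performed is irrelevant.
masters = [name for name, s in scores.items() if s >= CUT_OFF]
print(masters)    # → ['Ana', 'Carla']
```

Note that Dan ranks above Ben under norm-referencing yet both are 'non-masters' under the cut-off: the two referencing schemes answer different questions about the same scores.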
TOPIC 13: Planning for and Implementing Classroom Observations; Portfolios; Conferences;
Anecdotal Records; Checklists, Rating Scales; and Interviews; Projects and Presentations
A. Classroom Observation
Teacher observation has been accepted readily in the past as a legitimate source of information for
recording and reporting student demonstrations of learning outcomes in early childhood education. As
the student progresses to later years of schooling, less and less attention typically is given to teacher
observation and more and more attention typically is given to formal assessment procedures involving
required tests and tasks taken under explicit constraints of context and time. However, teacher
observation is capable of providing substantial information on student demonstration of learning
outcomes at all levels of education.
For teacher observation to contribute to valid judgments concerning student learning outcomes,
evidence needs to be gathered and recorded systematically. Systematic gathering and recording of
evidence require preparation and foresight. This does not necessarily mean that all aspects of the
process of observation need to be anticipated but that the approach taken is deliberate rather than
happenstance. It is necessary, at least, to know in advance both what kinds of learning outcomes are
anticipated and how evidence will be recorded. Adequate records are essential for good assessment.
Teacher observation can be characterized as two types: incidental and planned.
Incidental observation occurs during the ongoing (deliberate) activities of teaching and learning and
the interactions between teacher and students. In other words, an unplanned opportunity emerges, in
the context of classroom activities, where the teacher observes some aspect of individual student
learning. Whether incidental observation can be used as a basis for formal assessment and reporting
may depend on the records that are kept.
Planned observation involves the deliberate planning of an opportunity for the teacher to observe
specific learning outcomes. This planned opportunity may occur in the context of regular classroom
activities or may occur through the setting of an assessment task (such as a practical or performance
activity)
Observation is a powerful teaching resource. When you teach with your eyes, ears, and mind open to what is
happening around you, observation makes you a much better teacher. Observation can
● help you get to know children so that you can build relationships with them. When children sense that
you know them, they feel safe and secure and are more open to learning.
● give you the information you need to make wise decisions about what and how to teach each child. You
can respond in just the right way if you take a moment to observe during an interaction.
● enhance your knowledge of child development and learning.
● help you gather evidence about children's progress toward meeting curriculum goals.
● provide you with specific examples of what children know and can do that you can share with their
families. Family members love to hear stories about their child's accomplishments. When a child's
progress is slower or more advanced, it is important to have factual information to share that shows
exactly what the child can do and what the child might be ready to learn.
● add interest and excitement to your work. Let your curiosity about children guide you.
An essential requirement for all types of evidence is anticipating the kinds of learning outcomes that
may be demonstrated. This is particularly important where observation is incidental and where
judgments (rather than descriptions) are recorded. Syllabuses provide a framework of learning
outcomes that serve as the perceptual reference points for recognizing the characteristics of student
performance. The framework of learning outcomes makes available to the teacher concepts and
language for recognizing and describing what a student knows and can do. Learning the structure,
language, and concepts of the framework, therefore, is a key aspect of planning for teacher
observation, as it is too for teaching. Incidental observation necessarily involves little additional
planning, apart from the normal planning of classroom learning activities for students. Incidental
observation is opportunistic, capitalizing on revelations of student learning during regular classroom
learning activities. In this sense, it cannot be planned. It is essentially unanticipated. It can only be
recorded through descriptions in a logbook. Although there may sometimes be an artifact to provide
corroboration for the teacher's observation, any process details depend on the teacher's description.
Incidental observation is therefore the weakest form of teacher observation and would preferably be
used only as supplementary evidence to support other forms of evidence. Relying on incidental
observation alone would be unsatisfactory (see caveats below).
Planned observation can involve planning for 'in situ' observation (in learning situations) or planning for
set assessment tasks. There is little to distinguish these two situations in practical terms. However, as
assessment becomes more important, particularly in Years 8 to 10, students may need to know when
they are being assessed, since they may otherwise choose not to show their actual capabilities. The
absence of demonstration of learning outcomes might not indicate incapability of demonstrating those
learning outcomes but lack of appropriate challenge or opportunity. Formal assessment occasions
would appear to become more important in the secondary school than in the primary school, at least for
the present.
For all planned observations, whether 'in situ' or set tasks, thought needs to be given to how the event
and/or the observations will be recorded. Consideration needs to be given to whether a direct record
will be kept and what form of observation record will be made. The validity of teacher observations is
strengthened by preparing an observation sheet that allows the systematic recording of observations
and judgments. An observation sheet may include checklists of learning outcomes and/or categories for
describing student activities and performances. Learning outcomes might be made more explicit by
listing their elaborations, components or criteria, that is, by providing more detail on the characteristics
of the desired learning outcome.
Any disadvantages of observation sheets are outweighed by their advantages. They can be overcome,
in any case, by careful design of the observation sheet, tailoring it to the current stage of student
development, and allowing space for additional observations to be recorded. Observation sheets
should be used as a tentative organizing structure for recording teacher observations rather than as a
limiting framework for the actual observations.
Space also needs to be provided on the observation sheet for including descriptive details of the
context. These details need to include any characteristics of the setting or the occasion that could have
influenced the student's performance, either positively or negatively, and that might be relevant in
making a judgment about whether the student has demonstrated particular learning outcomes. The
details can be physical (e.g., uncomfortable surroundings), psychological (e.g., personal attributes in
stressful situations), or social (e.g., other events in the life of the school or the student).
Through all of this, it must be remembered that any written record of observations is necessarily
selective. Only certain features of student performance are likely to be noticed and can be recorded.
Therefore, having a clear understanding and ready access to the framework of expected learning
outcomes is essential. One technique for reducing the cognitive demands of open observation is
'spotlighting'. This means targeting specific learning outcomes (across several levels of a strand) on
particular occasions. This has the added advantage of ensuring systematic coverage of all relevant
learning outcomes. However, it should not be pursued so religiously that evidence of other learning
outcomes outside the spotlighting target is ignored.
B. Anecdotal Records
Anecdotal records are notes written by the teacher regarding student language behavior or learning. They
document and describe significant daily events and relevant aspects of student activity and progress. These
notes can be taken during student activities or at the end of the day. Formats for collection should be flexible
and easy to use.
Teachers may choose to keep running written observations for each student or they may use a more
structured approach, constructing charts that focus each observation on the collection of specific data. A
combination of open-ended notes and structured forms may also be used. It is important to date all
observations recorded.
C. Checklists
Observation checklists, usually completed while students are engaged in specific activities or processes, are
lists of specific criteria that teachers focus on at a particular time or during a particular process. Checklists are
used to record whether students have acquired specific knowledge, skills, processes, abilities, and attitudes.
Checklists inform teachers about where their instruction has been successful and where students need
assistance or further instruction. Formats for checklists should be varied and easy to use.
D. Rating Scales
Rating scales record the extent to which specific criteria have been achieved by the student or are present in
the student's work. Rating scales also record the quality of the student's performance at a given time or within
a given process. Rating scales are similar to checklists, and teachers can often convert checklists into rating
scales by assigning numerical values to the various criteria listed. They can be designed as number lines or as
holistic scales or rubrics. Rubrics include criteria that describe each level of the rating scale and are used to
determine student progress in comparison to these expectations. All formats for rating student progress should
be concise and clear.
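The conversion the passage describes, turning a checklist into a rating scale by assigning numerical values to the criteria, can be sketched in a few lines of Python. This is a minimal illustration with hypothetical criteria and values, not a prescribed format:

```python
# Hypothetical observation data: a checklist records only whether each
# criterion was observed (True/False); a rating scale assigns each
# criterion a numerical value reflecting the quality of performance.

checklist = {
    "uses complete sentences": True,
    "maintains eye contact": False,
    "responds to questions": True,
}

# The same criteria converted to a 0-3 rating scale
# (0 = not observed, 3 = consistently observed).
ratings = {
    "uses complete sentences": 3,
    "maintains eye contact": 1,
    "responds to questions": 2,
}

def checklist_score(checklist):
    """A checklist yields only a count of criteria observed."""
    return sum(1 for observed in checklist.values() if observed)

def rating_score(ratings, max_per_item=3):
    """A rating scale yields a total and a proportion of the maximum,
    capturing quality of performance rather than mere presence."""
    total = sum(ratings.values())
    return total, total / (len(ratings) * max_per_item)

print(checklist_score(checklist))  # number of criteria observed
print(rating_score(ratings))       # total rating and proportion of maximum
```

The point of the sketch is the contrast: the checklist answers "was the behavior observed?", while the rating scale answers "how well was it performed?" using the same list of criteria.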
E. Portfolios
Portfolios are collections of relevant work that reflect students' efforts, development, and progress over a
designated period. Portfolios provide students, teachers, parents, and administrators with a broad picture of
each student's growth over time, including the student's abilities, knowledge, skills, and attitudes. Students
should be involved in the selection of work to be included, goal setting for personal learning, and self-
assessment. The teacher can encourage critical thinking by having students decide which of their works to
include in their portfolios and explain why they chose those particular items. Instruction and assessment are
integrated as students and teachers collaborate to compile relevant and individual portfolios for each student.
F. Interviews or Conferences
Teacher-student interviews or conferences are productive means of assessing individual achievement and
needs. During these discussions, teachers can discover students' perceptions of their processes and products
of learning.
Interview questions can be developed to meet the needs of specific students and to fit the curriculum
objectives.
Examples of questions that help students reflect upon their speaking, listening, and viewing experiences
include the following:
Which speaking, listening, and viewing activities did you participate in this week? Which did you enjoy/
dislike? Why?
Which oracy activities did you find most difficult? Why? Did you solve the difficulties? How?
In which speaking activity do you think you did your best? What makes you think so?
What type of speaking activities would you like to learn to do better?
Criteria should be developed and/or discussed with students at the outset of activities such as written reports,
visual representations, oral representations, or projects which combine more than one aspect of language use
and understanding. Teachers may assess the attitudes, skill development, knowledge, or learning processes
demonstrated by students as they engage in language activities. Data gathered during student activities can be
recorded as anecdotal notes, on checklists, rating scales, or by using a combination of these.
TOPIC 14: Using Tests: Norm-referenced Tests vs. Criterion-referenced Tests; Objective-referenced
Tests and Domain-referenced tests
INTRODUCTION
Summative assessment is the process of evaluating (and grading) the learning of students at a point
in time. It is testing which often occurs at the end of a term or course, used primarily to provide
information about how much the student has learned and how well the course was taught (Brown 2003;
Hughes 2003; Wojtczak 2002). Testing is said to be direct when it requires the candidate to perform
precisely the skill that is to be measured whereas indirect testing attempts to measure the abilities
which underlie the skill in which we are interested (Henning 1987; Hughes 2003; Wojtczak 2002).
Discrete point testing refers to the testing of one element at a time, item by item.
Integrative testing, by contrast, requires the candidate to combine many language elements in the
completion of the task (Brown 2003; Henning 1987; Hughes 2003; Wojtczak 2002). A test that is scored
by comparing the response of the student with an established set of correct responses is known as an
objective test. In contrast, a subjective test is scored by opinion and personal judgment (Bachman
1995; Hughes 2003).
The traditional test asks students what they can recall and produce while the alternate test asks
students what they can do with the language, how they integrate and produce it (Brown 2003). High-
stakes tests are those which impact a large number of people or programs whereas low-stakes tests
have a relatively minor impact on people or programs (Wojtczak 2002).
Norm-referenced tests and criterion-referenced tests are the language testing approaches that
provide information about the knowledge and skills of the students tested. A norm-referenced test is a
process of evaluating (and grading) the learning of students by judging (and ranking) them against the
performance of their peers. A criterion-referenced test is the process of evaluating (and grading) the
learning of students against a set of pre-specified criteria (Brown 2003; Hughes 2003; Huitt 1996;
Wojtczak 2002).
There is another way to look at the different kinds of language tests, which is summarized in Figure 2 below:
Figure 2: Kinds of tests adapted from Bachman (1995) & Hughes (2003)
An NRT tells us where a student stands in comparison with other students, and it is useful only for
certain types of decisions (Bachman 1995; Kubiszyn & Borich 2007). Examples of NRTs
include IQ tests, developmental-screening tests (used to identify learning disabilities in young children or
to determine eligibility for special educational services), cognitive ability tests, readiness tests, and so on. The SAT (Stanford
Achievement Test), CAT (California Achievement Test), MAT (Metropolitan Achievement Test), TOEFL,
and IELTS are well-known examples of NRTs. Put more simply, theater auditions, course placement, program eligibility,
school admissions, and job interviews are norm-referenced because their goal is to identify the best candidate
compared to the other candidates, not to determine how many of the candidates meet a fixed list of standards.
Educators use NRTs to evaluate the effectiveness of teaching programs, to help determine students'
preparedness for programs, and to diagnose disabilities for eligibility decisions.
A criterion, in contrast, is the standard of performance, or cut-point, that is expected for
passing the test or course. Here, a CRT would be used to assess whether students pass or fail at a certain criterion
level or cut-point (Bond 1996). A CRT is a test that measures a test taker's performance according to a particular
standard or criterion that has been agreed upon. The test taker must reach this level of performance to pass
the test, and a test taker's score is interpreted with reference to the criterion score rather than to the scores of
other test takers (Richard & Schmidt 2002). Hence, CRT is an approach to evaluation in which a
learner's performance is measured against the same criterion in the classroom (Brown 1976; Mrunalini
2013; Salvia & Ysseldik 2007). For instance, if we report the test performance that a particular student
achieved in the classroom as 90 percent, we are evaluating through a CRT. The most common way to
report a CRT score is as a percentage (Mrunalini 2013; Salvia & Ysseldik 2007).
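The pass/fail logic of a criterion-referenced decision can be sketched directly: each student's percentage score is compared against a fixed cut-point, never against other students. The cut-point below is a hypothetical value for illustration:

```python
# A minimal sketch of a criterion-referenced pass/fail decision.
# The cut-point (70 percent here) is a hypothetical, pre-agreed criterion.

CUT_POINT = 70

def crt_decision(percent_score, cut_point=CUT_POINT):
    """Pass/fail is decided against the criterion alone, not the group."""
    return "pass" if percent_score >= cut_point else "fail"

for score in [90, 70, 65]:
    print(score, crt_decision(score))
```

Note that every student could pass, or every student could fail; the decision for one student is entirely independent of how the others perform.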
A CRT tells us about a student's level of proficiency in, or mastery over, a set of skills and helps us decide
whether a student needs more or less work on those skills; it says nothing about the student's standing
compared to other students (Bachman 1995; Kubiszyn & Borich 2007). For instance, if a test is designed to
evaluate how well students demonstrate mastery of specified content (e.g., types of tense), it is a CRT. Most
everyday tests, quizzes, and final exams conducted in classroom teaching can be regarded as CRTs. A 'Basic
Writing' CRT would include questions based on what was supposed to be taught in the writing classes; it would not
include 'speaking' or 'advanced writing' questions. Students who took the 'Basic Writing' course could pass this
test provided they were taught well, they studied enough, and the test was well prepared.
NRTs are also good for ranking and sorting students for administrative purposes (Anastasi 1988). They are
intended to judge a school's performance and its accountability for providing learning
standards and maintaining the quality of education. The test is also used to determine a young child's
readiness for preschool or kindergarten. These tests may be designed to measure oral-language ability,
visual-motor skills, and cognitive and social development. NRTs are administered for admission
decisions at the entry-level and promotional decisions at the exit level.
At the policy level, NRTs are very useful because based on NRT data, programs are selected and
evaluated; remedial and gifted strategies are developed; funding is appropriated for teachers'
professional development, and textbooks are prepared. In contrast, the arguments typically made by
proponents of CRT about its strengths are equally substantial. A CRT is well suited to measuring specific
skills, objectives, or domains (Bond 1996; Kubiszyn & Borich 2007; Linn 2000; Sanders & Horn 1995;
Swanson & Watson 1982). It indicates how well students are learning. It is useful for determining
learning progress and for identifying learning gaps or academic deficits that need to be
addressed (Bond 1996).
CRT gives direction to teaching and re-teaching. Instructors can use the test results to determine how
well they are teaching the curriculum and where they are lagging (Bond 1996). CRT helps measure the
academic achievement of students usually to compare academic performance among schools, districts,
and states. The results provide a basis for determining how much is being learned by students and how
well the educational system is producing desired results (Cohen, Manion & Morrison 2004). Through
open-ended questions, this test promotes higher-level cognitive skills such as critical thinking, problem-
solving, reasoning, analysis, or interpretation (Sanders & Horn 1995). CRT can give parents more
information about what exactly their children have learned, what competencies still need to be
mastered, and which ones have been already mastered (Bond, 1996). It is well suited for training
programs to assess the learning of trainees. It is used to determine whether a person is qualified to
receive a certificate or not (Swanson & Watson 1982).
Overreliance on NRT results can lead to inadvertent discrimination against minority groups and low-
income student populations, both of which tend to face more educational obstacles than non-minority
and higher-income students (Bond 1996). The tests are considered biased by some experts
because questions that low scorers might get right are eliminated during test construction (Huitt 1996). The results of an
NRT are reported as a score band rather than a true score; therefore, they carry measurement error. NRTs
encourage teachers to view students in terms of a bell curve, which can lead them to lower academic
expectations for certain groups of students, particularly special-needs students, English-language
learners, or minority groups. And when academic expectations are consistently lowered year after year,
students in these groups may never catch up to their peers, creating a self-fulfilling prophecy (The
Glossary of education …2014).
An NRT has to be finished within a time limit, which may favor or disfavor an individual student. Although
the CRT has many advantages, it also has some drawbacks. The kinds of arguments typically made
by the critics of CRT are worth considering. A CRT does not allow for comparing the performance of
students in a particular location with national norms. For example, a school would be unable to
compare 5th-grade achievement levels across a district and therefore be unable to measure how the school is
performing against other schools (Huitt 1996). Unless developed institutionally, it costs a great deal of money, time, and effort.
Creating a specific curriculum takes time and money to hire more staff, and most likely the staff should
be professionally competent (The Glossary of education …2014). CRT needs efficient leadership and
collaboration, and lack of these can cause problems. For instance, if a school is creating assessments
for special education students with no well-trained professionals, they might not be able to create
assessments that are learner-centered (Bond 1996). It is difficult for curriculum developers to know
what is working and what is not working because tests tend to be different from one school to another.
It would require years of collecting data to know what is lacking and what is not. It may slow the
process of curriculum change if tests are constantly changed (Corbett & Wilson, 1991).
The process of determining proficiency levels and passing scores on CRT can be highly subjective or
misleading— and the potential consequences can be significant, particularly if the tests are used to
make high-stakes decisions about students, teachers, and schools. Because reported "proficiency"
rises and falls in direct relation to the standards or cut-off scores used to make a proficiency
determination, it is possible to manipulate the perception and interpretation of test results by raising
or lowering either the standards or the passing scores. And when educators are evaluated based on test
scores, their job security may rest on potentially misleading or flawed results.
Even the reputations of national education systems can be negatively affected when a large percentage
of students fail to achieve "proficiency" on international assessments (Huitt 1996). The subjective
nature of proficiency levels allows the CRT to be exploited for political purposes to make it appear that
schools are either doing better or worse than they are. For example, some states have been accused of
lowering proficiency standards of standardized tests to increase the number of students achieving
"proficiency", and thereby avoid the consequences—negative press, public criticism, large numbers of
students being held back or denied diplomas that may result from large numbers of students failing to
achieve expected or required proficiency levels (The Glossary of education …2014).
In an NRT, item construction usually does not develop from task analysis; test items may or may not be
related to the objectives of instruction (intervention). In a CRT, items are developed from task analysis, and test
items are related to the objectives of instruction (Bond 1996; Linn 2000; Montgomery & Connolly 1987;
Sanders & Horn 1995). Scoring of an NRT is based on standards relative to a group; variability of scores
(i.e., means and standard deviations) is desired, with a normal distribution. Scoring of a CRT is based on
absolute standards; variability of scores is not sought because perfect or near-perfect scores are
desired (Bond 1996; Linn 2000; Montgomery & Connolly 1987; Sanders & Horn 1995). In an NRT, a
percentile rank is used for relative ranking, whereas in a CRT a percentage is used to report performance (Bond
1996; Kubiszyn & Borich 2007; Linn 2000; Montgomery & Connolly 1987; Sanders & Horn 1995). NRTs
offer breadth but not depth in content specification, whereas CRTs offer depth but not
breadth (Bond 1996; Linn 2000; Sanders & Horn 1995).
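The contrast between the two score interpretations can be made concrete with a short sketch. The scores below are made up for illustration: the same raw score yields a percentile rank when read norm-referentially and a percent-correct figure when read criterion-referentially:

```python
# A minimal sketch (with hypothetical scores) contrasting NRT and CRT
# score interpretations for the same raw score.

def percentile_rank(score, group_scores):
    """NRT-style interpretation: the percentage of the group scoring
    below this student (a relative ranking)."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)

def percent_correct(items_correct, items_total):
    """CRT-style interpretation: performance against the test content
    itself, saying nothing about other test takers."""
    return 100 * items_correct / items_total

group = [12, 15, 18, 20, 22, 25, 27, 28, 30, 33]  # ten students' raw scores

# The same raw score of 25, read two ways:
print(percentile_rank(25, group))  # standing relative to peers
print(percent_correct(25, 40))     # mastery of a hypothetical 40-item test
```

A student at the 50th percentile may still have mastered only part of the content, and a student with 90 percent correct may still rank low in an unusually strong group; the two interpretations answer different questions.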
A variety of different types of decisions are made in almost any language program, and language tests
of various kinds can help in making those decisions. To test appropriately, I argue that each teacher
must be very clear about his/her purpose for making a given decision and then match the correct type
of test to that purpose. If my purpose is to measure weight, I will use some sort of weighing device. If I
want to measure linear distance, I will use a ruler or odometer. In this section, I summarize the main
points that teachers must keep in mind when matching the appropriate measuring tool (NRT or CRT in
this case) with the types of decisions they must make about their students. The main points to consider
are shown in Table 1.2. As the discussion develops, I briefly cover each point as it applies to four types
of decisions.
In administering language programs, I have found myself making just four kinds of decisions:
proficiency, placement, achievement, and diagnostic. Since these are also the four types of tests
identified in Alderson, Krahnke, and Stansfield (1987) as the most commonly used types of tests in our
field, I call them the primary language testing functions and focus on them in the remainder of this
chapter. These testing functions correspond neatly to the NRT and CRT categories as follows: NRTs
aid in making program-level decisions (that is, proficiency and placement decisions) and CRTs are
most effective in making classroom-level decisions (that is, diagnostic and achievement). As I will
explain, these testing categories and functions provide a useful framework for thinking about decision-
making in language programs.
Of course, other categories of tests do exist. For instance, aptitude tests, intelligence tests, learning
strategy tests, attitude tests, and so forth do not fit into these four language testing functions. Generally,
these other types of tests are not administered in language programs so I do not discuss them in this
book. Instead, proficiency, placement, achievement, and diagnostic testing will be my focus because a
command of these testing functions will provide all the tools needed for decision-making in most
language programs. This approach should not only help teachers to learn about language testing but
also should help them to make responsible proficiency decisions, placement decisions, achievement
decisions, and diagnostic decisions about their students.
Program-Level Decisions
Proficiency decisions.
Sometimes, teachers and administrators need to make decisions based on the students' general levels
of language proficiency. The focus of such decisions is usually on the general knowledge or skills prerequisite
to entry or exit from some type of institution, for example, American universities. Such proficiency decisions
are necessary for setting up entrance and exit standards for a curriculum, for adjusting the level of program
objectives to the students' abilities, or for making comparisons between programs. In other words, teachers and
administrators must make a variety of curricular and administrative decisions based on overall proficiency
information.
Proficiency decisions are often based on proficiency tests specifically designed for such decisions. By
definition, then, a proficiency test assesses the general knowledge or skills commonly required or
prerequisite to entry into (or exemption from) a group of similar institutions. One example is the Test of
English as a Foreign Language (TOEFL), which is used by many American universities that have
English language proficiency prerequisites in common (see Educational Testing Service 1992, 1994).
Understandably, such tests are very general and cannot be related to the goals and objectives of any
particular language program. Another example of the general nature of proficiency tests is the ACTFL
Proficiency Guidelines (American Council on the Teaching of Foreign Languages 1986). Although
proficiency tests may contain subtests for each skill, the testing of the skills remains very general, and
the resulting scores can only serve as overall indicators of proficiency.
Since proficiency decisions require knowing the general level of proficiency of language students in
comparison to other students, the test must provide scores that form a wide distribution so that
interpretations of the differences among students will be as fair as possible. Thus, I argue that
proficiency decisions should be made based on norm-referenced proficiency tests because NRTs have
all the qualities desirable for such decisions (refer to Table 1.1). Proficiency decisions based on large-
scale standardized tests may sometimes seem unfair to teachers because of the arbitrary way that they
are handled in some settings, but like it or not, such proficiency decisions are often necessary: (a) to
protect the integrity of the institutions involved, (b) to keep students from getting in over their heads,
and (c) to prevent students from entering programs that they do not need.
Proficiency decisions most often occur when a program must relate to the external world in some way.
The students are arriving. How will they fit into the program? And when the students leave the program,
is their level of proficiency high enough to enable them to succeed linguistically in other institutions?
Sometimes, comparisons are also made among different language programs. Since proficiency tests,
by definition, are general, rather than geared to any particular program, they could serve to compare
regional branches of a particular language teaching delivery system. Consider what would happen if the
central office for a nationwide chain of ESL business English schools wanted to compare the
effectiveness of all its centers. To make such decisions about the relative merit of the various centers,
the administrators in charge would probably want to use some form of business English proficiency
test.
However, extreme care must be exercised in making comparisons among different language programs
because of the very fact that such tests are not geared to any particular language program. By chance,
the test could fit the teaching and content of one program relatively closely; as a consequence, the
students in that program might score high on average. By chance, the test might not match the
curriculum of another program quite so well; consequently, the students would score low on that
particular proficiency test. The question is: Should one program be judged less effective than another
simply because the teaching and learning that is going on in that program (though perfectly effective
and useful) are not adequately assessed by the test? Of course not. Hence, great care must be used in
making such comparisons with special attention to the validity and appropriateness of the tests to the
decisions being made.
Because of the general nature of proficiency decisions, a proficiency test must be designed so that the
general abilities or skills of students are reflected in a wide distribution of scores. Only with such a wide
distribution can decision-makers make fair comparisons among the students, or groups of students.
This need for a wide spread of scores most often leads testers to create tests that produce normal
distributions of scores. All of which is to argue that proficiency tests should usually be norm-referenced.
Proficiency decisions should never be undertaken lightly. Instead, these decisions must be based on
the best obtainable proficiency test scores as well as other information about the students. Proficiency
decisions can dramatically affect students' lives, so slipshod decision-making in this area would be
particularly unprofessional.
Placement decisions
Placement decisions usually have the goal of grouping together students of similar ability levels.
Teachers benefit from placement decisions because their classes contain students with relatively
homogeneous ability levels. As a result, teachers can focus on the problems and learning points
appropriate for students at that level. To that end, placement tests are designed to help decide what
each student's appropriate level will be within a specific program, skill area, or course. The purpose of
such tests is to reveal which students have more of, or less of, particular knowledge or skill so that
students with similar levels of ability can be grouped.
Examining the similarities and differences between proficiency and placement testing will help to clarify
the role of placement tests. To begin with, a proficiency test and a placement test might at first glance
look very similar because they are both testing fairly general material. However, a proficiency test tends
to be very, very general, because it is designed to assess extremely wide bands of abilities. In contrast,
a placement test must be more specifically related to a given program, particularly in terms of the
relatively narrow range of abilities assessed and the content of the curriculum, so that it efficiently
separates the students into level groupings within that program.
Put another way, a general proficiency test might be useful for determining which language program is
most appropriate for a student; once in that program, a placement test would be necessary to
determine the level of study from which the student would most benefit. Both proficiency and placement
tests should be norm-referenced instruments because decisions must be made on the students' relative
knowledge or skill levels. However, as demonstrated in Brown (1984b), the degree to which a test is
effective in spreading students out is directly related to the degree to which that test fits the ability levels
of the students. Hence, a proficiency test would typically be norm-referenced to a population of
students with a very wide band of language abilities and a variety of purposes for using the language.
In contrast, a placement test would typically be norm-referenced to a narrower band of abilities and
purposes, usually the abilities and purposes of students at the beginning of studies in a particular
language program.
Consider, for example, the English Language Institute (ELI) at the University of Hawaii at Manoa
(UHM). All of the international students at UHM have been fully admitted by the time they arrive. To
have been admitted, they must have taken the TOEFL and scored at least 500. From our point of view,
language proficiency test scores are used to determine whether these students are eligible to study in
the ELI and follow a few courses at UHM. Those students who score 600 or above on the TOEFL are
told that they are completely exempt from ELI training. Thus, I can safely say that most of the ELI
students at UHM have scored between 500 and 600 on the TOEFL.
Within the ELI, there are three tracks, each of which is focused on one skill (reading, writing, or
listening) and also up to three levels within each track. As a result, the placement decisions and the
tests upon which they are based must be much more focused than the information provided by TOEFL
scores. The placement tests must provide information on each of the three skills involved as well as on
the language needed by students in the relatively narrow proficiency range reflected in their TOEFL
scores, which were between 500 and 600. I see a big difference between our general proficiency
decisions and our placement decisions. While the contrasts between proficiency and placement
decisions may not be quite so clear in all programs, my definitions and ways of distinguishing between
proficiency and placement decisions should help teachers to think about the program-level decisions
and testing in their language programs. If a particular program is designed with levels that include
beginners as well as very advanced learners, a general proficiency test might adequately serve as a
placement instrument. However, such a wide range of abilities is not common in the programs that I
know about and, even when appropriately measuring such general abilities, each test must be
examined in terms of how well it fits the abilities of the students and how well it matches what is taught
in the classrooms.
If there is a mismatch between the placement test and what is taught in a program (as found in Brown
1981), the danger is that the groupings of similar ability levels will simply not occur. For instance,
consider an elementary school ESL program in which a general grammar test is used for placement. If
the focus of the program is on oral communication at three levels, and a pencil and paper test is used
to place the children into those levels, numerous problems may arise. Such a test is placing the
children into levels based on their written grammar abilities. While grammar ability may be related to
oral proficiency, other factors may be more important to successful oral communication. Such testing
practices could result in the oral abilities of the children in all three of the (grammar-placed) levels being
about the same in terms of both average ability and range of abilities.
Classroom-Level Decisions
Achievement decisions
All language teachers are in the business of fostering achievement in the form of language learning.
The purpose of most language programs is to maximize the possibilities for students to achieve a high
degree of language learning. Hence, sooner or later, most language teachers will find themselves
interested in making achievement decisions. Achievement decisions are decisions about the amount
of learning that students have done. Such decisions may involve who will be advanced to the next level
of study or which students should graduate. Teachers may find themselves wanting to make rational
decisions that will help to improve achievement in their language programs. Or they may find a need to
make and justify changes in curriculum design, staffing, facilities, materials, equipment, and so on.
Such decisions should most often be made with the aid of achievement test scores.
Making decisions about the achievement of students and about ways to improve that achievement
usually involves testing to find out how much each person has learned within the program. Thus, an
achievement test must be designed with very specific references to a particular course. This link with
a specific program usually means that the achievement tests will be directly based on course objectives
and will therefore be criterion-referenced. Such tests will typically be administered at the end of a
course to determine how effectively students have mastered the instructional objectives.
Achievement tests must be not only very specifically designed to measure the objectives of a given
course but also flexible enough to help teachers readily respond to what they learn from the test about
the students' abilities, the students' needs, and the students' learning of the course objectives. In other
words, a good achievement test can tell teachers a great deal about their students' achievements and
the adequacy of the course. Hence, while achievement tests should be used to make decisions about
students' levels of learning, they can also be used to affect curriculum changes and to test those
changes continually against the program realities.
Diagnostic decisions.
From time to time, teachers may also take an interest in assessing the strengths and weaknesses of
each student vis-à-vis the instructional objectives for purposes of correcting an individual's deficiencies
"before it is too late." Diagnostic decisions are aimed at fostering achievement by promoting strengths
and eliminating the weaknesses of individual students. Naturally, the primary concern of the teacher
must be the entire group of students collectively, but some attention can also be given to each student.
This last category of decision is concerned with diagnosing problems that students may be having in
the learning process. While diagnostic decisions are related to achievement, diagnostic testing often
requires more detailed information about the very specific areas in which students have strengths and
weaknesses. The purpose is to help students and their teachers to focus their efforts where they will be
most effective.
As with an achievement test, a diagnostic test is designed to determine the degree to which the
specific instructional objectives of the course have been accomplished. Hence, it should be criterion-
referenced in nature. While achievement decisions are usually focused on the degree to which the
objectives have been accomplished at the end of the program or course, diagnostic decisions are
normally made along the way as the students are learning the language. As a result, diagnostic tests
are typically administered at the beginning or in the middle of a language course. If constructed to
reflect the instructional objectives, one CRT in three equivalent forms could serve as a diagnostic tool
at the beginning and midpoints in a course and as an achievement test at the end. Perhaps the most
effective use of a diagnostic test is to report the performance level on each objective (in a percentage)
to each student so that he or she can decide how and where to invest time and energy most profitably.
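The per-objective reporting described above can be sketched in a few lines of code. The objective labels, item tagging, and answer data below are hypothetical illustrations, not drawn from any actual test:

```python
# Sketch of a per-objective diagnostic report (hypothetical objectives and data).
# Each item on the test is tagged with the objective it measures; the report
# gives one student's percentage correct on each objective.

def diagnostic_report(item_objectives, responses):
    """item_objectives: objective label for each test item, in item order.
    responses: parallel list of 1 (correct) / 0 (incorrect).
    Returns {objective: percent correct}."""
    totals, correct = {}, {}
    for obj, r in zip(item_objectives, responses):
        totals[obj] = totals.get(obj, 0) + 1
        correct[obj] = correct.get(obj, 0) + r
    return {obj: round(100 * correct[obj] / totals[obj], 1) for obj in totals}

# One student's answers on a six-item test covering two objectives:
objectives = ["past tense", "past tense", "past tense",
              "articles", "articles", "articles"]
answers = [1, 1, 0, 1, 0, 0]
print(diagnostic_report(objectives, answers))
# → {'past tense': 66.7, 'articles': 33.3}
```

A report like this shows the student at 66.7% on one objective and 33.3% on another, pointing directly at where further study time would pay off.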
Objective-Referenced Tests
Objective-referenced tests are very similar to criterion-referenced tests in that the questions appearing
on both are selected because they relate to rather narrow, highly specific learning objectives. Both
contain items that measure clearly defined objectives, but objective-referenced tests differ from
criterion-referenced tests in that they have no pre-determined performance standard associated with the
scores. Their purpose is to survey the tasks that students can perform in different areas of the
curriculum. Administered periodically, these tests, or the individual test items, provide useful information
for assessing the curriculum and for determining general educational progress.
Domain-Referenced Tests
Domain-referenced tests are used to estimate performance on a universe of items similar to those used
on the test. As such, the content area of the test is rather explicitly defined such as, for example, word
recognition ability at the primary level or reading comprehension ability at the intermediate level. A large
pool of items is developed for the domain and items are randomly sampled from the pool for placement
on a particular test. Scores are reported as the percentage of items that a student could get correct in
the total pool.
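This sampling-and-projection logic can be sketched as follows; the pool size, form length, and score are hypothetical numbers chosen only for illustration:

```python
import random

# Sketch of domain-referenced scoring (hypothetical pool and form sizes).
# A test form is built by randomly sampling items from a large domain pool;
# the student's percent correct on the sample estimates the percentage of
# the whole pool he or she could answer correctly.

random.seed(0)  # fixed seed so the sampled form is reproducible

pool = list(range(500))              # item IDs in the domain pool
test_items = random.sample(pool, 40)  # one 40-item test form, no repeats

# Suppose the student answers 30 of the 40 sampled items correctly:
num_correct = 30
estimated_domain_score = 100 * num_correct / len(test_items)
print(f"Estimated mastery of the domain: {estimated_domain_score:.0f}%")
# → Estimated mastery of the domain: 75%
```

Because the form is a random sample, 30 correct out of 40 is read not as a raw score but as an estimate that the student could answer about 75% of the entire 500-item pool.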
CONCLUSION
It should be noted that individual items on the four types of tests can be quite similar both in structure and in
content. As such, it is sometimes difficult to differentiate between the tests on appearance alone. The major
differences are related to how the scores are interpreted as well as to the usefulness of the test results in
making various types of decisions. For example, a norm-referenced test is designed primarily to allow for
comparisons to be made between individuals and groups of students. Items appearing on these tests have
been selected because they have been found to maximize small differences between students. Any questions
that are ineffective at detecting small differences between the achievement levels of students are eliminated
during the test development phase. Objective-referenced and criterion-referenced tests, on the other hand, are
primarily concerned with content coverage. Items are selected or rejected based on whether or not they are
judged to measure a component of the knowledge or skills specified in the learning objectives to which the
tests are referenced. These tests provide valuable information concerning what a student can and cannot do.
However, they tend to be less efficient than norm-referenced tests when the scores are to be used to detect
small differences between the students for comparative purposes. It is extremely important, then, to ensure
that the purposes of testing are clear before a test is developed or selected.
TOPIC 15: No Child Left Behind (NCLB): Testing Requirements and Difficulties
No Child Left Behind (NCLB): What You Need to Know
At a Glance
No Child Left Behind (NCLB) was the main law for K–12 general education in the United States from
2002 to 2015.
The law held schools accountable for how kids learned and achieved.
The law was controversial in part because it penalized schools that didn't show improvement.
The No Child Left Behind Act of 2001 (NCLB) was in effect from 2002 to 2015. It was a version of the
Elementary and Secondary Education Act (ESEA). NCLB was replaced by the Every Student Succeeds Act
in 2015.
When NCLB was the law, it affected every public school in the United States. Its goal was to level the
playing field for students who are disadvantaged, including:
Students in poverty
Minorities
Students receiving special education services
Those who speak and understand limited or no English
NCLB was controversial. Here's an overview of how the law affected students with learning and thinking
differences.
Understanding accommodations
Accommodations are tools and procedures that provide equal access to instruction and assessment for
students with disabilities. They are provided to "level the playing field." Without accommodations, students with
disabilities may not be able to access grade-level instruction and participate fully in assessments.
Choosing accommodations
All students with disabilities (those with active IEPs or 504 Plans) are entitled to the appropriate
accommodations that allow them to fully participate in state- and district-wide testing.
Who decides?
The student's IEP/504 team selects the accommodations for both instruction and assessments.
Accommodations should be chosen based on the individual student's needs, not based on the disability
category, grade level, or instructional setting. Once selected, accommodations should be used consistently for
instruction and assessment. Each teacher and others responsible for the implementation of the
accommodations must be informed of the specific accommodations that must be provided.
Selecting accommodations
Determining necessary accommodations should be part of the development of each IEP or 504 Plan.
These questions should be considered in the selection process:
What are the student's learning strengths and needs?
How do the student's learning needs affect the achievement of the grade-level content standards?
What specialized instruction does the student need to achieve the grade-level content standards?
Next, discuss and review the accommodations the student has already been using. Ask these questions:
What accommodations is the student regularly using in the classroom and on tests?
What is the student's perception of how well an accommodation has worked?
Has the student been willing to use the accommodation?
What are the perceptions of the parents, teachers, and others about how the accommodations appear
to have worked?
Have there been difficulties administering the selected accommodations?
When deciding on new accommodations, plan how and when the student will learn to use each new
accommodation. Be sure there is plenty of time to learn to use an accommodation before it will be part of the
administration of a state- and district-wide assessment.
Assessment facts
Many states have chosen to add "stakes" for students to their standards and assessment systems. In
some states, students are required to pass one or more high school assessments as a condition of receiving a
diploma. Some states require students to achieve at certain levels on assessments to be promoted to
subsequent grades. It is imperative for parents to understand the implications of student performance on tests
required by their state.
Evaluating accommodations
Evaluating how effective the accommodations are should be an ongoing process — only by closely
reviewing the impact of accommodations can improvements happen. IEP or 504 teams should not assume that
accommodation selection carries over from year to year. Each year the team should review:
Each accommodation and the results of tests when the accommodation was used
Student's perception of how well each accommodation is working
Effective combinations of accommodations
Perceptions of teachers, paraprofessionals, and other specialists about how the accommodations
appear to be working.
1. Is my child using accommodations during classroom instruction that will not be allowed when
taking state- or district-wide assessments?
Because of the nature of certain accommodations, they are only allowed for instruction, not testing. If a
student is accustomed to using such accommodations, the IEP team needs to make certain the student
understands that particular accommodations won't be available during testing and work to find acceptable
accommodations that can support the student during testing in a comparable manner.
Be sure that accommodations don't lead to inappropriate testing practices such as:
Coaching students during testing
Editing student work
Allowing a student to answer fewer questions
Giving clues to test answers in any way
Reducing the number of responses required
Changing the content by paraphrasing or offering additional information
2. Are the assessment accommodations selected for my child considered "standard" or "non-
standard"?
There is tremendous variance across states regarding testing accommodation policies. Be sure to obtain a
copy of your state guidelines and policies regarding assessment accommodations. These guidelines should
include information on whether accommodations are considered "standard" or "non-standard" as well as
information on any accommodations that might invalidate a test score.