Washback or Backwash
Messick (1996) defined washback as the “extent to which a test influences
language teachers and learners to do things they would not necessarily otherwise do
that promote or inhibit [emphasis added] language learning” (p. 241, as cited in
Alderson & Wall, 1993, p. 117).
Wall and Alderson also noted that “tests can be powerful determiners, both
positively and negatively, [...]”
According to Messick (1996), “for optimal positive
washback there should be little, if any, difference between activities involved in
learning the language and activities involved in preparing for the test” (pp. 241–242).
However, the lack of simple, one-to-one relationships in such complex systems was
highlighted by Messick (1996): “A poor test may be associated with positive effects
and a good test with negative effects because of other things that are done or not
done in the education system” (p. 242). In terms of complexity and validity,
Alderson and Wall (1993) argued that washback is “likely to be a complex
phenomenon which cannot be related directly to a test’s validity” (p. 116). The
washback effect should, therefore, refer to the effects of the test itself on aspects of
teaching and learning.
Negative Washback
Tests in general, and perhaps language tests in particular, are often criticized
for their negative influence on teaching, so-called “negative washback,” which has
long been identified as a potential problem.
Positive Washback
As in most areas of language testing, for each argument in favor of or opposed to a
particular position, there is a counterargument. There are, then, researchers who
strongly believe that it is feasible and desirable to bring about beneficial changes in
teaching by changing examinations, representing the “positive washback” scenario,
which is closely related to “measurement-driven instruction” in general education.
In this case, teachers and learners have a positive attitude toward the examination
or test, and work willingly and collaboratively toward its objectives.
Such influences were linked to test validity by Shohamy (1993a), who pointed out
that “the need to include aspects of test use in construct validation originates in the
fact that testing is not an isolated event; rather, it is connected to a whole set of
variables that interact in the educational process” (p. 2). Similarly, Linn (1992)
encouraged the measurement research community “to make the case that the
introduction of any new high-stakes examination system should pay greater
attention to investigations of both the intended and unintended consequences of
the system than was typical of previous test-based reform efforts” (p. 29).
As a result of this complexity, Messick (1989) recommended a unified validity
concept, which requires that when an assessment model is designed to make
inferences about a certain construct, the inferences drawn from that model should
not only derive from test score interpretation, but also from other variables
operating within the social context (Bracey, 1989; Cooley, 1991; Cronbach, 1988;
Gardner, 1992; Gifford & O’Connor, 1992; Linn, Baker, & Dunbar, 1991; Messick,
1992). The importance of collaboration was also highlighted by Messick (1975):
“Researchers, other educators, and policy makers must work together to develop
means of evaluating educational effectiveness that accurately represent a school or
district’s progress toward a broad range of important educational goals” (p. 956).
(b) Processes—any actions taken by the participants which may contribute to the
process of learning
(c) Products—what is learned (facts, skills, etc.) and the quality of the learning
Note. Adapted from Hughes, 1993, p. 2. Cited in Bailey (1996).
(a) Dimensions
Watanabe (1997b) conceptualized washback on the following dimensions, each of
which represents one of the various aspects of its nature.
Intensity. Washback may be strong or weak. If the test has a strong effect, then it
will determine everything that happens in the classroom, and lead all teachers to
teach in the same way toward the exams. On the other hand, if a test has a weak
effect, then it will affect only a part of the classroom events, or only some teachers
and students, but not others. If the examination produces an effect only on some
teachers, it is likely that the effect is mediated by certain teacher factors. The
research to date indicates the presence of washback toward the weak end of the
continuum. It has also been suggested that the intensity of washback may be a
function of how high or low the stakes are (Cheng, 1998a).
Length. The influence of exams, if it is found to exist, may last for a short period of
time, or for a long time. For instance, if the influence of an entrance examination is
present only while the test takers are preparing for the test, and the influence
disappears after entering the institution, this is short-term washback. However, if
the influence of entrance exams on students continues after they enter the
institution, this is long-term washback.
Intentionality. Messick (1989) implied that there is unintended as well as intended washback when
he wrote, “Judging validity in terms of whether a test does the job it is employed to
do . . . requires evaluation of the intended or unintended social consequences of test
interpretation and use. The appropriateness of the intended testing purpose and the
possible occurrence of unintended outcomes and side effects are the major issues”
(p. 84). McNamara (1996) holds a similar view, stating that “High priority
needs to be given to the collection of evidence about the intended and unintended
effects of assessments on the ways teachers and students spend their time and think
about the goals of education” (p. 22). The researcher has to investigate not only
intended washback but also unintended washback.