
Washback or backwash

Washback or backwash, a term now commonly used in applied linguistics, refers to
the influence of testing on teaching and learning (Alderson & Wall, 1993), and has
become an increasingly prevalent and prominent phenomenon in education—“what
is assessed becomes what is valued, which becomes what is taught” (McEwen,
1995a, p. 42). There seem to be at least two major types or areas of washback or
backwash studies—those relating to traditional, multiple-choice, large-scale tests,
which are perceived to have had mainly negative influences on the quality of
teaching and learning (Madaus & Kellaghan, 1992; Nolan, Haladyna, & Haas, 1992;
Shepard, 1990), and those studies where a specific test or examination has been
modified and improved upon (e.g., performance-based assessment), in order to
exert a positive influence on teaching and learning (Linn & Herman, 1997; Sanders
& Horn, 1995). The second type of studies has shown, however, positive, negative,
or no influence on teaching and learning. Furthermore, many of those studies have
turned to focus on understanding the mechanism of how washback or backwash is
used to change teaching and learning. Washback (Alderson & Wall, 1993) or
backwash (Biggs, 1995, 1996) here refers to the influence of testing on teaching and
learning. The concept is rooted in the notion that tests or examinations can and
should drive teaching, and hence learning, and is also referred to as measurement-
driven instruction (Popham, 1987).
Wall (1997) distinguished between test impact and test washback in terms of the
scope of the effects. According to Wall, impact refers to “. . . any of the effects that a
test may have on individuals, policies or practices, within the classroom, the school,
the educational system or society as a whole” (see Stecher, Chun, & Barron, chap. 4,
this volume), whereas washback (or backwash) is defined as “the effects of tests on
teaching and learning” (Wall, 1997, p. 291).

Messick (1996) defined washback as the “extent to which a test influences
language teachers and learners to do things they would not necessarily otherwise do
that promote or inhibit [emphasis added] language learning” (p. 241, as cited in
Alderson & Wall, 1993, p. 117).

Wall and Alderson also noted that “tests can be powerful determiners, both
positively and negatively, [. . .]” According to Messick (1996), “for optimal positive
washback there should be little, if any, difference between activities involved in
learning the language and activities involved in preparing for the test” (pp. 241–
242).

However, the lack of simple, one-to-one relationships in such complex systems was
highlighted by Messick (1996): “A poor test may be associated with positive effects
and a good test with negative effects because of other things that are done or not
done in the education system” (p. 242). In terms of complexity and validity,
Alderson and Wall (1993) argued that washback is “likely to be a complex
phenomenon which cannot be related directly to a test’s validity” (p. 116). The
washback effect should, therefore, refer to the effects of the test itself on aspects of
teaching and learning.

Negative Washback
Tests in general, and perhaps language tests in particular, are often criticized
for their negative influence on teaching, so-called “negative washback,” which has
long been identified as a potential problem.

Positive Washback
Like most areas of language testing, for each argument in favor of or opposed to a
particular position, there is a counterargument. There are, then, researchers who
strongly believe that it is feasible and desirable to bring about beneficial changes in
teaching by changing examinations, representing the “positive washback” scenario,
which is closely related to “measurement-driven instruction” in general education.
In this case, teachers and learners have a positive attitude toward the examination
or test, and work willingly and collaboratively toward its objectives.

WASHBACK: FUNCTIONS AND MECHANISMS


Traditionally, tests have come at the end of the teaching and learning process for
evaluative purposes. However, with the widespread expansion and proliferation of
high-stakes public examination systems, the direction seems to have been largely
reversed. Testing can come first in the teaching and learning process. Particularly
when tests are used as levers for change, new materials need to be designed to
match the purposes of a new test, and school administrative and management staff,
teachers, and students are generally required to learn to work in alternative ways,
and often work harder, to achieve high scores on the test. In addition to these
changes, many more changes in the teaching and learning context can occur as the
result of a new test, although the consequences and effects may be independent
of the original intentions of the test designers, due to the complex interplay of
forces and factors both within and beyond the school.

Such influences were linked to test validity by Shohamy (1993a), who pointed out
that “the need to include aspects of test use in construct validation originates in the
fact that testing is not an isolated event; rather, it is connected to a whole set of
variables that interact in the educational process” (p. 2). Similarly, Linn (1992)
encouraged the measurement research community “to make the case that the
introduction of any new high-stakes examination system should pay greater
attention to investigations of both the intended and unintended consequences of
the system than was typical of previous test-based reform efforts” (p. 29).
As a result of this complexity, Messick (1989) recommended a unified validity
concept, which requires that when an assessment model is designed to make
inferences about a certain construct, the inferences drawn from that model should
not only derive from test score interpretation, but also from other variables
operating within the social context (Bracey, 1989; Cooley, 1991; Cronbach, 1988;
Gardner, 1992; Gifford & O’Connor, 1992; Linn, Baker, & Dunbar, 1991; Messick,
1992). The importance of collaboration was also highlighted by Messick (1975):
“Researchers, other educators, and policy makers must work together to develop
means of evaluating educational effectiveness that accurately represent a school or
district’s progress toward a broad range of important educational goals” (p. 956).

The Trichotomy Backwash Model


(a) Participants—students, classroom teachers, administrators, materials
developers and publishers, whose perceptions and attitudes toward their work may
be affected by a test

(b) Processes—any actions taken by the participants which may contribute to the
process of learning

(c) Products—what is learned (facts, skills, etc.) and the quality of the learning
Note. Adapted from Hughes, 1993, p. 2. Cited in Bailey (1996).

WASHBACK: THE CURRENT TRENDS IN ASSESSMENT
One of the main functions of assessment is generally believed to be as one form of
leverage for educational change, which has often led to top-down educational
reform strategies by employing “better” kinds of assessment practices (James,
2000; Linn, 2000; Noble & Smith, 1994a). Assessment practices are currently
undergoing a major paradigm shift in many parts of the world, which can be
described as a reaction to the perceived shortcomings of the prevailing paradigm,
with its emphasis on standardized testing (Biggs, 1992, 1996; Genesee, 1994).
Alternative or authentic assessment methods have thus emerged as systematic
attempts to measure learners’ abilities to use previously acquired knowledge in
solving novel problems or completing specific tasks, as part of this use of
assessment to reform curriculum and improve instruction at the school and
classroom level (Linn, 1983, 1992; Lock, 2001; Noble & Smith, 1994a, 1994b;
Popham, 1983).
Therefore, I begin with an outline of the complexity of the phenomenon called
washback.

(a) Dimensions
Watanabe (1997b) conceptualized washback on the following dimensions, each of
which represents one of the various aspects of its nature.

Specificity. Washback may be general or specific. General washback means a
type of effect that may be produced by any test. For example, if there is a hypothesis
that a test motivates students to study harder than they would otherwise, washback
here relates to any type of exam, hence, general washback. Specific washback, on
the other hand, refers to a type of washback that relates to only one specific aspect
of a test or one specific test type. An example is the belief that if a listening
component is included in the test, students and teachers will emphasize this aspect
in their learning or teaching.

Intensity. Washback may be strong or weak. If the test has a strong effect, then it
will determine everything that happens in the classroom, and lead all teachers to
teach in the same way toward the exams. On the other hand, if a test has a weak
effect, then it will affect only a part of the classroom events, or only some teachers
and students, but not others. If the examination produces an effect only on some
teachers, it is likely that the effect is mediated by certain teacher factors. The
research to date indicates the presence of washback toward the weak end of the
continuum. It has also been suggested that the intensity of washback may be a
function of how high or low the stakes are (Cheng, 1998a).

Length. The influence of exams, if it is found to exist, may last for a short period of
time, or for a long time. For instance, if the influence of an entrance examination is
present only while the test takers are preparing for the test, and the influence
disappears after entering the institution, this is short-term washback. However, if
the influence of entrance exams on students continues after they enter the
institution, this is long-term washback.

Intentionality. Messick (1989) implied that there is unintended as well as intended
washback when
he wrote, “Judging validity in terms of whether a test does the job it is employed to
do . . . requires evaluation of the intended or unintended social consequences of test
interpretation and use. The appropriateness of the intended testing purpose and the
possible occurrence of unintended outcomes and side effects are the major issues”
(p. 84). McNamara (1996) holds a similar view, stating that “High priority
needs to be given to the collection of evidence about the intended and unintended
effects of assessments on the ways teachers and students spend their time and think
about the goals of education” (p. 22). The researcher has to investigate not only
intended washback but also unintended washback.

Value. Examination washback may be positive or negative. Because it is not
conceivable that the test writers intend to cause negative washback, intended
washback may normally be associated with positive washback, while unintended
washback is related to both negative and positive washback. When it comes to the
issue of value judgment, the washback research may be regarded as being a part of
evaluation studies. The distinction between positive and negative could usefully be
made only by referring to the audience. In other words, researchers need to be
ready to answer the question, “who the evaluation is for” (Alderson, 1992). For
example, one type of outcome may be evaluated as being positive by teachers,
whereas the same outcome may be judged to be negative by school principals. Thus,
it is important to identify the evaluator when it comes to passing value judgment
(see also chap. 1, this volume).
