TOEFL (Test of English as a Foreign Language) Listening Sub-Test
RICHARD BADGER
Framing the Issue
of English. While ETS aims for realism (i.e., how the materials are experienced)
rather than authenticity (i.e., the origin of the texts), materials in the iBT practice
book seem generally authentic as they have many elements of natural speech.
Candidates can take notes while listening but do not see the questions until the
recording has finished. The question formats in both sections are objective but may
require candidates to select one or more options from a range of choices, sequence
events, match multiple choices, or categorize objects or text extracts. The questions
are designed so that they can be understood without any background knowledge
but may involve the implications or purpose of a particular utterance.
In task 1 of the writing section, the integrative writing test, candidates summa-
rize a text of 230 to 300 words that they have listened to and relate it to information
they derive from a separate reading passage.
The two main issues related to the TOEFL are: Is it a valid listening test? And
what is its impact on teaching, learning, and admissions policies in anglophone
universities? The next sections of this entry examine the validity of the TOEFL
listening sub-test and the TOEFL’s impact.
Making the Case
Test evaluation has traditionally been carried out on the criterion of validity, which
is the central concept in testing. Validity can be understood as a measure of the
correspondence between real-world facets and test facets. This is not so much a
quality of the test as of how the information from a test is used; and validity is best
addressed by considering the evidence upon which a decision based on a test score
could be justified (Chapelle, Enright, & Jamieson, 2008).
The listening sub-test tasks differ from listening in higher education institutions.
First, lectures are typically just under 1 hour, as opposed to the 3- to 5-minute-long
extracts in the test. Second, lectures are parts of modules selected by students and
are intended to convey discipline-specific arguments, unlike a listening test, which
aims to provide samples of a wide range of disciplines, covers the arts, life sci-
ences, physical sciences, and social sciences, and requires a general understanding
of the basic and pragmatic meaning of the extract (Bejar, Douglas, Jamieson,
Nissan, & Turner, 2000). Third, the listening sub-test of the TOEFL does not reflect
the extent to which listening is embedded in other forms of communication in
universities. For example, listening to a lecture leads into the creation of essays or
term papers. But this difference is, in part, addressed by the integrative writing task,
where one of the measures of success is the ability to connect and synthesize infor-
mation from different sources. The differences described in this paragraph seem to
be a necessary part of designing a practical test. It would be hard to imagine tests
where candidates listened to 1-hour lectures depending on their choice of major.
As practicable ways of bringing the TOEFL iBT closer to typical university listen-
ing are difficult to imagine, they will not be discussed further here.
Other aspects of the validity of the listening element of the TOEFL are more
easily addressed within a testing frame and might be considered more powerful
challenges to the validity of the TOEFL test. First, the listening sub-test focuses on
lectures, interactions between academics and students, and service exchanges
within an academic context. Lectures are the key listening task in university
programs. Sawaki and Nissan’s (2009) survey of 145 undergraduate and post-
graduate students in three American universities found that 42% of the under-
graduate and 52% of the postgraduate programs were lecture courses. They also
found that what students learned in lectures made an important contribution to
the assessment. The inclusion of listening activities other than lectures is a strength
of the TOEFL and addresses Lynch’s critique of academic listening exams in
general (Lynch, 2011). However, while the inclusion of service encounters has
plausibility, it is not entirely clear that this is the most important interaction out-
side the lecture theatre that involves listening. Many international students
prioritize social conversations, often with non-native speakers of English, over
service encounters. There is a lack of information about the range of interactions,
including service encounters, that are important for international students in
higher education. It would be helpful to have more research on the interactions in
which international students engage as a basis for the inclusion of service encoun-
ters in the listening section; but the omission of social encounters is at least
potentially problematic.
Second, the listening sub-test of the TOEFL does not accurately represent the
multimodal nature of lectures where diagrams, drawings, and videos are a routine
part. The use of photographs in the iBT is an attempt to address this issue but
remains a limited reflection of actual lectures where the majority of lecturers make
use of presentation software, most often PowerPoint. Research on whether
PowerPoint leads to better or worse recall of information has produced mixed
results. However, there is very little research on how students draw on PowerPoint
slides to support their own learning and none that I know of that examines how
PowerPoint presentations, handouts, and lecturers’ oral presentation are com-
bined by students as they make sense of lectures. For some students, reading the
PowerPoint slides is a more important part of lecture attendance than listening to
what the lecturer says, and this creates problems for what counts as an academic
listening test. There is also a lack of expertise on the use of multimodal recordings
in testing, but tests that incorporate video are now appearing, though there is still
little research as to how well they function as tests.
The TOEFL listening sub-test does not currently reflect the multimodal nature of
lectures. It is important that tests should be robust, particularly when there is a
dearth of evidence about this particular kind of “real-life language” use. However,
video is a well-established part of language teaching and could be an area of
TOEFL development.
Pedagogical Implications
In this section I look at the washback of the TOEFL on preparation courses and at
how the TOEFL is—or can be—used in the admissions process.
Washback or backwash is the impact of tests on courses that are intended to pre-
pare candidates for these tests. While many teachers believe that tests have consid-
erable impact, the way in which tests impact teaching depends on the beliefs and
practices of teachers and learners. This argument was investigated in a study car-
ried out by Wall and Horak (2006, 2008, 2011) on TOEFL preparation courses in six
countries in Central and Eastern Europe; the study aimed to understand how
teachers’ beliefs and practices changed as the TOEFL iBT replaced computer-based
testing (CBT). One of the reasons for the revision of the TOEFL, and in particular for
the introduction of the integrated listening–reading–writing task, was to create
positive washback (Wall & Horak, 2006). The two researchers found that, for the
TOEFL CBT, the most common activity in the listening class was for the students to
practice test-like listening items (Wall & Horak, 2006). The changes that teachers
were planning in response to the iBT version were designed to allow students to
take notes, to use longer listening passages, and to practice integrating information
from listening and reading texts; but, in practice, teachers did not provide much
support for the development of either note-taking skills or abilities to integrate
information from different sources (Wall & Horak, 2011). The main impact of the
TOEFL remained the fact that students were doing test-like activities, though there
was more student–student interaction because of the teaching materials (Wall &
Horak, 2011).
The rather depressing impact of the TOEFL on pedagogy might be addressed in
three ways. One is to improve teacher education for preparing candidates for the
TOEFL so that teachers are able to help their students develop the underlying
skills that the TOEFL listening sub-test is attempting to assess. For example, rather
than simply doing exam practice on integrating information sources, learners can
be scaffolded to develop the metacognitive and reflective abilities that would help
them synthesize what they have learnt from lectures and reading. Second, and in
line with Wall and Horak’s emphasis on the importance of teaching materials in
mediating the washback of a test, designing teaching materials related, say, to
effective note taking and developing strategies for combining spoken and written
information would be likely to lead to more effective preparation classes. Third,
some changes to test content are desirable, as current classes are not providing
students with the most effective preparation for study in anglophone universities,
as a result of the exclusion of purely social interactions and of the limited extent to
which the test reflects the multimodal nature of lectures. Developments of the test
to encompass varieties from non-native speakers of English and a wider range of
visual elements in the lecture extracts would go some way to addressing this
last issue.
Another way in which the TOEFL has an impact on broader society is through
its use for university admissions. TOEFL scores are often used as part of the selec-
tion for admission to study at anglophone universities, and it is important that
those involved in the admission process have a clear understanding of the limits
of the information that the TOEFL or similar tests can provide. However, one
important kind of evidence for the validity of the TOEFL is the extent to which it
can be used to predict future academic success. Wait and Gressel’s (2009) study of
over 6,000 students at an American university in the United Arab Emirates found
that higher TOEFL scores were associated with higher grade point averages
(GPAs) but that there were important differences between disciplines; for exam-
ple, the association was weaker for engineering students. Wait and Gressel also
found that there were many students whose academic performance defied the
general pattern. Cho and Bridgeman (2012) carried out a study of 2,594 interna-
tional students at universities in America and found that the TOEFL score
accounted for between 6% and 7% of the variance in GPA for postgraduate pro-
grams and 3% for undergraduate ones. Further analysis showed that students
with better TOEFL scores tended to have better GPAs. These results confirm that
there is a significant, though small, relationship between TOEFL and academic
performance. This is not surprising. The fact that a non-native speaker of English
has a good command of this language has no necessary connection with that
speaker’s academic ability as reflected in his or her grade point average (Cho &
Bridgeman, 2012). This means that admission processes should be wary about
overreliance on TOEFL scores.
References
Bejar, I., Douglas, D., Jamieson, J., Nissan, S., & Turner, J. (2000). TOEFL 2000 listening
framework: A working paper. Retrieved from https://www.ets.org/research/policy_
research_reports/publications/report/2000/iciu
Chapelle, C. A., Enright, M. K., & Jamieson, J. (2008). Building a validity argument for the Test
of English as a Foreign Language. London, England: Routledge.
Cho, Y., & Bridgeman, B. (2012). Relationship of TOEFL iBT® scores to academic performance:
Some evidence from American universities. Language Testing, 29, 421–42. doi:10.1177/
0265532211430368
Educational Testing Service. (2012). The official guide to the TOEFL test (4th ed.). New York,
NY: McGraw-Hill.
Lynch, T. (2011). Academic listening in the 21st century: Reviewing a decade of research.
Journal of English for Academic Purposes, 10, 79–88.
Sawaki, Y., & Nissan, S. (2009). Criterion-related validity of the TOEFL iBT listening section.
Retrieved from https://www.ets.org/research/policy_research_reports/publications/
report/2009/hvea
Wait, I. W., & Gressel, J. W. (2009). Relationship between TOEFL score and academic success
for international engineering students. Journal of Engineering Education, 98, 389–98.
doi:10.1002/j.2168-9830.2009.tb01035.
Wall, D., & Horak, T. (2006). The impact of changes in the TOEFL examination on teaching and
learning in central and eastern Europe: Phase 1, the baseline study. Princeton, NJ: Educational
Testing Service.
Wall, D., & Horak, T. (2008). The impact of changes in the TOEFL examination on teaching and
learning in Central and Eastern Europe: Phase 2, coping with change. Princeton, NJ: Educational
Testing Service.
Wall, D., & Horak, T. (2011). The impact of changes in the TOEFL® Exam on teaching in a sample
of countries in Europe: Phase 3, the role of the coursebook, Phase 4, describing change. Princeton,
NJ: Educational Testing Service.