English Pronunciation
Teaching and Research
Contemporary Perspectives
Series Editors
Christopher N. Candlin
Macquarie University
Sydney, NSW, Australia
Jonathan Crichton
University of South Australia
Adelaide, SA, Australia
Martha C. Pennington, SOAS and Birkbeck College, University of London, London, UK
Pamela Rogerson-Revell, English, University of Leicester, Leicester, UK
Contents
6 Assessing Pronunciation 287
Author Index 465
Subject Index 481
6
Assessing Pronunciation
Introduction
As we have stressed in the preceding chapters, pronunciation is a com-
plex area of language proficiency and performance linked to a wide
range of human behaviors and incorporating both physiological and
social factors. This complexity presents challenges in pronunciation
assessment in terms of what and how to measure this aspect of language
proficiency. In addition, the fact that pronunciation can vary in relative
independence from other aspects of language proficiency raises issues
regarding the weight which should be given to an L2 learner’s pronun-
ciation in the overall assessment of language competence or achieve-
ment. At present, reconsiderations of the nature of
pronunciation and its contribution to language proficiency are having
an impact on language assessment, an area of pronunciation theory and
application that has developed over time as a tradition constituting a
variety of practices connected to language teaching, testing, and research.
Pronunciation experts and testing specialists are only just now address-
ing questions about what is to be assessed and how assessment is to be
carried out that were first raised in the 1990s but have become more
pressing as globalization has increased the demand for testing of lan-
guage competence for academic study and for specific types of
employment.
In this chapter, we review testing concepts and the current state of
pronunciation assessment and critique a number of standardized tests
that include a pronunciation component while also seeking to provide
our own perspectives on the “what” and “how” questions of pronuncia-
tion testing. We conclude with recommendations for research and prac-
tice that incorporate others’ critiques of and recommendations for
pronunciation testing along with our own. In particular, we suggest fur-
ther developing pronunciation assessment to include, on the one hand,
more narrowly or autonomously oriented tasks involving auditory per-
ception and imitation and, on the other hand, more broadly communica-
tive tasks involving adjustment of pronunciation in response to factors of
audience and context.
than multiple-choice items” (p. 1), though perceptual tests based on mul-
tiple choice or other procedures involving selecting the correct answer, such
as forced choice, are possible as an aspect of pronunciation assessment (see
below). In addition, in a multiple-choice type assessment, Messick (1996)
observes, “the selected option can only be appraised for correctness or
goodness with respect to a single criterion. There is no record, as in the typi-
cal performance assessment, of an extended process or product that can be
scored for multiple aspects of quality” (p. 2). Testing realized through dis-
crete-item and limited-choice response is the usual approach for measuring
a conceptual domain, where it is easy to quantify the result of measurement
on a scale of 0–100% correct response. Performance assessment is less easily
quantified and may involve more qualitative and holistic forms of evalua-
tion. When quantification is employed in performance assessment, it is
usually based on a scale with a relatively narrow range, typically 1–10 or
1–6, with possible gradations within each level of the rating scale.
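To make the contrast concrete, the following minimal sketch (in Python, with purely hypothetical data) illustrates the two kinds of quantification just described: percent-correct scoring of discrete items versus averaged holistic ratings on a narrow band scale.

```python
# Illustrative sketch only; all data are hypothetical.

# Discrete-item testing: each item is simply right or wrong, so results
# map naturally onto a 0-100% scale.
item_responses = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # 1 = correct, 0 = incorrect
percent_correct = 100 * sum(item_responses) / len(item_responses)
print(f"Discrete-item score: {percent_correct:.0f}%")  # 80%

# Performance assessment: holistic ratings on a narrow band scale
# (here 1-6, with half-band gradations), typically averaged over raters.
holistic_ratings = [4.0, 4.5, 5.0]  # three raters, 1-6 scale
band_score = sum(holistic_ratings) / len(holistic_ratings)
print(f"Performance band score: {band_score:.1f}")  # 4.5
```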
Shavelson, Baxter, and Gao (1993) describe performance assessment as
involving a number of factors or facets: “we view a performance assess-
ment as a sample of student performance drawn from a complex universe
defined by a combination of all possible tasks, occasions, raters, and mea-
surement methods” (p. 216). Because differences in all of these factors
can result in differences in the outcome of assessment, the contribution
of each one must be carefully considered in the design of assessment tasks
and carefully managed in assessment procedure. Issues in the design of
pronunciation assessment tasks, whether in an independent evaluation or
as a component of a broader assessment of speaking, include those identi-
fied by Fulcher (2015): “rating scale development, construct definition,
operationalization, and validation” (p. 199). All of the test design issues
identified by Fulcher include considerations of the types of tasks used to
sample performance, such as how natural and representative they are, and
the way the resulting performance is assessed, such as the levels and crite-
ria of rating scales, their appropriateness and consistency, and how easy
or difficult they are for raters to use. Further issues that arise in assess-
ment procedure concern the uniformity or consistency of implementation,
including the conditions under which the assessment tasks are carried out
and the characteristics and performance of the raters and any instruments
involved.
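Shavelson et al.'s view of a performance assessment as a sample from a universe of tasks, occasions, raters, and methods is formalized in generalizability theory, which decomposes score variance by facet. The sketch below is illustrative only: it assumes a simple fully crossed persons-by-tasks design with hypothetical scores (raters and occasions could be added as further facets) and estimates variance components and a generalizability coefficient, showing how reliability grows as more tasks are sampled.

```python
import numpy as np

# Hypothetical band scores: 6 test takers (rows) x 4 pronunciation tasks
# (columns), e.g., word reading, sentence reading, picture description,
# and free response. Illustrative data only.
scores = np.array([
    [4.0, 3.5, 3.0, 3.5],
    [5.0, 4.5, 4.0, 4.5],
    [3.0, 3.0, 2.5, 2.0],
    [4.5, 4.0, 4.5, 3.5],
    [2.5, 3.0, 2.0, 2.5],
    [5.5, 5.0, 4.5, 5.0],
])
n_p, n_t = scores.shape

grand = scores.mean()
person_means = scores.mean(axis=1)
task_means = scores.mean(axis=0)

# Mean squares for a fully crossed persons x tasks design.
ms_p = n_t * ((person_means - grand) ** 2).sum() / (n_p - 1)
ms_t = n_p * ((task_means - grand) ** 2).sum() / (n_t - 1)
resid = scores - person_means[:, None] - task_means[None, :] + grand
ms_res = (resid ** 2).sum() / ((n_p - 1) * (n_t - 1))

# Variance components: persons (the 'signal' of interest), tasks, and the
# person-by-task interaction confounded with error.
var_res = ms_res
var_p = max((ms_p - ms_res) / n_t, 0.0)
var_t = max((ms_t - ms_res) / n_p, 0.0)

# Relative generalizability coefficient for an assessment built from k
# tasks: how consistently the tasks rank the test takers.
for k in (1, 4, 8):
    g = var_p / (var_p + var_res / k)
    print(f"tasks sampled = {k}: G = {g:.2f}")
# G rises with the number of tasks, which is why many tasks are needed
# for a generalizable measure.
```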
Validity is not a property of the test or assessment as such, but rather of the
meaning of the test scores. These scores are a function not only of the items
or stimulus conditions, but also of the persons responding as well as the
context of the assessment. In particular, what needs to be valid is the mean-
ing or interpretation of the scores as well as any implications for action that
this meaning entails…. [V]alidity is an evolving property and validation a
continuing process. (p. 1)
speaker control of the form and content of talk, makes for a more valid
and also a more challenging spoken language test, from the point of
view of both the test designer and the test taker.
Since the 1990s, applied linguists (e.g., Riggenbach, 1998; Tarone,
1998) have highlighted context-dependent variation affecting both L1 and
L2 speech and have called for spoken language tests to include a variety of
types of speaking tasks in order to sample a range of speaker competencies
and so to increase the validity of those tests. As Chalhoub-Deville (2003)
has argued, the context of communication activates different aspects of a
language user’s abilities and repertoire, so that performance is affected by
aspects of the context in which that performance is elicited. This fact has
been noted in pronunciation assessment, where there is increasing aware-
ness of the different types of performance that are elicited by different types
of tasks, such as reading words, sentences, or a paragraph aloud; responding
to specific prompts; communicating in a meaningful task; or spontaneous
speech outside a testing context. Shavelson et al. (1993) noted that incon-
sistencies in performance across tasks, known as task-sampling variability,
have been a problem in studies of writing, mathematics, and science achieve-
ment, where “[l]arge numbers of tasks are needed to get a generalizable
measure” (p. 217). The same is true for speaking performance and for the
pronunciation component of speaking. Ideally, assessment tasks should
assess learners’ pronunciation elicited under different conditions, to include
some range of variation. This would mesh well with an understanding of
pronunciation as being constrained by the requirements of both cognitive
processing and interactive communication (what Bygate, 1987, calls “reci-
procity”). It also meshes well with sociolinguistic and translanguaging views
of pronunciation (Pennington, 2015) as partly automatized and partly
under speakers’ control, an aspect of their expression of identity, affiliation,
and other aspects of meaning.
Measures of Pronunciation
Since the definition of pronunciation is not a settled matter, issues of the
construct of pronunciation arise in considerations of assessment. Even as
the matter of what pronunciation is remains undetermined, language
Fluency
Intelligibility
Comprehensibility
this word (see Chaps. 3 and 8 for discussion of research using these con-
cepts). This broad notion of intelligibility, which is in essence compre-
hensibility as measured by listeners’ subjective scalar ratings, is the notion
underlying many proficiency test scales for spoken language perfor-
mance, as noted by Isaacs & Trofimovich (2012, p. 477), such as in the
Test of English as a Foreign Language (TOEFL) and the International
English Language Testing System (IELTS).
Whereas actual comprehension of speech—that is, “intelligibility” in
the sense of Munro et al. (2006), as “the extent to which a speaker’s utter-
ance is actually understood” (p. 112)—might be assessed by comprehen-
sion questions or true-false statements related to the speaker’s intended
message, comprehensibility is generally conceived of as listeners’ percep-
tions of how easy or hard it is to understand an L2 speaker, evaluated on
an ease-of-understanding scale, the approach taken by Munro and
Derwing (1995, 2001).
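The operational contrast can be shown in a few lines of code. The sketch below assumes the operationalizations described above: intelligibility as the proportion of intended words that listeners actually recover in transcription, and comprehensibility as the mean of listeners' scalar ease-of-understanding ratings. The data and the crude word-matching function are hypothetical simplifications, not the procedure of any particular study.

```python
# Illustrative sketch of the intelligibility/comprehensibility contrast.

def intelligibility(intended: str, transcripts: list[str]) -> float:
    """Proportion of intended words that listeners recovered, averaged
    over listeners (a crude word-match operationalization)."""
    target = intended.lower().split()
    per_listener = []
    for t in transcripts:
        heard = t.lower().split()
        matched = sum(1 for w in target if w in heard)
        per_listener.append(matched / len(target))
    return sum(per_listener) / len(per_listener)

def comprehensibility(ratings: list[int]) -> float:
    """Mean of listeners' scalar ease-of-understanding ratings
    (e.g., 1 = very easy to understand ... 9 = very hard)."""
    return sum(ratings) / len(ratings)

intended = "the patient should take the medicine twice a day"
transcripts = [
    "the patient should take the medicine twice a day",
    "the patient should take the medicine twice today",
    "the patient could take the medicine twice a day",
]
print(f"Intelligibility: {intelligibility(intended, transcripts):.2f}")  # ~0.89
print(f"Comprehensibility: {comprehensibility([2, 3, 2]):.1f}")          # 2.3
```

Note that the two measures can dissociate: speech can be fully recoverable in transcription yet still be rated as effortful to process.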
Perception
Multiple Measures
speech than spontaneous speech, and also that measures which are
appropriate (reliable and valid) for judging pronunciation in read
speech might be different from those which are appropriate for assess-
ing pronunciation in spontaneous speech. A later study within the
Dutch research group (van Doremalen, Cucchiarini, & Strik, 2010)
revealed a higher frequency of phonetic substitution and deletion
errors in read than the spontaneous speech, which may be due to
interference from routinized pronunciation patterns tied to written
language, or to the stress or unnaturalness which many feel when
reading aloud in an academic or testing environment. As van
Doremalen et al. (2010) comment, “If some of the errors observed in
L2 read speech are simply decoding errors caused by insufficient
knowledge of L2 orthography or interference from it, it is legitimate
to ask whether such errors are pronunciation errors at all.” Asking this
question, in the context of the results of this set of related studies,
suggests the complexity of pronunciation assessment, how many
issues are just becoming apparent, and how many questions remain to
be asked and answered. Questions which have not sufficiently been
addressed in relation to multiple measures in pronunciation assess-
ment are (i) which combined features of pronunciation competence
are most relevant for specific types of work and ESP contexts and (ii)
how these should be assessed.
increase the correlations between their ratings and those of human raters,
an important consideration for both their criterion-related and face valid-
ity that is also central to building an argument about whether the auto-
mated assessment matches the reality of people’s judgements of L2
speaking competence. If their ratings of pronunciation do not match
those of other accepted ways of assessing it, then their consequential
validity is at issue in the important sense that the testing of pronunciation
has become big business with significant consequences for people’s lives
and employment.
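A first step in such criterion-related validation is correlating machine scores with human ratings of the same performances. The following sketch uses hypothetical scores; a real validation study would go beyond a single correlation to examine systematic offsets and the structure of agreement.

```python
import numpy as np

# Hypothetical: pronunciation scores for 10 speakers from an automated
# system and from (averaged) trained human raters, on the same 1-6 scale.
machine = np.array([3.1, 4.2, 2.5, 5.0, 3.8, 4.5, 2.2, 3.5, 4.9, 3.0])
human = np.array([3.0, 4.5, 2.0, 5.5, 3.5, 4.0, 2.5, 4.0, 5.0, 3.0])

# Pearson correlation: the usual headline figure in criterion-related
# validation of automated speaking scores.
r = np.corrcoef(machine, human)[0, 1]
print(f"machine-human correlation: r = {r:.2f}")

# Correlation alone can hide systematic offsets, so also inspect the
# mean difference and mean absolute difference between score sets.
print(f"mean difference: {np.mean(machine - human):+.2f}")
print(f"mean absolute difference: {np.mean(np.abs(machine - human)):.2f}")
```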
Left to their own devices, raters tend to vary in how they score the same
performance. The variability decreases if they are trained; and it decreases
over time through the process of social moderation. With repeated practice
raters start to interpret performances in the same way as their peers. But
when severed from the collective for a period of time, judges begin to reas-
sert their own individuality, and disagreement rises. (p. 201)
This suggests the need for raters to work together at one time and in
the same place, which limits the possibilities for administering an
assessment task or test which is scored by human raters: either the
processes of assessment and scoring both take place in the same time
and space, or speakers’ performance must be audio-recorded so that
the scoring can be done at a distance by a group of raters all working
together at the same time in another place. A third alternative is to
develop a scoring procedure that removes the element of individual
human judgement, by audio-recording speakers’ performance and
analyzing that performance by machine. There is still an element of
human judgement, as the interpretation of those analyses must be
tied to some meaningful evaluations of performance in terms of use-
ful descriptors or levels of performance or proficiency.
Where there is significant rater variation, even after training, this
tends to invalidate a test or assessment procedure, though for tests of
complex skills such as writing or speaking, a degree of interrater varia-
tion (up to about 30%) is generally expected and accepted. If the
interrater variation is systematic, that is, if it is attributable to specific
characteristics of raters (other than specific expertise), this calls a rat-
ing into question. Besides the complexity of the domain being mea-
sured, low inter-rater reliability can be caused by what is termed
“construct-irrelevant variance” that is related to listener background
variables (e.g., accent familiarity).
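Interrater agreement of this kind is typically quantified with chance-corrected coefficients. The sketch below, using hypothetical band scores from two raters, computes exact and adjacent agreement and a quadratic-weighted kappa, a common choice for ordinal rating scales; other designs would call for intraclass correlations or many-facet Rasch analysis instead.

```python
import numpy as np

# Hypothetical band scores (1-6) given by two raters to 12 speakers.
rater_a = np.array([4, 5, 3, 6, 4, 2, 5, 3, 4, 6, 2, 5])
rater_b = np.array([4, 4, 3, 6, 5, 2, 5, 4, 4, 5, 3, 5])

# Exact and adjacent agreement: simple checks often reported alongside
# chance-corrected coefficients.
exact = np.mean(rater_a == rater_b)
adjacent = np.mean(np.abs(rater_a - rater_b) <= 1)
print(f"exact agreement: {exact:.0%}, within one band: {adjacent:.0%}")

# Quadratic-weighted kappa: chance-corrected agreement that penalizes
# larger disagreements more heavily, suited to ordinal band scales.
levels = np.arange(1, 7)
n = len(levels)
observed = np.zeros((n, n))
for a, b in zip(rater_a, rater_b):
    observed[a - 1, b - 1] += 1
observed /= observed.sum()
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
weights = (levels[:, None] - levels[None, :]) ** 2 / (n - 1) ** 2
kappa = 1 - (weights * observed).sum() / (weights * expected).sum()
print(f"quadratic-weighted kappa: {kappa:.2f}")
```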
Since the sound spectrograph first made “visible speech” possible, new
methods of analyzing pronunciation by machine have been
developed for research, teaching, and testing. Because of the great advan-
tages in being able to automate the testing process, computers have
and practice, including for testing. Thus, the CEFR levels provide refer-
ence points for assessing performance on a number of other tests, includ-
ing IELTS and CAE, described below, as well as other major language
tests such as TOEFL. It is encouraging to see such an influential frame-
work affecting the structure of language tests.
The scale for pronunciation, termed the “Phonological Control Scale,”
developed in 2001, describes pronunciation proficiency for five of the six
CEFR levels, excluding the top level, as shown in Fig. 6.1. The phono-
logical control scale “has been critiqued by researchers as lacking con-
sistency, explicitness and a clear underlying construct” (Harding,
2017, p. 16). Harding (2017) carried out research with experienced
pronunciation raters, who found the scales unclear (using terminology
that could not easily be matched to performance, such as “natural”
to describe intonation and “noticeable” to describe foreign accent),
inconsistent across levels (in terms of reference to intonation and
Fig. 6.1 The CEFR phonological control scale (Council of Europe, 2001, p. 117), extract:
C2: As C1.
C1: Can vary intonation and place sentence stress correctly in order to express finer shades of meaning.
Fig. 6.2 Extract from the CEFR phonological control descriptor scale (Council of Europe, 2017, p. 134), C2 level:
Overall phonological control: Can employ the full range of phonological features in the target language with a high level of control – including prosodic features such as word and sentence stress, rhythm and intonation – so that the finer points of his/her message are clear and precise. Intelligibility and effective conveyance of and enhancement of meaning are not affected in any way by features of accent that may be retained from other language(s).
Sound articulation: Can articulate virtually all the sounds of the target language with clarity and precision.
Prosodic features: Can exploit prosodic features (e.g. stress, rhythm and intonation) appropriately and effectively in order to convey finer shades of meaning (e.g. to differentiate and emphasise).
Our final example of a test using human raters is the Step 2 Clinical Skills
(CS) performance exam, which has been designed for a specific context
and purpose and which includes measures of pronunciation skill or pro-
ficiency as part of certification for medical practice in the United States.
The Step 2 CS exam is part of the U.S. Medical Licensing Examination
“required of all physicians who seek graduate training positions in the United
States, including graduates from medical schools located in the United States
and other countries” (van Zanten, 2011, p. 78). The exam evaluates candi-
dates on the dimensions of Spoken English Proficiency, Communication and
Interpersonal Skills, and Integrated Clinical Encounter, based on their inter-
actions with actors simulating patients whom the candidate must interact
with “as they would with actual patients, gathering relevant data through
questioning the patients about their illnesses, performing physical exams,
counseling patients on risky behaviors, and providing [them] with informa-
tion about possible diagnosis and follow-up plans” (van Zanten, 2011, p. 79).
Candidates perform 12 simulated interviews of up to 15 minutes each, and
are given a further 10 minutes to write up a patient note from the interview.
The raters of the test are the simulated patients (SPs), who undergo specific
rater training for this purpose.
As van Zanten (2011) describes it, the Spoken English Proficiency rating
is a “holistic judgment of…overall listener effort and how effectively the
physician communicated, not simply whether or not the physician had a
foreign accent” (p. 81), but with specific consideration of mispronuncia-
tions that might have caused difficulty in listener understanding or com-
munication breakdown. The candidate’s Communication and Interpersonal
Skills are measured in the dimensions of questioning skills, information-
sharing skills, and professional manner and rapport (p. 80). Some aspects of
pronunciation, especially prosody, may be evaluated indirectly in these three
categories, as questioning skills includes the candidate’s “facilitating remarks”
to the patient; information-sharing skills include “acknowledge[ing] patient
issues and concerns and…then clearly respond[ing];” and professional man-
ner and rapport includes “encourage[ing] additional questions or discus-
sion, …mak[ing] empathetic remarks, and…show[ing] consideration for
patient comfort during the physical examination” (pp. 80–81).
Versant
One of the most extensively developed automated systems for oral profi-
ciency assessment is Pearson Education’s Versant technologies, originally
developed as the Phone Pass system at the Ordinate company. The Versant
system provides automated assessment by phone or on computer and com-
puterized scoring for assessing spoken English, Aviation English (designed
to assess the job-related spoken English proficiency of pilots and air traffic
controllers), and spoken language assessments for Arabic, Chinese, Dutch,
French, and Spanish.12 According to the Versant website, which includes
information about the form and the validation of the Versant tests (e.g.,
Bernstein, 2004; Suzuki et al., 2006), spoken proficiency is assessed by hav-
ing test takers perform a variety of speaking tasks, including reading sen-
tences aloud, repeating sentences, giving short answers to short questions,
building sentences (by reordering phrases given out of order), retelling brief
across different task types within their model constructs and analyzable
features. Research is ongoing into the different constructs, measured
dimensions of speech, and model of speaking proficiency on which the
SpeechRater system is based (e.g., Loukina, Zechner, Chen, & Heilman,
2015; Xi, Higgins, Zechner, & Williamson, 2012).
Pronunciation Aptitude
Other than some of the Versant test tasks, pronunciation is not usually
assessed outside a classroom setting as an individual aspect of L2 proficiency
or achievement. The separate assessment of pronunciation is however the
focus of some of the subtests of language aptitude. The Modern Language
Aptitude Test (MLAT; Carroll & Sapon, 2002/1959) taps into perceptual
aspects of pronunciation in its first three subtests. According to a description
and sample items from the test provided by the Language Learning and
Testing Foundation,16 the first part of the test is Number Learning, in which
test takers learn a set of simple and easily pronounceable number words for
an unknown language (e.g., /ba/ and /ti/) based on aural input and then use
this explicit knowledge to recognize those numbers in different combina-
tions. This subtest thus incorporates the ability to discriminate different
sounds (an analytical ability underlying perception of speech as intelligible
and all other spoken language abilities) and to retain these in memory.
The second part of the MLAT is Phonetic Script, in which test takers
learn sets of four one-syllable nonsense words (e.g. buk and geeb) given in
aural and written form using a made-up script (which has commonalities
with IPA symbols and other systems of pronunciation symbols) and then
must show recall by matching heard nonsense words to the correct script
out of the four choices. This subtest, like the first one, incorporates the abil-
ity to discriminate different sounds which underlies speech perception as
well as the ability to learn sound-symbol correspondences (an analytical
ability underlying literacy) and to retain these in memory—which is pre-
sumably aided for literate learners by the link to written graphemes.
The third part of the MLAT, Spelling Clues, gives altered spellings of
English words based on their approximate pronunciation (e.g., kloz for
“clothes” and restrnt for “restraint”), though no aural versions are pre-
sented, and asks test takers to find the word with the right meaning from
a list of five choices. It is interesting that this subtest is the one most often
used from among the MLAT individual subtests as a measure of pronun-
ciation aptitude, since it does not directly test either perception or pro-
duction of pronunciation but rather directly tests English language
orthographic and lexical knowledge.
The LLAMA Language Aptitude Test (Meara, 2005) includes two
subtests related to pronunciation, LLAMA D and LLAMA E. The
LLAMA D subtest is based on a language which is expected to be
unknown to almost all L2 learners, a language of Northwestern British
Columbia, replacing the original version of the test based on Turkish,
which turned out to be known to many of the test takers (Meara,
2005, p. 2). The 2005 version is a new type of subtest described as
assessing “sound recognition” using names of flowers and other natu-
ral objects in the selected obscure Canadian Indian language that are
presented in the form of synthesized sounds, that is, computer-gener-
ated speech. Test takers hear 10 of these synthesized words in the first
phase of the subtest, and then are given further words which they
have to indicate as the same or different from those they heard before.
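Old/new recognition data of this kind are often analyzed with signal detection measures, which separate genuine discrimination ability from a bias toward answering 'old' or 'new'. The sketch below illustrates that analysis with hypothetical response counts; it is not the LLAMA's own scoring procedure, which simply credits correct responses.

```python
from statistics import NormalDist

# Hypothetical responses in an old/new sound-recognition task:
# 10 'old' items (heard in the learning phase) and 10 'new' items.
hits = 8          # old items correctly judged 'old'
false_alarms = 3  # new items incorrectly judged 'old'
n_old = n_new = 10

# Log-linear correction keeps z-scores finite at 0% or 100% rates.
h = (hits + 0.5) / (n_old + 1)
f = (false_alarms + 0.5) / (n_new + 1)

z = NormalDist().inv_cdf
d_prime = z(h) - z(f)             # sensitivity: old/new discrimination
criterion = -0.5 * (z(h) + z(f))  # response bias toward 'old' or 'new'
print(f"d' = {d_prime:.2f}, criterion c = {criterion:+.2f}")
```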
The LLAMA E subtest is similar to the MLAT subtest Phonetic Script
but uses symbols that bear less resemblance to phonetic spelling conventions. Test tak-
ers hear 22 recorded syllables and learn symbols for these by clicking
on labeled buttons on a specially designed computer screen. Each
sound is represented by a combination of a letter symbol and a
numeral, such as “3e” or “9è.” In the test phase, recordings of 2-syllable
sequences are heard with two choices presented for their spelling,
such as “0è3è” or “3e9è” (Meara, 2005, pp. 11–13).
These tests of pronunciation aptitude are sometimes used to assess pro-
nunciation proficiency in research studies (e.g., Granena & Long, 2013;
Hu et al., 2013; Sáfár & Kormos, 2008; Saito, 2018), and these same
measures, or others (e.g., based on imitation), might be usefully consid-
ered in standardized tests of speaking proficiency that include pronuncia-
tion measures, as a way to gain a view of a learner’s pronunciation
proficiency separate from other aspects of speaking.
Classroom Assessment
The assessment of pronunciation in a classroom setting may address
any of the different dimensions of pronunciation competence reviewed
above for production (accuracy, fluency, intelligibility, or comprehen-
sibility) or perception (segmental or prosodic), using one or more
focused tasks (e.g., minimal pair discrimination, word or sentence
repetition/imitation) or global tasks (e.g., picture description or ver-
bal response to questions). Classroom assessment may address pro-
nunciation in connection with other language skills (e.g., overall
speaking skills or other aspects of communicative competence) or as
an autonomous area of language, by means of tasks designed to factor
out lexical or grammatical knowledge or any kind of specific language
knowledge (e.g., by repetition or imitation of minimal pair syllables
or nonsense words, as in some tests of pronunciation aptitude). As in
all curriculum and classroom activities, student need and the specific
focus and goals of instruction are crucial determinants of what and
how to assess.
Taking into consideration the different facets of pronunciation and
the different ways that it can be assessed, some general guidelines can
be given for the assessment of pronunciation as part of classroom
practice.
We would add the need to further consider the ways in which pronun-
ciation contributes to strategic language behavior or competence, “the way
speakers use communicative resources to achieve their communicative
goals, within the constraints of their knowledge and of the situation in
which communication takes place” (Pennington, 2008, p. 164). Phonology
is a key element of linguistic and communicative competence in spoken
language that relates to listening comprehension and to written language as
well (through the orthographic system). Such a concept of pronunciation
goes far beyond that which is conceived in terms of either accuracy/native-
ness or intelligibility, suggesting that assessment should not only be tied to
tasks within specific contexts and purposes but also that pronunciation can
suitably be assessed in terms of the extent to which a speaker is able to alter
pronunciation under different circumstances, in order to achieve different
goals. One of the potential limitations of L2 speech, including L2 lingua
franca speech, is a limited degree of flexibility in speech style, which is an impor-
tant concern of language learners as they advance in proficiency and use the
L2 in more contexts with more different speakers. It is especially a concern
of those who wish to develop a translingual or plurilingual identity or per-
sona and to convey different aspects of their translingual or plurilingual
identity or competence through style-shifting, or specifically what might be
referred to as “pronunciation styling.” With the notable exception of IELTS,
limited flexibility in L2 performance has hardly been recognized by the
speech testing community, much less the subgroup of testers focused on
pronunciation. The expression of identity and affiliation through pronun-
ciation and how this might be assessed is a consideration that has received
little attention to date.
Thus, we see a need in pronunciation assessment to develop social per-
ception and production tasks involving considerations of context and
audience that will incorporate pronunciation pragmatics, such as percep-
tion and production of contextualization cues to metamessages (Chap. 1)
and a shifting focus on different aspects of a message, its meaning or its
form, in response to audience feedback, thereby incorporating aspects of
pronunciation proficiency needed for interactional competence (Young,
2000) and reciprocity (Bygate, 1987). This broader focus on pronuncia-
tion in social context will also address perception and expression of iden-
tity affiliation, accommodation, and rapport.
Concluding Remarks
This chapter has considered language assessment in respect of the complex
view of pronunciation presented in Chap. 1. Most of the attention has
been trained on the ways in which pronunciation is assessed within speak-
ing proficiency more generally, as this has usually been the approach taken
in standardized tests and also in some contexts of practice, notably, in many
language classrooms. We have considered the nature of pronunciation and
how testers’ understanding of the nature and construct of pronunciation
impacts the reliability and validity of different approaches to assessment.
For the most part, language testers are not paying sufficient attention to
pronunciation, what it is, how it is measured and rated, and the validation
of pronunciation ratings in assessment instruments. A comparison of
human and machine ratings has pointed up the benefits and drawbacks of
each of these ways of assessing pronunciation and the need to continue
refining the techniques used in these different approaches.
In continuing to scrutinize and revise pronunciation assessments through
ongoing validation studies and other research, pronunciation and testing spe-
cialists will continue to improve those assessments even as they continue to
interrogate the construct of pronunciation and what competence in this area
of language and communication includes. In so doing, they will be impact-
ing the lives of the huge worldwide community of L2 learners and their
opportunities for further study and employment related to their assessed lan-
guage abilities, even as they are also contributing to improving understanding
of this pervasive surface aspect of language, the many kinds of deeper level
linguistic properties which it enacts, and the meanings which it signifies.
Notes
1. Computer-based testing programs that measure how far a speaker’s per-
formance diverges from a set norm could perhaps be adapted to both
summative and formative assessment of pronunciation in such attempts
at “dialect adjustment.”
2. This is a vexed concept, as discussed in Chap. 1, that recognizes an accent
only in nonstandard or L2 speech and does not accord with the reality
that all speech is accented.
3. http://www.actfl.org/publications/guidelines-and-manuals/actfl-proficiency-guidelines-2012/english/speaking
4. http://www.govtilr.org/skills/ILRscale2.htm
5. http://www.govtilr.org/FAQ.htm#14
6. A full report on the phonological scale revision process is available from the CEFR: https://rm.coe.int/phonological-scale-revision-process-report-cefr/168073fff9
7. http://www.ielts.org/what-is-ielts/ielts-introduction
8. http://www.ielts.org/-/media/pdfs/speaking-band-descriptors.ashx?la=en
9. http://www.cambridgeenglish.org/exams/advanced/exam-format
10. http://www.cambridgeenglish.org/images/167804-cambridge-english-advanced-handbook.pdf
11. https://www.examenglish.com/examscomparison.php
12. https://www.versanttest.com
13. http://www.ets.org/toefl/ibt/about/content
14. http://www.ets.org/s/toefl/pdf/toefl_speaking_rubrics.pdf
15. http://www.ets.org/research/topics/as_nlp/speech
16. http://lltf.net/wp-content/uploads/2011/09/Test_mlat.pdf
References
Abrahamsson, N., & Hyltenstam, K. (2009). Age of onset and nativelikeness in a
second language: Listener perception versus linguistic scrutiny. Language
Learning, 59(2), 249–306. https://doi.org/10.1111/j.1467-9922.2009.00507.x
American Council on the Teaching of Foreign Languages. (2012). ACTFL pro-
ficiency guidelines 2012. Alexandria, VA: American Council on the Teaching
of Foreign Languages.
Anderson-Hsieh, J., & Koehler, K. (1988). The effect of foreign accent and
speaking rate on native speaker comprehension. Language Learning, 38(4),
561–613. https://doi.org/10.1111/j.1467-1770.1988.tb00167.x
Ballard, L., & Winke, P. (2017). Students’ attitudes towards English teachers’
accents: The interplay of accent familiarity, comprehensibility, intelligibility,
perceived native speaker status, and acceptability as a teacher. In T. Isaacs &
P. Trofimovich (Eds.), Second language pronunciation assessment:
Interdisciplinary perspectives (pp. 121–140). Bristol, UK and Blue Ridge
Summit, PA: Multilingual Matters.
Bernstein, J. (2004). Spoken language testing: Ordinate’s SET-10. Presentation
given at the TESOL Convention, April 2004, Long Beach, CA. Retrieved
September 2, 2017, from https://www.versanttest.com/technology/featured/3.
TESOL-04.SET.JB.pdf
Bernstein, J., Van Moere, A., & Cheng, J. (2010). Validating automated
speaking tests. Language Testing, 27(3), 355–377. https://doi.org/10.1177/
0265532210364404
Bosker, H. R., Pinget, A.-F., Quené, H., Sanders, T., & de Jong, N. H. (2012). What
makes speech sound fluent? The contributions of pauses, speed and repairs.
Language Testing, 30(2), 159–175. https://doi.org/10.1177/0265532212455394
Brown, J. D. (2005). Testing in language programs: A comprehensive guide to
English language assessment. Upper Saddle River, NJ: Prentice Hall Regents.
Browne, K., & Fulcher, G. (2017). Pronunciation and intelligibility in assessing
spoken fluency. In T. Isaacs & P. Trofimovich (Eds.), Second language pronun-
ciation assessment: Interdisciplinary perspectives (pp. 37–53). Bristol, UK and
Blue Ridge Summit, PA: Multilingual Matters.
Bygate, M. (1987). Speaking. Oxford: Oxford University Press.
Carey, M. D., Mannell, R. H., & Dunn, P. K. (2011). Does a rater’s familiarity with
a candidate’s pronunciation affect the rating in oral proficiency interviews?
Language Testing, 28(2), 201–219. https://doi.org/10.1177/0265532210393704
Carroll, J. B., & Sapon, S. (2002/1959). Modern language aptitude test. San
Antonio, TX and Bethesda, MD: Psychological Corporation and Second
Language Teaching, Inc.
Chalhoub-Deville, M. (2003). Second language interaction: Current perspec-
tives and future trends. Language Testing, 20(4), 369–383. https://doi.org/10
.1191/0265532203lt264oa
Council of Europe. (2001). Common European framework of reference for lan-
guages: Learning, teaching, assessment. Cambridge: Cambridge University
Press.
Fulcher, G. (1996). Testing tasks: Issues in task design and the group oral. Language
Testing, 13(1), 23–51. https://doi.org/10.1177/026553229601300103
Fulcher, G. (2015). Assessing second language speaking. Language Teaching,
48(2), 198–216. https://doi.org/10.1017/S0261444814000391
Granena, G., & Long, M. H. (2013). Age of onset, length of residence, language
aptitude, and ultimate L2 attainment in three linguistic domains. Second Language
Research, 29(3), 311–343. https://doi.org/10.1177/0267658312461497
Harding, L. (2017). What do raters need in a pronunciation scale? The user’s
view. In T. Isaacs & P. Trofimovich (Eds.), Second language pronunciation
assessment: Interdisciplinary perspectives (pp. 12–34). Bristol, UK and Blue
Ridge Summit, PA: Multilingual Matters.
Hincks, R. (2001). Using speech recognition to evaluate skills in spoken English.
Lund University, Dept. of Linguistics Working Papers, 49, 58–61. Retrieved
August 15, 2017, from journals.lub.lu.se/index.php/LWPL/article/
viewFile/2367/1942
Hu, X., Ackermann, J., Martin, J. A., Erb, M., Winkler, S., & Reiterer, S. M.
(2013). Language aptitude for pronunciation in advanced second language
(L2) learners: Behavioural predictors and neural substrates. Brain and
Language, 127(3), 366–376. https://doi.org/10.1016/j.bandl.2012.11.006
Isaacs, T. (2014). Assessing pronunciation. In A. J. Kunnan (Ed.), The compan-
ion to language assessment (pp. 140–155). Hoboken, NJ: Wiley-Blackwell.
Isaacs, T., & Trofimovich, P. (2012). Deconstructing comprehensibility. Studies
in Second Language Acquisition, 34(4), 475–505. https://doi.org/10.1017/
S0272263112000150
Isaacs, T., & Trofimovich, P. (2017). Key themes, constructs and interdisciplin-
ary perspectives in second language pronunciation assessment. In T. Isaacs &
P. Trofimovich (Eds.), Second language pronunciation assessment:
Interdisciplinary perspectives (pp. 3–11). Bristol, UK and Blue Ridge Summit,
PA: Multilingual Matters.
Isaacs, T., Trofimovich, P., Yu, G., & Chereau, B. M. (2011). Examining the
linguistic aspects of speech that most efficiently discriminate between upper
levels of the revised IELTS pronunciation scale. IELTS Research Reports
Online, 4, 1–48. Retrieved September 2, 2017, from http://www.ielts.org/-/
media/research-reports/ielts_online_rr_2015-4.ashx
Jenkins, J. (2000). The phonology of English as an international language. Oxford:
Oxford University Press.
Johnson, M. (2000). Interaction in the oral proficiency interview: Problems
of validity. Pragmatics, 10(2), 215–231. https://doi.org/10.1075/prag.
10.2.03joh