The Effectiveness of L2 Pronunciation Instruction: A Narrative Review
INTRODUCTION
Although L2 pronunciation has been a concern of learners and teachers for
centuries, empirical research was scant until the 1960s, when contrastive
analysis became a popular source for language curriculum design (Munro and
Derwing 2011). By the turn of the 21st century, however, Derwing and
Munro (2005), among others, noted the limited amount of recent pronunciation
research. They argued that pronunciation instruction had become a casualty
of Communicative Language Teaching, in which a focus on meaning was
prioritized over form-focused instruction, under the assumption that
pronunciation would improve through exposure.
Since 2005, the tide has shifted. Many more L2 pronunciation studies have
appeared in peer-reviewed venues, conference proceedings, and graduate
theses, laying the foundation for increasingly rigorous research. Further
cementing L2 pronunciation as an area worthy of investigation is the emergence
2 L2 PRONUNCIATION INSTRUCTION NARRATIVE REVIEW
Age of learners
Although age of learning is an important predictor of ultimate L2 pronunciation
attainment (Flege et al. 1995), it is rarely a variable of interest in L2
pronunciation research. Of the studies surveyed, 78 per cent reported the
efficacy of pronunciation instruction for adult learners, while 12 per cent
examined younger individuals. Only 56 per cent of the studies involving adults
provided the participants’ ages. The remainder largely indicated that learners were
undergraduate or graduate students, or company employees. A few studies reported
older (50+) participants, but generally did not address age-related differences
(e.g. Couper 2006; Henderson 2008; Saito 2013b). Most research with non-adults
(e.g. Kennedy 2003; Cardoso 2010; Lima 2010; Tsiartsioni 2010; Chen
Theoretical paradigm
Few studies had an explicit theoretical stance; consequently, we used method
of assessment to determine theoretical alignment: if the speech was assessed
relative to a native-like target [e.g. Voice Onset Time (VOT), pitch contours,
accent ratings, or error counts, where an error means a segment is perceived as
an unintended sound] we categorized the studies as Nativeness studies, while
research involving measures of comprehensibility (ratings) or intelligibility (e.g.
transcriptions) was classified as following the Intelligibility Principle. Most
studies implicitly aligned with the Nativeness Principle (63 per cent) (e.g. Chang
2006; Ingels 2011), with fewer following the Intelligibility Principle (24 per cent)
(e.g. Parlak 2010; Saito 2011); 13 per cent had elements of both (e.g. Yates
2003; Trofimovich et al. 2009).
Scope of training
When we examined researchers’ choices of focus of instruction, we found
segmentals were investigated in 53 per cent of the studies (e.g. Elliot 1995;
Warsi 2002; Garcia 2005; Huthaily 2008; Gonzales-Bueno and Quintana-Lara
2011; Liu and Fu 2011), while 23 per cent focused on suprasegmentals (Harris
2002; Yanli 2008; Gomez Lacabex and Garcia Lecumberri 2010; Muller Levis
and Levis 2012) and 24 per cent dealt with both, usually in combined lessons
but occasionally as separate comparison groups (e.g. Derwing et al. 1998;
Derwing and Rossiter 2003; Akita 2005; Gordon et al. 2013). Several papers
centred on a single segmental (e.g. /ɹ/ in English, intervocalic /d/ in Spanish),
whereas others involved several segments (e.g. Lengeris 2009 reported on 14
vowels). There was no consistent pattern in choice of segmentals.
Suprasegmentals varied similarly; for instance, Chun et al. (2013) studied
only Mandarin tone, whereas the students in Derwing et al. (1998) were
taught numerous suprasegmentals. How the effects of instruction were measured
typically correlated with whether the focus was an isolated aspect of
R. I. THOMSON AND T. M. DERWING 5
Training input
Classroom instruction versus CAPT
The studies surveyed comprise a mix of traditional classroom instruction
(61 per cent), and CAPT (39 per cent). Interestingly, 69 per cent of the
CAPT studies appeared in peer-reviewed venues, while only 43 per cent of
the classroom-based studies did.
pitch and intonation (de Bot and Mailfert 1982; de Bot 1983; Hardison 2004,
2005; Hirata and Kelly 2004; Chun et al. 2013), and global speech characteristics
(Hincks and Edlund 2009; Tanner and Landon 2009). In a more exploratory
fashion, Automatic Speech Recognition (ASR) provided learners with
feedback on the intelligibility of their productions (Hincks 2003; Burleson
2007; Neri et al. 2008). In ASR, the computer identifies which words or segments
are mispronounced. The extent to which ASR reflects human listener
responses is a topic of debate (Thomson 2011). Only two studies used
the Internet for pronunciation training: podcasts elicited peer-feedback in
one (Lord 2008), while Computer Mediated Communication (Bueno
Alastuey 2010) facilitated practice speaking with native speakers and other
Duration of instruction
Generally, the focus of instruction determined time devoted to training;
duration varied dramatically across studies. Classroom-based interventions lasted
between 30 min (George 2013) and 70 h (Parlak 2010). Furthermore, the
timeframe varied from one lesson on a single day (George 2013) to many
sessions over one year (Nagamine 2011). In CAPT studies, training was usually
much shorter, lasting 20 min (Guilloteau 1997) to 22 h (Stenson et al. 1992). In
one case, training occurred during a 45-min session (de Bot 1983), but lasted
12 sessions over 12 weeks in another (Ferrier et al. 1999). Length of training
was related to the number of features targeted. Most training occurred within
general language teaching programs, suggesting that some gains in L2
pronunciation might be attributable to input from language classes; it was difficult to
know whether pronunciation training or classroom interactions led to
pronunciation improvement when there was no control group.
Nature of assessment
L2 speaking tasks
Reading-aloud tasks were by far the most common assessment of pronunciation,
employed in 73 per cent of studies. Although reading tasks were accompanied
by other forms of assessment in some cases, 56 per cent used them
exclusively: a wordlist in 24 per cent of cases, a sentence or paragraph task in
another 25 per cent, and a combination of wordlist and sentence/paragraph
reading in 7 per cent.
Elicited imitation tasks were used in 12 per cent of studies, and were the
only form of assessment in 8 per cent. For example, to assess L2 English vowel
learning, learners listened to single syllables presented in a carrier phrase (e.g.
‘The next word is ___.’) which they then imitated in a new carrier phrase (e.g.
‘Now I say ___’) (Thomson 2011). To assess both suprasegmental and segmental
features, Trofimovich et al. (2009) had learners imitate complete sentences.
Picture tasks were used in 9 per cent of the studies to elicit extemporaneous
productions, but were used exclusively in 4 per cent. This included picture-
naming of words in isolation (e.g. White 2006), and in sentences (e.g. Saito
and Lyster 2012a). To elicit longer stretches of extemporaneous speech, 7 per
cent of the studies required learners to produce a narrative describing a
sequence of pictures (e.g. Derwing et al. 1998).
Spontaneous speaking tasks (e.g. conversation) were used in 20 per cent of
the studies; 13 per cent used this task exclusively (e.g. Underwood and Wallace
2012). Some assessments allowed planning time. For example, Perlmutter’s
(1989) learners engaged in a one-minute discussion of a passage they had
just read; Henderson’s (2008) learners prepared speeches relating a short an-
Pronunciation measures
In 79 per cent of manuscripts, human listeners assessed L2 productions, while
in the remaining 21 per cent, acoustic measures were made. Of the studies
using human ratings, 75 per cent assessed discrete pronunciation features; 18
per cent used global evaluation, and 7 per cent used both. In acoustic studies
researchers measured discrete features of the speech signal, such as VOT
for stops (e.g. Suarez 2008), formants for vowel diphthongization (e.g.
Counselman 2010), and pitch (e.g. Hincks and Edlund 2009).
DISCUSSION
Participant demographics
Languages being learned/language background of learners
Individual differences across learners in L2 pronunciation studies do not feature
prominently in the literature, although they are sometimes mentioned
(Bajuniemi 2013). Munro et al. (forthcoming, in press) argue that individual
differences are important because the mean learning trajectories for a given
sample may not represent even a single learner from within that sample.
However, understanding how individual differences affect learning trajectories
will allow results to be more readily generalizable to new learners.
More research examining L2 pronunciation instruction for languages other
than English is needed. Not only would such research benefit learners of those
languages, it may also reveal cross-linguistic developmental patterns vis-a-vis
language-specific issues. Correspondingly, more research controlling for the
learners’ L1 is needed. While some learning trajectories may be similar
regardless of the learners’ L1, difficulties may result from an interaction between
the L1 and the L2. Furthermore, individual variation within an L1
group is generally high (Munro et al. in press). A better understanding of
what should constitute a common pronunciation curriculum for mixed L1
groups, which needs are L1-specific, and which are subject to individual variation
would inform pedagogical practice by identifying pronunciation features
that develop naturalistically, those features that may benefit a majority of
learners, and elements to be addressed on a learner-by-learner basis.
Because it is a global lingua franca, English as an L2 has been researched
more extensively than other languages. One consequence is the development
of a selection criterion for English segmental instruction based on the notion of
Age of learners
Most studies target young adults and/or immigrant learners. More research
should investigate explicit pronunciation instruction with learners of different
ages. While the relationship between age and foreign accent is undeniable, it is
also clear that among learners of a similar age, the degree of exposure/experience
with the L2 predicts strength of accent (Flege et al. 1995). How instruction
can positively affect the pronunciation of older learners should be investigated
(Derwing et al. 2014).
Theoretical paradigm
Almost all the studies we examined lacked an overt theoretical stance.
Although L2 speech science is usually situated in Flege’s Speech Learning
Model (1995) or Best’s Perceptual Assimilation Model (1995), classroom-
based studies are largely atheoretical. During the Audiolingual period,
pronunciation instruction was guided by contrastive analysis, but in the 1970s,
pronunciation fell out of vogue. Most practitioners who continued to teach
pronunciation relied on intuition and experience, but most researchers
ignored L2 pronunciation altogether. Now, with a revival of interest, few
researchers have adopted a theory [but note that Couper (2011) invokes cognitive
phonology, and Derwing et al. (2014) draw upon MacIntyre’s (2007)
Willingness to Communicate framework]. Researchers tend to ask what the
Scope of training
When we examined scope of training, we found studies that dealt with one or
two individual aspects of speech, while others covered several segmentals and/
or suprasegmentals. There are benefits to both approaches, but ultimately,
learners must use connected speech to communicate. Whether a single segmental
mispronunciation is deleterious to intelligibility is unclear; in contextualized
language, one mispronounced segment (in the absence of other
differences) will not usually cause difficulties for the listener. However, most
L2 speakers who have communication problems pronounce several segmentals
and suprasegmentals in ways that interfere with understanding.
Suprasegmental research is under-represented here; future studies should address
suprasegmental phenomena to a greater extent.
Training input
Classroom instruction versus CAPT
The studies contained a mix of classroom instruction and CAPT. Descriptions of
methodology in classroom research were often inadequate, making replication
difficult. Insufficient details limit language teachers’ ability to apply the
Duration of instruction
It is notoriously difficult to accurately measure the language input L2 learners
receive. Nevertheless, based on the information provided, we conclude that
the amount of pronunciation-specific input learners access is related to scope
of instruction. This suggests that global improvement in comprehensibility/
intelligibility requires weeks or even months of instruction, not hours or days.
It may also be the case that pronunciation will continue to improve after
explicit training, especially in instances where instruction raises learners’
awareness of pronunciation features. Delayed post-tests are necessary to measure
whether instruction results in ongoing improvement relative to control
groups. For example, Couper’s (2006) delayed post-test, a semester after training
was complete, showed that learners were still significantly better at eliminating
epenthesis from their speech than before training. Conversely, Ruellot
(2011) found significant improvement in the production of L2 French vowels
after two 50-min training sessions, but learners returned to pre-training
performance in a delayed post-test just one week later.
Nature of assessment
L2 speaking tasks
The heavy reliance on reading-aloud tasks ensures that the pronunciation
feature of interest is assessed, but these tasks are not necessarily representative
of learners’ productions when they must retrieve vocabulary and grammar, in
addition to pronouncing more comprehensibly. Productions in a word list or
sentences may not generalize to spontaneous speech. Tests involving
Pronunciation measures
How researchers measured improvement was unclear in some instances.
Designations such as ‘correct versus incorrect’ were not defined: was the
intention ‘close enough to be recognized as a particular phoneme’ or was it
‘native-like versus non-native-like’?
A dilemma for many researchers was the issue of pre-test, post-test con-
learners may want to receive instruction (Thomson 2011); some may perceive
inequality if assigned to a control group, thus posing an ethical dilemma. In
such cases, the use of comparison groups may provide some insight beyond
what a single experimental group can reveal. The absence of a control group
is most deleterious in contexts where the training is lengthy and entails
general language learning in addition to pronunciation interventions. For
example, Stenson et al.’s (1992) study included 75 hours of practice with L2
speaking, 22 of which comprised pronunciation-specific training. In this context,
one cannot conclude that significant improvement occurred as a direct
result of pronunciation instruction. The learners may have improved without
training, given the other L2 input and practice they received. Conversely, we
ACKNOWLEDGEMENTS
We wish to thank Jenna Cheeseman for assistance with gathering literature for this
review. We also acknowledge the editors of this special issue for helpful feedback. We thank
two anonymous reviewers and Rod Ellis for comments on an earlier version. We acknowledge
Murray Munro and John Levis for their ongoing influence. Finally, we thank the participants in
our own research and SSHRCC for financial support.
REFERENCES
*Asterisked studies were included in the narrative analysis.
*Bueno Alastuey, M. C. 2010. ‘Synchronous-Voice Computer-Mediated Communication:
*Liu, Q. and Z. Fu. 2011. ‘The combined effect of instruction and monitor in improving pronunciation of potential English teachers,’ English Language Teaching 4/3: 164–70.
*Lord, G. 2005. ‘(How) Can we teach foreign language pronunciation? On the effects of a Spanish Phonetics Course,’ Hispania 88/3: 557–67.
*Lord, G. 2008. ‘Podcasting communities and second language pronunciation,’ Foreign Language Annals 41/2: 374–89.
MacIntyre, P. D. 2007. ‘Willingness to communicate in the second language: Understanding the decision to speak as a volitional process,’
Students’ pronunciation,’ Asian EFL Journal Teaching Articles 53: 35–50.
*Neri, A., C. Cucchiarini, and H. Strik. 2008. ‘The effectiveness of computer-based speech corrective feedback for improving segmental quality in L2 Dutch,’ ReCALL 20/2: 225–43.
Norris, J. M. and L. Ortega. 2006. ‘The value and practice of research synthesis for language learning and teaching’ in J. M. Norris and L. Ortega (eds): Synthesizing Research on Language Learning and Teaching. John Benjamins, pp. 3–50.
*Parlak, Ö. 2010. ‘Does Pronunciation Instruction Promote Intelligibility and
NOTES ON CONTRIBUTORS
Ron Thomson is an associate professor of Applied Linguistics at Brock University. His
research interests include second language (L2) speech perception and production, L2
oral fluency, and Computer Assisted Pronunciation Teaching. He also has an interest in
ethical practice in the burgeoning accent reduction industry. Address for correspondence:
Ron Thomson, Department of Applied Linguistics, Brock University, St. Catharines, ON,
Canada. <Ron.thomson@brocku.ca>
Tracey Derwing is a professor emeritus in the TESL program at the University of Alberta,
and an adjunct professor of Linguistics at Simon Fraser University. She has conducted
numerous studies examining pronunciation and oral fluency development in second