The Routledge Handbook of Second Language Acquisition and Speaking

This Handbook is a comprehensive volume outlining the foremost issues regarding research
and teaching of second language speaking, examining such diverse topics as cognitive
processing, articulation, knowledge of pragmatics, instruction in sub-components of
speaking (e.g., grammar, pronunciation, and vocabulary) and the attrition of the first
language. Outstanding academics have contributed chapters to provide an integrated and
inclusive perspective on oral language skills. Specialized contexts for speaking are also
explored (e.g., English as a Lingua Franca, workplace, and interpreting). The Routledge
Handbook of Second Language Acquisition and Speaking will be an indispensable resource for
students and scholars in applied linguistics, cognitive psychology, linguistics, and education.
Tracey M. Derwing is a Professor Emeritus of TESL at the University of Alberta, Canada,
and an Adjunct Professor of Linguistics at Simon Fraser University, Canada.
Murray J. Munro is a Professor of Linguistics at Simon Fraser University, Canada.
Ron I. Thomson is a Professor of Applied Linguistics and TESL at Brock University,
Canada.
ROUTLEDGE HANDBOOKS IN SECOND LANGUAGE ACQUISITION
Series Editors: Susan M. Gass and Alison Mackey
Associate Editor: Kimberly L. Geeslin
The Routledge Handbooks in Second Language Acquisition are a comprehensive, must-have
survey of this core sub-discipline of applied linguistics. With a truly global reach and featuring
diverse contributing voices, each handbook provides an overview of both the fundamentals and
new directions for each topic.
The Routledge Handbook of Second Language Acquisition and Pragmatics
Edited by Naoko Taguchi
The Routledge Handbook of Second Language Acquisition and Corpora
Edited by Nicole Tracy-Ventura and Magali Paquot
The Routledge Handbook of Second Language Acquisition and Language Testing
Edited by Paula Winke and Tineke Brunfaut
The Routledge Handbook of Second Language Acquisition and Technology
Edited by Nicole Ziegler and Marta González-Lloret
The Routledge Handbook of Second Language Acquisition and Writing
Edited by Rosa M. Manchón and Charlene Polio
The Routledge Handbook of Second Language Acquisition and Speaking
Edited by Tracey M. Derwing, Murray J. Munro and Ron I. Thomson
The Routledge Handbook of Second Language Acquisition and Individual Differences
Edited by Shaofeng Li, Phil Hiver and Mostafa Papi
The Routledge Handbook of Second Language Acquisition and Sociolinguistics
Edited by Kimberly Geeslin
For more information about this series, please visit:
https://www.routledge.com/Second-Language-Acquisition-Research-Series/book-series/RHSLA
THE ROUTLEDGE HANDBOOK OF SECOND LANGUAGE ACQUISITION AND SPEAKING
Edited by Tracey M. Derwing, Murray J. Munro, and Ron I. Thomson
Cover image: Getty Images
First published 2022
by Routledge
605 Third Avenue, New York, NY 10158
and by Routledge
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2022 Taylor & Francis
The right of Tracey M. Derwing, Murray J. Munro and Ron I. Thomson to be identified
as the authors of the editorial material, and of the authors for their individual chapters,
has been asserted in accordance with sections 77 and 78 of the Copyright, Designs and
Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced or utilised in
any form or by any electronic, mechanical, or other means, now known or hereafter
invented, including photocopying and recording, or in any information storage or
retrieval system, without permission in writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or registered
trademarks, and are used only for identification and explanation without intent to
infringe.
Library of Congress Cataloguing-in-Publication Data
Names: Derwing, Tracey M., editor. | Munro, Murray J., editor. | Thomson, Ron I.,
editor.
Title: The Routledge handbook of second language acquisition and speaking / edited by
Tracey M. Derwing, Murray J. Munro and Ron I. Thomson.
Description: New York : Routledge, 2022. | Series: The Routledge handbooks in second
language acquisition | Includes bibliographical references and index.
Identifiers: LCCN 2021044160 (print) | LCCN 2021044161 (ebook) | ISBN
9780367900847 (hardback) | ISBN 9781032196718 (paperback) | ISBN 9781003022497
(ebook)
Subjects: LCSH: Second language acquisition. | Language and languages‐‐Study and
teaching‐‐Foreign speakers. | Language and languages‐‐Pronunciation by foreign
speakers. | Oral communication‐‐Study and teaching. | LCGFT: Essays.
Classification: LCC P118.2 .R6853 2022 (print) | LCC P118.2 (ebook) | DDC
418.0071‐‐dc23/eng/20211206
LC record available at https://lccn.loc.gov/2021044160
LC ebook record available at https://lccn.loc.gov/2021044161
ISBN: 978-0-367-90084-7 (hbk)
ISBN: 978-1-032-19671-8 (pbk)
ISBN: 978-1-003-02249-7 (ebk)
DOI: 10.4324/9781003022497
Typeset in Times New Roman
by MPS Limited, Dehradun
This book is dedicated to our spouses: Bruce Derwing, Alan Borden, and
Marcela Thomson.
CONTENTS
List of Figures
List of Tables
Contributors
Preface
Acknowledgements
Editors’ Introduction
PART I
Theoretical Foundations and Processes Underlying Speaking
1 Bilingual Models of Speaking
Kees de Bot and Szilvia Bátyi
2 Psycholinguistic Processes in L2 Oral Production
Daphnée Simard
3 A Complex Dynamic Systems Theory Perspective on Speaking in Second Language Development
Wander Lowie and Marjolijn Verspoor
4 Sociocultural Approaches to Speaking in SLA
Victoria Surtees and Patricia Duff
5 Aptitude and Individual Differences
Joan C. Mora
6 Language Anxiety
Małgorzata Baran-Łucarz
PART II
Research Issues
7 Speaking Research Methodologies
Charles Nagle, Tracey M. Derwing, and Murray J. Munro
8 Spoken Corpora
Amanda Huensch and Shelley Staples
9 Speaking Assessment
Noriko Iwashita
PART III
Core Topics
10 Pronunciation Learning and Teaching
Tracey M. Derwing and Murray J. Munro
11 Speech Intelligibility
John M. Levis and Alif O. Silpachai
12 Speech Comprehensibility
Pavel Trofimovich, Talia Isaacs, Sara Kennedy, and Aki Tsunemoto
13 Fluency
Jimin Kahng
14 The Role of Prosody Across Languages
Yanjiao Zhu and Peggy Mok
15 Grammar for Speaking
June Ruivivar and Laura Collins
16 Conversational Interaction Studies
Jaemyung Goo
17 Pragmatics: Speaking as a Pragmalinguistic Resource
Kathleen Bardovi-Harlig
PART IV
Teaching Speaking
18 Second Language Speaking Strategies
Sara Kennedy
19 Teaching Vocabulary
Marlise Horst
20 The Role of Formulaic Sequences in L2 Speaking
Duy Van Vu and Elke Peters
21 Technology for Speaking Development
Walcir Cardoso
22 Curriculum Issues in Teaching L2 Speaking
Jonathan Newton, Trang Le Diem Bui, Bao Trang Thi Nguyen, and Thi Phuong Thao Tran
23 Oral Language Development in Immersion and Dual Language Classrooms
Roy Lyster and Diane J. Tedick
24 Speaking and English as a Lingua Franca
Enric Llurda
PART V
Emerging Issues
25 Workplace Communication
Lynda Yates
26 The Relationship Between L2 Speech Perception and Production
Ron I. Thomson
27 The Relationship Between Gestures and Speaking in L2 Learning
Marianne Gullberg
28 Speech-Language Pathologists and L2 Speakers
Marie Nader
29 Child L2 Speakers with Language and Communication Disorders
Johanne Paradis
30 Training Interpreters
Jim Hlavac
31 First Language Attrition
Monika Schmid
Index
FIGURES
1.1 Levelt’s SPEAKING model (1995). Reprinted with permission
1.2 The dog caught the ball
2.1 Adapted version of Levelt’s model (1983, 1989, 1999a, 2000)
2.2 Conceptualization of self-repair adapted from Levelt (1983). Example from Simard et al. (2017)
2.3 Working Memory Measurement. Reprinted from Simard et al. (2020) with permission
3.1 Moving correlations between syntactic complexity and accuracy for participants (A) and (B). Reprinted from Yu and Lowie (2020) with permission
3.2 Vowel diagram for English sounds, with the estimated Dutch rounded front vowel added in the top left corner. (When pronounced in context, most productions by Dutch speakers will be somewhat more central.)
3.3 Longitudinal measurement of the production of the participant’s Dutch front rounded vowel /y/, represented by the first three formants: F1 (darkest shade), F2 (medium shade) and F3 (lightest shade)
3.4 Moving min-max graph illustrating changes in the amount of variability in a child’s use of spatial prepositions. Reprinted from Van Geert and Van Dijk (2002) with permission
3.5 Moving window of correlation. Reprinted from Verspoor and Van Dijk (2011) with permission
5.1 Sources of individual differences in L2 speech acquisition and performance
8.1 Example POS-tagged utterance in the CLAN software program
8.2 Praat TextGrid with annotation including a word tier (1), syllable tier (2), and a consonant/vowel tier (3). Reprinted from Ghanem et al. (2020) with permission
21.1 Dialogue between Theodore and Samantha (a virtual assistant)
21.2 Sample dialogue between ELIZA and a user. Generated via Norbert Landsteiner’s implementation at Masswerk (www.masswerk.at/elizabot/eliza.html) and published with permission
23.1 Variable emphases on content and language in the CAPA model (Tedick & Lyster, 2020, p. 112)
30.1 Process- and experience-based model of interpreter competence. Reprinted from Albl-Mikasa (2013b, p. 10) with permission
31.1 Number of references on Google Scholar to “Second Language Acquisition” and “Language Attrition,” 1981–2020
TABLES
1.1 The speech production process
7.1 Elicitation types according to researcher’s degree of control
8.B Summary of Select Existing Corpora (modified from Huensch & Staples, 2018)
11.1 Possible intelligibility interactions with native and non-native speakers
11.2 Approaches to measuring intelligibility
13.1 Frequently used measures of utterance fluency
18.1 Problem-oriented L2 speaking strategies
18.2 Interaction-oriented L2 speaking strategies
23.1 Examples of scaffolding for comprehension and production
CONTRIBUTORS
Małgorzata Baran-Łucarz is an Assistant Professor at the Institute of English Studies,
University of Wrocław, Poland. She has over 20 years of experience in the fields of language
teaching methodology, FL teacher training, individual learner differences (e.g., anxiety,
motivation, willingness to communicate, and aptitude) particularly in relation to
pronunciation acquisition and teaching.
Kathleen Bardovi-Harlig is a Provost Professor of Second Language Studies at Indiana
University where she teaches and conducts research on pragmatics, second language
acquisition, and instructional pragmatics. Her work on pragmatics has appeared in
journals, edited volumes, and handbooks. She is a co-editor of Interlanguage Pragmatics:
Exploring Institutional Talk (Erlbaum).
Szilvia Bátyi is a Lecturer at the University of Pannonia, Veszprém, Hungary. Her research
interests include first language attrition, L1 and L2 speech production, and linguistic
landscape. She is the area editor of Hungarian in the Linguistic Minorities in Europe Online
peer-reviewed reference resource at Mouton de Gruyter.
Trang Le Diem Bui has a doctorate from the Victoria University of Wellington and is a
Senior Lecturer in the Faculty of Foreign Languages, An Giang University, Vietnam
National University, Ho Chi Minh City, Vietnam.
Walcir Cardoso is a Professor of Applied Linguistics at Concordia University. He conducts
research on the L2 acquisition of phonology, morphosyntax and vocabulary, and the effects
of computer technology (e.g., clickers, text-to-speech synthesizers, automatic speech
recognition, and intelligent personal assistants) on L2 learning.
Laura Collins is a Professor Emeritus at Concordia University, Montréal. Her research has
examined the impact on learning of different distributions of instructional time as well as the
input factors and practice opportunities that may facilitate/constrain classroom learning. She
is the Past-President of the American Association for Applied Linguistics.
Kees de Bot is a Professor at the University of Pannonia. His interests range from bilingual
processing to language attrition, language development over the lifespan and language and
aging. He published a book on the history of Applied Linguistics with Routledge in 2015.
Tracey M. Derwing has extensively researched L2 pronunciation and fluency, especially the
relationships among intelligibility, comprehensibility, and accent. She has also investigated
native speakers’ speech modifications for L2 speakers and has conducted workplace studies
involving pragmatics and pronunciation.
Patricia Duff is a Professor of Applied Linguistics and an Associate Dean, Research, in the
Faculty of Education at the University of British Columbia. Her research, teaching, and
scholarship focus on sociocultural approaches to the teaching, learning, and use of languages
in transnational, multilingual contexts.
Jaemyung Goo is a Professor in the Department of English Education at Gwangju National
University of Education, Gwangju, South Korea. He received his PhD in Linguistics from
Georgetown University in May 2011. His research interests include instructed SLA, cognitive
individual differences and SLA, and task-based language teaching.
Marianne Gullberg is a Professor of Psycholinguistics at Lund University, Sweden. She
studies acquisition and processing in second language and multilingual speakers targeting
semantics, discourse, and gesture production and comprehension. She is co-founder of the
Nijmegen Gesture Centre, the first of its kind, and was vice-president of EuroSLA
2000–2007.
Jim Hlavac is a Senior Lecturer in Translation and Interpreting Studies, Monash University,
Melbourne and a certified and practising interpreter/translator. He has published widely in
the field of Translation and Interpreting Studies and in multilingualism, contact linguistics,
sociolinguistics, intercultural communication, pragmatics, and heritage/minority language
maintenance.
Marlise Horst is an Associate Professor of Applied Linguistics (retired) at Concordia
University in Montreal. Her main research and teaching interest is second language
vocabulary acquisition. Recently, she published a volume for teachers entitled Focus on
Vocabulary Learning. Prior to her career in Canada, she taught English overseas for many
years.
Amanda Huensch is an Assistant Professor in the Department of Linguistics at the University
of Pittsburgh. Her research examines the development of second language fluency, the
acquisition of second language phonology, and the pronunciation pedagogy practices of
foreign language instructors.
Talia Isaacs is an Associate Professor of Applied Linguistics and TESOL and Programme
Leader for the MA TESOL In-Service at the UCL Centre for Applied Linguistics, UCL
Institute of Education, University College London, UK. Her research centers on
pronunciation assessment, including understanding constructs and scoring systems using a
mixed methods approach.
Noriko Iwashita is an Associate Professor in Applied Linguistics in the School of Languages
and Cultures at The University of Queensland, Brisbane, Australia. Her research interests
include the interfaces of language assessment and SLA, peer interaction in classroom-based
research, and cross-linguistic investigation of four major language traits.
Jimin Kahng is an Assistant Professor of Applied Linguistics in the Department of Modern
Languages at the University of Mississippi. Her current research examines the development
of cognitive fluency in relation to utterance fluency and individual differences involved in its
development.
Sara Kennedy is a teacher and researcher at Concordia University in Montreal. Her research
interests include the teaching, learning, and assessment of second language speech, and
English and French as a lingua franca.
John M. Levis is a Professor in the Applied Linguistics and Technology program at Iowa
State University. His research interests are in L2 pronunciation teaching and speech
intelligibility. He authored Intelligibility, oral communication, and the teaching of
pronunciation (2018, Cambridge) and is an editor of the Journal of Second Language
Pronunciation.
Enric Llurda is a Professor of Applied Linguistics at the University of Lleida. His research
interests include non-native language teachers, English as a lingua franca, language attitudes,
multilingualism, translanguaging, internationalization and language education and policy in
higher education institutions. He is currently working on the development of disciplinary
literacies in English at university.
Wander Lowie holds a PhD in Applied Linguistics from the University of Groningen and is
the chair of Applied Linguistics at this university. His main research interest lies in the
application of Dynamic Systems Theory to second language. He has published more than 50
articles and book chapters and (co-)authored six books in the field of Applied Linguistics. He is
an associate editor of The Modern Language Journal.
Roy Lyster is a Professor Emeritus of Second Language Education at McGill University in
Montreal. He is the author of Learning and Teaching Languages Through Content
(Benjamins) and Vers une approche intégrée en immersion (Les Éditions CEC), and co-
author with Diane J. Tedick of Scaffolding Language Development in Immersion and Dual
Language Classrooms (Routledge).
Peggy Mok is an Associate Professor at the Chinese University of Hong Kong. She is
interested in both speech production and perception, particularly with cross-linguistic and
psycholinguistic perspectives. In recent years, she has focused more on speech prosody.
Speech acquisition in different contexts is an important theme of her research.
Joan C. Mora is an Associate Professor in the Department of Modern Languages and
Literatures at the University of Barcelona, Spain. His research interests include cognitive
individual differences in L2 speech learning, L2 phonology and speaking fluency, L2
pronunciation learning and teaching and phonetic training methods.
Murray J. Munro is a Professor of Linguistics at Simon Fraser University, Vancouver,
where he has taught linguistics and phonetics for over 25 years. He authored Applying
Phonetics: Speech Science in Everyday Life (Wiley, 2021). His research examining the uses of
phonetics to address practical problems has appeared in a variety of journals covering the
speech sciences and applied linguistics.
Marie Nader is a researcher in linguistics and psycholinguistics, and a lecturer at the
Université du Québec à Montréal. She is also a Speech–Language Pathologist with more
than 15 years of experience with L2 speakers. Her current research focuses on
comprehension and oral production of L2 individuals.
Charlie Nagle is an Associate Professor of Spanish and Applied Linguistics at Iowa State
University. His primary research area is second language pronunciation. He has published
on topics such as the perception–production link, individual differences in pronunciation
learning, and dynamic and interactive approaches to listener-based ratings.
Jonathan Newton is an Associate Professor and Programme Director for the MA in Applied
Linguistics/TESOL Programmes at the School of Linguistics and Applied Language Studies
(LALS), Victoria University of Wellington, New Zealand.
Bao Trang Thi Nguyen has a doctorate from Victoria University of Wellington and is a
lecturer at the Faculty of English, Hue University of Foreign Languages, Vietnam.
Johanne Paradis is a Professor in the Department of Linguistics and an Adjunct Professor in
Communication Sciences and Disorders at the University of Alberta. Her research is
concerned with bilingualism in children with typical development and in children with
developmental disorders.
Elke Peters is an Associate Professor at KU Leuven. Her research interests involve deliberate
and incidental FL vocabulary learning inside and outside of the classroom and how different
types of input can contribute to vocabulary learning. She has published her research in
Language Learning, Studies in Second Language Acquisition, and TESOL Quarterly.
June Ruivivar holds a PhD in Education, with a specialization in applied linguistics, from
Concordia University, Montréal. Her research explores the acquisition of sociolinguistic
competence, the learning and teaching of spoken grammar and vernacular varieties, and
socio-affective issues in second language acquisition.
Monika Schmid is the Head of the Department of Language and Linguistics at the University
of Essex. She obtained her PhD in English Linguistics in 2000 from the Heinrich-Heine
Universität Düsseldorf with a thesis on first language attrition among German Jews. She has
since held positions at the Vrije Universiteit Amsterdam and at the Rijksuniversiteit
Groningen.
Alif O. Silpachai is a graduate student in the Applied Linguistics and Technology Program in
the English Department at Iowa State University. His research interests include the
production and the perception of suprasegmentals, especially lexical tones.
Daphnée Simard (PhD ULaval) is a Full Professor of second language acquisition at
Université du Québec à Montréal. She investigates the role played by individual variables
such as attentional capacity and memory in second language acquisition, and, in particular,
oral production and written comprehension. She is currently Co-editor-in-chief of the
Canadian Modern Language Review/Revue canadienne des langues vivantes.
Shelley Staples is an Associate Professor of English Applied Linguistics and Second
Language Acquisition and Teaching at the University of Arizona. Her research focuses on
spoken and written learner corpora. Her recent work on spoken corpora emphasizes the
importance of incorporating pronunciation into analyses of L2 discourse.
Victoria Surtees is a Teaching and Learning Specialist in Internationalization at the
University of the Fraser Valley. Previously, she taught English in France, Montreal, and
Vancouver, and was a TESL instructor at the University of British Columbia. Her research
focuses on language learning in study abroad contexts.
Diane J. Tedick is a Professor of Second Language Education at the University of
Minnesota. She is the co-author with Roy Lyster of Scaffolding Language Development in
Immersion and Dual Language Classrooms (Routledge) and the co-editor of Pathways to
Multilingualism: Evolving Perspectives on Immersion Education and Immersion Education:
Practices, Policies, Possibilities (Multilingual Matters).
Ron I. Thomson is a Professor of Applied Linguistics at Brock University. His research
focuses on the development of oral skills by L2 English learners. He is also the creator of
www.englishaccentcoach.com, a free High Variability Phonetic Training (HVPT)
application, used around the world to improve L2 learners’ English pronunciation.
Thi Phuong Thao Tran has grown her interest and research in intercultural language teaching
and learning and intercultural understanding through her teaching at Can Tho University in
Vietnam, her PhD journey at Victoria University of Wellington in New Zealand and her
work supporting migrants in Australia.
Pavel Trofimovich is a Professor of Applied Linguistics in the Department of Education at
Concordia University, Montreal. His research focuses on cognitive aspects of second
language processing, second language pronunciation, sociolinguistic aspects of second
language acquisition, and the teaching of second language pronunciation.
Aki Tsunemoto is currently a PhD candidate at Concordia University, Montréal. She earned
her MA in TESOL at UCL Institute of Education, University College London, UK. Her
current research interests are second language speech assessment, psycholinguistic aspects of
speech interaction and second language pronunciation teaching.
Marjolijn Verspoor is a Professor Emeritus of English Language at the University of
Groningen, Netherlands, and Professor of Applied Linguistics at the University of Pannonia.
Her main research interests are second language development from a dynamic usage based
perspective and instructional approaches in foreign language teaching.
Duy Van Vu is a PhD researcher in Linguistics at KU Leuven. His research focuses on second
language vocabulary acquisition and use. He is interested in how vocabulary is addressed in
second language textbooks and classrooms as well as how vocabulary can be acquired from
different modes of input.
Lynda Yates is an Honorary Professor of Linguistics at Macquarie University. Her research
interests include language in the workplace, the learning and teaching of spoken language –
particularly to migrants, intercultural communication, international education, and teacher
professional development. The latter has fuelled a strong commitment to the translation of
research into practice.
Yanjiao Zhu is a Lecturer in the School of Foreign Languages at the University of Electronic
Science and Technology of China. Her current research focuses on the acquisition of third
language speech, exploring the ways prior linguistic knowledge influences the development of
a new sound system.
PREFACE
This Handbook is a comprehensive volume outlining the foremost issues regarding research
and teaching of second language speaking, examining such diverse topics as cognitive
processing, articulation, knowledge of pragmatics, instruction in sub-components of
speaking (e.g., grammar, pronunciation, and vocabulary) and the attrition of the first
language. Outstanding academics have contributed chapters to provide an integrated and
inclusive perspective on oral language skills. Specialized contexts for speaking are also
explored (e.g., English as a Lingua Franca, workplace, clinical settings, and interpretation).
It is our hope that The Routledge Handbook of Second Language Acquisition and Speaking
will become an indispensable resource for students and scholars in Applied Linguistics,
Cognitive Psychology, Linguistics, and Education.
On completion of an environmental scan, we determined that there are very few resources
that adequately address the breadth of research on second language (L2) speaking. Research
studies appear in disparate journals and edited volumes. This Handbook constitutes a
thorough treatment of speaking-related topics by leading experts in the field. Parts of the
Handbook will appeal to instructors of courses on cognitive processes underlying SLA; the
teaching of speaking; and L2 pronunciation and pragmatics.
Most chapters follow the same outline, in that they first introduce key definitions, followed
by descriptions of historical conceptualizations of a given topic, critical issues in the field,
current issues, research methods commonly used to probe these issues, recommendations for
practice and promising new directions. The authors have also provided additional
references that they consider among the best of the more extensive explorations of their
topics.
We three editors agree that we learned a lot in reading these contributions, so we
encourage you to dip into areas that you might not ordinarily seek out. The chapters are
fascinating and may trigger new ideas.
ACKNOWLEDGEMENTS
First, we thank the authors of the chapters here. When we invited them to contribute to this
volume, we and they had no idea that all of our lives would be so disrupted a few months
later. The original due date for the chapters was the end of March, 2020. Suddenly, several
contributors were homeschooling multiple grades as well as converting their university
classes to an online platform, together with other adjustments associated with lockdown.
Many were burdened with additional administrative loads as a result of the pandemic.
Others had their daily routines interfered with in other ways. It is a tribute to each of our
contributors that they all sent excellent chapters our way, albeit with some delay (and thanks
to the Series Editors and Routledge for their understanding).
We are most grateful to the scholars who provided chapter reviews. These individuals
offered extremely helpful feedback to the contributors. They were also affected by the
adverse conditions brought on by the pandemic, but they nonetheless offered superb advice
in a timely manner. The reviewers are listed here alphabetically:
Susan Ballinger
Bill Crawford
Remi van Compernolle
Isabelle Darcy
Esther DeLeeuw
Jean-Marc DeWaele
Roger Gilabert
Talia Isaacs
Eva Kartchava
Judit Kormos
Andrew Lee
John M. Levis
Elena Nicoladis
Mary Grantham O’Brien
Ron Smyth
Stuart Webb
David Wood
EDITORS’ INTRODUCTION
Background
In March of 2017, we were approached by Susan Gass with an invitation to edit a Handbook
on Speaking in the Routledge SLA series edited by Sue and Alison Mackey. We agreed, but
only if we could start the project in July of 2019, with the understanding that once we started,
we would need to adhere to strict timelines. (All that went out the window in March of 2020.)
We were somewhat surprised to be approached to edit a volume on speaking, since our own
primary expertise, second language pronunciation, lies within a subset of speaking. We were
thus slightly reluctant to take this on, but on closer investigation, we realized that many
researchers are in the same position as we are, in that their focus lies within one or two
components of speaking, rather than a broader thrust. We reached out to experts across the
spectrum, and the result is this comprehensive snapshot of L2 speaking in SLA research.
We have divided the Handbook into five parts, starting with Theoretical Foundations and
Processes Underlying Speaking, followed by Research Issues, then Core Topics, Teaching
Speaking, and Emerging Issues.
Part I: Theoretical Foundations and Processes Underlying Speaking

Six chapters appear in this part, the first three of which refer to Levelt’s (1989) groundbreaking
model of speaking, which was originally intended to describe L1 processes, but which has been
adapted to bilingual speakers. In Chapter 1, Kees de Bot and Szilvia Bátyi outline models of
L2 speaking; Daphnée Simard (Chapter 2) describes psycholinguistic processes, and Wander
Lowie and Marjolijn Verspoor (Chapter 3) provide a Complex Dynamic Systems Theory
perspective on speaking in a second language. In view of the evidence that individual variability
plays a much larger role in speaking than previously acknowledged, Joan Mora (Chapter
5) discusses the role of aptitude and individual differences. One factor that appears to have a
significant impact on speaking, and even the willingness of an individual to try to speak in an
L2, is language anxiety. Małgorzata Baran-Łucarz (Chapter 6) explores aspects of this well-studied
feature. Unlike the other chapters in this part, all of which focus on the individual,
Chapter 4 by Patsy Duff and Victoria Surtees illustrates sociocultural approaches to speaking,
going beyond the individual to the tremendous range of contextual effects on learning to speak
in a second language.
DOI: 10.4324/9781003022497-1
Part II: Research Issues

Research methodologies have come a long way since the early days of SLA research, which
tended to favour replicating L1 acquisition studies with new populations, or borrowing
techniques from linguistics and psychology to address particular questions. (For a comprehensive
overview of current SLA research methodology, see Mackey & Gass, 2016.)
Charles Nagle et al. (Chapter 7) provide a fairly extensive historical review of various approaches
to L2 speaking research, while Amanda Huensch and Shelley Staples (Chapter 8)
offer a careful description of the many corpora available for study (along with all the benefits
and shortcomings), and also some useful considerations for any researchers who are planning
either to use existing corpora or to develop a corpus themselves. Another key element of
L2 speaking research is assessment, and Noriko Iwashita (Chapter 9) has contributed a
chapter that explores important facets of testing and directions for improvement in the
future.
Part III: Core Topics

The fundamental issues in the field of L2 speaking have evolved considerably over recent
decades. One of the most central concerns, spoken grammar, emerged in the 20th century as
a distinct area of enquiry as attention was drawn to the difference between the spoken and
written modalities. The advent of audio-lingual instruction, in particular, moved the peda-
gogical emphasis away from reading and writing activities, and placed oral production at the
very centre of the teaching of grammatical structure. As an outgrowth of mid-20th century
learning theory, it focused on the acquisition of patterns through extensive repetition and
other oral drills. Interest in this method declined dramatically as the concept of
Communicative Competence gained traction. This new perspective provided an expanded
conceptualization of L2 learning, in terms of both the kinds of “knowledge” to be acquired
and the means of acquisition. Among the influential developments of the early 1980s was
Krashen’s promotion of “comprehensible” spoken input as the fundamental driver of L2
grammatical acquisition, along with his rejection of formal language study as an effective
learning approach. His ideas largely fell out of favour within the decade, however, and were
replaced by a more eclectic perspective on language pedagogy. Research in grammar in L2
speaking now offers sophisticated studies that recognize the complex interplay of other as-
pects of L2 speech with structure, as June Ruivivar and Laura Collins (Chapter 15) deftly
describe.
While grammar, in the sense of phonological, morphological and syntactic structure, still
commands the attention of researchers and pedagogical specialists, attention has shifted
towards less traditional branches of mainstream linguistics. To a great extent, this change is
linked to an interest in interactive language use, a primary source of Krashen’s compre-
hensible input (Krashen, 1982), and the focus of much of Michael Long’s (1983) interests but
also the site at which learners are required to generate output to which interlocutors respond.
Pushing output was a key focus of Swain’s pioneering work (Swain, 1993). Jaemyung Goo
(Chapter 16) recounts the growth of conversational interaction studies from the early 1980s
to today.
Among the relatively newer areas of interest is pragmatics, understood as an aspect of
communicative language use that constrains how speakers may effectively use linguistic struc-
tures within particular interactive contexts. The study of pragmatics in L2 speaking raises sig-
nificant challenges from the standpoint of research design. Bardovi-Harlig (Chapter 17) in
particular observes that much early pragmatics work relied on data from written discourse
completion tasks. However, L2 speech requires close attention to a wide array of phenomena
including fluency and pausing, and prosodic features such as stress and intonation. Jimin Kahng
(Chapter 13) details L2 fluency research, while Peggy Mok and Yanjiao Zhu (Chapter 14) focus
on the role of prosody across languages. Fluency, appropriate use of pragmatics, and prosody all
contribute to comprehensibility in the sense of “ease of understanding.” In Chapter 12, Pavel
Trofimovich and colleagues have examined the combination of factors that contribute to a given
speaker’s comprehensibility. It is now clear from several studies that not every feature of an L2
accent has an impact on comprehensibility or intelligibility – the listener’s actual understanding
of the L2 speaker’s intended message. John Levis and Alif Silpachai demonstrate this in their
discussion of speech intelligibility (Chapter 11).
As we noted earlier, pronunciation is a subset of speaking, but it is one that has elicited
significant attention in the past two decades. Clearly, if a listener cannot understand an L2
speaker’s output, communication has failed. In some cases that failure may be attributed to
aspects of the speaker’s pronunciation, while in others, the attitudes and choices of the lis-
tener also play a role. Throughout the 1970s and 1980s, it was presumed that pronunciation
would develop on its own with sufficient input. That turned out not to be the case. Tracey
Derwing and Murray Munro (Chapter 10) outline the course of pronunciation in L2 teaching
and research over the past several years.
Part IV: Teaching Speaking

This part highlights key issues facing language teachers and learners. Sara Kennedy (Chapter
18) focusses on the enhancement of second language speaking strategies, while Marlise Horst
(Chapter 19) outlines advances in the teaching of vocabulary. The role of formulaic
sequences in L2 speaking, which has come to the fore to a greater extent in the past two
decades, is discussed in Duy Van Vu and Elke Peters’ Chapter 20. In recent years, the use of
technology to encourage L2 speaking has become a subject of great interest. Walcir Cardoso
(Chapter 21) skillfully describes the potential uses and pitfalls of technological resources.
Designing an L2 speaking curriculum requires attention to educational policies, the interests
and needs of the students, the comfort level of the teacher, and the context in which speaking is
to be taught. Jonathan Newton and colleagues (Chapter 22) provide compelling arguments for
increased collaboration across all stakeholders to prepare students to use their L2. Enric Llurda
(Chapter 24) also makes the point that context is crucial in his chapter on speaking and English
as a Lingua Franca, an area in which research has grown exponentially in the past two decades.
Roy Lyster and Diane Tedick (Chapter 23) compare L2 speaking development in im-
mersion and dual language classrooms, both of which are preferable models to the “drip-
feed” approach to language teaching. However, they note that even in immersion and dual
language programs, learners’ speech tends to plateau at intermediate levels; they argue for
the need to provide students with instruction that encourages increased engagement in oral
communication.
Part V: Emerging Issues

In SLA, the diversity in topics related to L2 speaking is profound. For the final part of the
volume, we chose seven areas which have received considerable attention, one of which deals with
children. The fact that the majority of SLA researchers study adults acquiring a second lan-
guage may be due in part to a sense that adults face greater obstacles, but it is likely also the
result of a convenience bias. Getting ethics approval to study children is more difficult than
accessing university students studying a second language. In Chapter 29, Johanne Paradis
explores issues related to child L2 speakers with language and communication disorders.
Another under-represented group in SLA speaking research is adult migrants in the
workplace, but clearly communication in the workplace is an important societal concern. In
Chapter 25, Lynda Yates highlights what we have learned about workplace communication
thus far and makes recommendations for future studies.
Ron Thomson considers the relationship between L2 speech perception and production in
Chapter 26, pointing out that although these two aspects of language are equally important,
the focus in language classrooms has been unduly weighted on production.
In Chapter 27, Marianne Gullberg notes that speaking is multimodal. Not only does it
require manipulations of the articulators, but also speakers employ gestures with their hands,
their eyebrows, movements of the head and so on. She points out that most SLA research on
speaking ignores gestural components and offers compelling arguments for examining gestures
when investigating the development of L2 speech.
Second language researchers and teachers are not the only professional group with a
vested interest in L2 speakers. As Marie Nader (Chapter 28) describes, many speech lan-
guage pathologists have made L2 accent modification a component of their practices, yet all
too often they are unfamiliar with aspects of SLA research that could inform them of how
best to help their clients.
Exceptionally talented language learners can eventually master another language to the
point of becoming professional interpreters, perhaps the most difficult of all linguistic un-
dertakings. In Chapter 30, Jim Hlavac outlines the nascent research in this area.
Most of this volume is dedicated to the examination of processes and products of the
learning and teaching of a second language, but in Chapter 31, Monika Schmid considers the
effects of the acquisition of another language on a person’s first language speech, noting that
they are more far-reaching than previously believed.
Conclusion
Overall, this volume demonstrates the many directions that SLA research can take in the
investigation of L2 speaking. Each chapter follows a similar trajectory, defining important
terms, outlining a historical overview, highlighting critical issues and topics, examining current
contributions and research, discussing main research methods and making recommendations
for practice. Students looking for ideas for a thesis topic would do well to consult the future
directions part of each chapter, which provides a wealth of ideas for new contributions to our
understanding of L2 speaking. Spoken communication lies at the heart of our very beings. As
Derwing and Munro (2015) observed, the withdrawal of the opportunity to talk with others,
from a child’s “time out” to solitary confinement in prison, is a punishment. Humans are
social; the more we know about communication, and particularly communicating in languages
other than our mother tongues, the better. We hope that some of the topics here inspire new
research and new ways of approaching the instruction of L2 speaking.
References
Derwing, T. M. & Munro, M. J. (2015). Pronunciation fundamentals: Evidence-based perspectives for L2
teaching and research. Amsterdam: John Benjamins.
Gass, S. M., & Varonis, E. M. (1994). Input, interaction, and second language production. Studies in
Second Language Acquisition, 16(3), 283–302.
Krashen, S. D. (1982). Principles and practice in second language acquisition. Oxford: Pergamon Press.
Levelt, W. J. M. (1989). Speaking. From intention to articulation. Cambridge, MA: MIT Press.
Long, M. H. (1983). Native speaker/non-native speaker conversation and the negotiation of compre-
hensible input. Applied Linguistics, 4(2), 126–141.
Mackey, A. & Gass, S. M. (2016). Second language research: Methodology and design (2nd edn).
New York: Routledge.
Swain, M. (1993). The output hypothesis: Just speaking and writing aren’t enough. Canadian Modern
Language Review, 50(1), 158–164.
PART I
Theoretical Foundations and Processes Underlying Speaking
1
BILINGUAL MODELS OF SPEAKING
1 Introduction
In this contribution, we discuss bilingual language production and some of the theoretical
concepts related to this. Our starting point will be the 1989 Speaking Model by Willem
Levelt. Despite the fact that it is now more than 30 years old, it still stands as the most
elaborate and empirically founded model for language production. The original model did
not aim to elucidate how bilingual production differs from monolingual production; however,
in the past few decades a number of bilingual variants have been developed. These variants
do not alter the fundamental characteristics of the model: it is lexically based,
modular in type, incremental, and skills oriented. It is not a model of change, but parts of the
model can change due to use and learning. One of the major issues in applying the model is
that it is unclear how individual differences such as motivation, attitudes, and anxiety can be
built into the blueprint.
Bilingual models have been developed on the basis of the Levelt model. They have the
same characteristics but should help in understanding phenomena such as code-switching
and cross-linguistic influence (CLI).
2 Historical Perspectives
The question is: is the default model a monolingual or a bilingual one? On various occasions,
Levelt has indicated that in his view bilingualism is a fascinating topic, but not one he wants
to work on, claiming that “monolingualism is already complex enough.” Several authors have
argued that taking a monolingual model as a starting point does not do justice to the fact
that an overwhelming majority of the world’s population is bilingual and that therefore the
default model should be a bilingual one (Grosjean, 2008, 2010). First of all, it has to be
assessed to what extent the current model can deal with various aspects of multilingual
processing. Bilingual and multilingual speech production models are usually derivations of
Levelt’s speaking model (Figure 1.1) or at least they borrow some elements from it.
Consequently, bilingual speaking models cannot be discussed without mentioning the
Speaking model. In the following parts, we describe the main components and processing
mechanism of the Levelt model followed by a discussion of bilingual versions of the model
and a brief outline of how code-switching is accounted for by the different models.
DOI: 10.4324/9781003022497-3 9
Figure 1.1 Levelt’s SPEAKING model (1995). Reprinted with permission
The Levelt Model

Speaking is an integral part of our everyday activity and it is often considered a “gift of
evolution to mankind” (Levelt, 1995, p. 13). Despite this all-pervading nature of speech, it
was not until the end of the 20th century that the first comprehensive model of the
components and processes of speech was developed by Willem J.M. Levelt (1989, 1993,
1995, 1999). It is still the most widely used theoretical framework sketching the mental
mechanisms that generate speech from communicative intentions and is based on the
theoretical and empirical knowledge that had accumulated until the creation of the model
(Figure 1.1). The model is composed of processing components (boxes) and knowledge
components (circles and ellipses). The processing components are “modular,” which means
they are quite autonomous and the intelligence of the system comes from the cooperation
between them. The three main components of speech production are as follows: the con-
ceptualizer, the formulator, and the articulator. Processing in the formulator and the ar-
ticulator is highly automatic, while planning processes in the conceptualizer require
attention from the speaker.
A full description of the Speaking Model would take more space than allowed for the
present contribution. Here, we offer a condensed version of the main characteristics.
Production starts in the conceptualizer, where a communicative intention is turned into
lexical concepts. In the generation of the message, information about the conversational
setting and the discourse model is taken into account; this includes the selection of a
linguistic register. Hesitation markers (e.g., silent or filled pauses) are often taken as
indicators of the amount of mental activity going on: the expression of new or complex ideas
is often preceded by greater hesitancy manifested in pauses. That is, more attention is
directed to the planning stage, and the resources available for executing the act of speaking
are therefore limited. Conceptualization involves macroplanning and microplanning, including the
rendering of ideas in the right order (linearization) and the plan for achieving communication
goals (instrumentality). Within the conceptualizer, the message is generated and monitored
internally to check whether the (preverbal) plan coincides with the intended message. Finally, the
series of lexical concepts is turned into the preverbal message (see Figure 1.2 and Table 1.1 for
an example), which is fed forward to the formulator.
Here, the essential process of turning lexical concepts into a surface structure takes place,
which is done by matching lexical concepts with lemmas in the lexicon. Lexical items consist
of two parts: the lemma, in which the entry’s meaning and syntax are represented, and the
lexeme, which contains the morphosyntactic and phonological information. The matching of a
lemma with a lexical concept also leads to the activation of the syntactic procedures that are
part of the lemma. For instance, if a transitive verb (e.g., caught) is activated, it will start the
syntactic procedures for the generation of a direct object (e.g., ball). The
activation of the lexical item also leads to the lexeme becoming available. This process is not
always successful, as the well-known tip-of-the-tongue phenomenon shows; sometimes the
lexical item is activated through the lemma, but the lexeme part does not come up in time.
Interestingly, some properties of the intended word form do become available; speakers often
know how many syllables the word has and what the rhythmic pattern is. The selection of the
lemmas and lexemes also leads to the formation of a surface structure. While the surface
structure is being formed, the morpho-phonological information belonging to the lemma is
activated and encoded. The output of the formulator is the input of the articulator which
converts the speech plan into actual speech. There are two feedback loops, one internal that
checks the inner speech, and one external that checks the overt speech. Syllables are the
building blocks of speech. The outputs of the articulator are motor-plans to execute the
assembling of syllables into running speech. The two feedback loops monitor the speech and
articulation.
As this short description may show, speaking is primarily lexically based rather than
syntactically based. It is modular in nature, though Levelt has always carefully avoided calling the
processing components modules, since that would imply that these components are modular
and therefore innate. In later publications, Levelt followed the view that the modular
character of these components is emergent: it is the result of use, not the origin. Several
commentators (e.g., de Bot et al., 2007) have argued that the modularity of the model is one
of its main characteristics and also one of its weaker points, because a strict modular view
does not allow for a view on language that is more dynamic in nature.
The system works “from left to right,” that is, the information enters the system and is
processed from intention to articulation without feedback or feedforward. It is only at the
level of the phonetic plan that internal speech is monitored and corrections can be made.
This means that errors in speaking can only be detected fairly late in the process. If, for
instance, the wrong lexical item has been activated, this can only be detected through the
internal feedback loop that monitors the internal speech. This also means that there are no
in-between mechanisms that can correct the error. Basically, error correction is redoing the
same procedures and hoping that this time the intended meaning is actually expressed
correctly.
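The strictly feed-forward flow described above can be illustrated with a minimal sketch that runs the example sentence of Table 1.1 through three stages. The toy lexicon, the feature names, and the function names are assumptions made for illustration only; they are not part of Levelt's model, and the sketch leaves out the two monitoring loops.

```python
from dataclasses import dataclass

# Toy lexicon (an illustrative assumption): each entry pairs a lemma
# (meaning and syntactic category) with its word forms (the lexeme side).
LEXICON = {
    "CATCH": {"lemma": "catch", "category": "verb",
              "forms": {"past": "caught", "present": "catches"}},
    "DOG": {"lemma": "dog", "category": "noun", "forms": {"singular": "dog"}},
    "BALL": {"lemma": "ball", "category": "noun", "forms": {"singular": "ball"}},
}


@dataclass
class PreverbalMessage:
    """Output of the conceptualizer: lexical concepts plus time reference."""
    action: str
    agent: str
    theme: str
    time: str


def conceptualize(intention):
    """Turn a communicative intention into a preverbal message."""
    return PreverbalMessage(intention["action"], intention["agent"],
                            intention["theme"], intention["time"])


def formulate(message):
    """Match concepts with lemmas, assign roles, and retrieve word forms."""
    verb = LEXICON[message.action]
    subject = LEXICON[message.agent]
    obj = LEXICON[message.theme]
    # Ordered string: DETERMINER subject verb DETERMINER object.
    return ["the", subject["forms"]["singular"], verb["forms"][message.time],
            "the", obj["forms"]["singular"]]


def articulate(speech_plan):
    """Stand-in for articulation: render the plan as a written sentence."""
    words = " ".join(speech_plan)
    return words[0].upper() + words[1:] + "."


def speak(intention):
    """Each stage only passes its output forward; nothing flows back."""
    return articulate(formulate(conceptualize(intention)))


print(speak({"action": "CATCH", "agent": "DOG", "theme": "BALL", "time": "past"}))
# prints: The dog caught the ball.
```

Because each function receives only the output of the previous one, a wrongly selected lemma could not be repaired midway in this sketch; the whole chain would have to be run again, which mirrors the account of error correction given above.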

Bilingual and monolingual speech production share many features; however, as outlined by
Kormos (2006), there are some distinctive features as well. Bilingual models share one main
characteristic: all of them are directly or indirectly derived from the Levelt model. It was used as a
starting point because the model is based on decades of psycholinguistic research and
empirical data. While it could be argued that the Speaking model is not really monolingual,
because it can deal with different registers and styles, which are not essentially different from
languages, the model is not aimed at providing a model for bilingual production. The main
question is what a bilingual model should have that is not needed for a monolingual version.
Such a model should be able to explain issues such as code-switching and CLI. Many models
deal with part of the speaking process but only a few address speaking from
conceptualization to articulation. In the following, we will concentrate on three models (de Bot,
1992; Poulisse & Bongaerts, 1994; Kormos, 2006), which offer suggestions on how the
bilingual version of the Levelt model should be composed. In addition, the phenomenon of
code-switching will be discussed in relation to the speech production process.
The first adaptation of the Levelt model for bilingual speakers was carried out by de Bot
(1992). The empirical basis for evaluating such a model was very limited, but it triggered a host
of new findings and several new model versions. With as little modification to the original
model as possible, de Bot suggested that the conceptualizer is partly language specific, that is, while
the macroplanning phase is not language specific, in the microplanning phase, a language is
activated in which the intended message is to be produced and it is part of the preverbal
message (a similar process to choosing between registers in unilingual speech production). de
Bot proposes that two alternative preverbal plans are generated, one for each language.
According to this model, the formulator is language specific and the lexical elements from
different languages are stored in one lexicon. Another modification in the bilingual version of
the model is that the lemmas are not necessarily linked to one lexeme only: they can be linked
to several form characteristics. Furthermore, the syntactic information and meaning are not
as closely connected and stored as in the monolingual lexicon. In a subsequent article, de Bot
and Schreuder (1993) make some modifications to this model: the idea of alternative speech
plans is abandoned; the existence of a language cue at the conceptual level is assumed; the
verbalizer is introduced and placed between the conceptualizer and the formulator to
facilitate concept–lemma mappings.
Poulisse and Bongaerts’ (1994) criticism of de Bot’s model is twofold. First, in their view it is
not clear how two alternative speech plans can be formulated in parallel if the information in the
preverbal message raises the activation level of one of the languages only, and second, it is
uneconomical to have speech plans being formulated in parallel, since in theory there need not be
any limit to the number of alternative plans that are being produced. Their model of bilingual
speech production is based on the analyses of unintentional and intentional code-switches.
Similar to de Bot and Schreuder’s model (1993), they propose that the preverbal message con-
tains a language cue besides the conceptual information and together they activate the lemmas
which also carry language tags. This makes lexical selection a simple process: conceptual spe-
cifications and language tags are matched with the appropriate lemmas. As lemmas are con-
nected to common lexical nodes, lexical items from both languages are activated due to
spreading activation. Erroneous selection at this level leads to code-switching.
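A minimal sketch may help to make this selection mechanism concrete. The Dutch and English entries, the numerical weights, and the function names are illustrative assumptions only; the model itself does not specify activation values.

```python
# Hypothetical bilingual lexicon: every lemma carries a concept and a language tag.
LEMMAS = [
    {"word": "dog", "concept": "DOG", "language": "EN"},
    {"word": "hond", "concept": "DOG", "language": "NL"},
    {"word": "ball", "concept": "BALL", "language": "EN"},
    {"word": "bal", "concept": "BALL", "language": "NL"},
]


def activation(lemma, concept, language_cue, concept_weight=1.0, cue_weight=0.5):
    """Activation rises with each match between preverbal message and lemma.

    The weights are arbitrary assumptions chosen for illustration.
    """
    score = 0.0
    if lemma["concept"] == concept:
        score += concept_weight   # conceptual specification matches
    if lemma["language"] == language_cue:
        score += cue_weight       # language tag matches the language cue
    return score


def select_lemma(concept, language_cue):
    """Pick the most highly activated lemma."""
    best = max(LEMMAS, key=lambda lemma: activation(lemma, concept, language_cue))
    return best["word"]


print(select_lemma("DOG", "NL"))  # prints: hond
print(select_lemma("DOG", "EN"))  # prints: dog
```

In this sketch the translation equivalent in the other language still receives the activation contributed by the shared concept, so it remains a close competitor. A small disturbance of the activation levels would be enough for it to win, which is one way to picture an unintentional code-switch.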
Figure 1.2 The dog caught the ball
Table 1.1 The speech production process
Level | Task | Example
Conceptualizer | Concepts are chosen. | Catching (of something by someone); dog (the entity carrying out this action); ball (the entity on which the action is carried out).
Formulator | Lemmas are accessed and retrieved from the mental lexicon. | {catch} {dog} {ball}
Formulator | Grammatical roles are given to the lemmas. | VERB = {catch}; SUBJECT = {dog}: singular, definite; OBJECT = {ball}: singular, definite; TIME = past
Formulator | The selected set of lemmas is organized into an ordered string. | (DETERMINER) {dog} [singular; definite] {catch} [past] (DETERMINER) {ball} [singular; definite]
Formulator | The lexemes or word-forms are made available via links with the lemmas. | e.g., {dog} is linked in the mental lexicon both to the written form <dog> and to the spoken form /dɒɡ/
Articulator | The utterance is pronounced. | The dog caught the ball.
One of the latest comprehensive models of bilingual speech production was proposed by
Kormos (2006) and is also based on Levelt’s (1999) monolingual framework. Drawing on
memory research, this model highlights the role of knowledge stores which are shared
between L1 and L2, with an additional L2 store: the declarative knowledge of syntactic and
phonological rules. At the conceptual level, a language cue is added to each concept
separately, allowing for code-switching at later stages. The model also accounts for the use of
formulaic expressions (apologizing, requesting, etc.), which are activated as single chunks
at the conceptual level. Activation of the conceptual chunks is spread to the corresponding
linguistic chunks. In the formulator, lemmas from both languages are activated and they
compete for selection. Both syntactic and phonological encoding allow for the cascading of
activation; however, backward flow between the levels is not assumed.
In all bilingual versions of the speaking model, code-switching, the alternate use of two or
more languages in the same utterance or conversation (Auer, 2005; Poplack, 1980; Stavans &
Swisher, 2006), is addressed. A well-known model accounting for code-switching is
Myers-Scotton’s (2002) Matrix Language Frame model, which is primarily aimed at analyzing different
types of code-switching (i.e., intrasentential and insertional code-switching). She argues that, as
in the Speaking model, a number of syntactic frames are activated in bilingual production. A
crucial characteristic of her model is the selection of a so-called matrix language, the language
that provides the frame for code-switched utterances and to which the speaker typically
returns. Elements from the other language are inserted into the dominant/matrix language
according to the speaker’s proficiency. The model proposes that the matrix language can be
identified by calculating the proportion of lexical elements from each of the languages used in
the code-switching setting (the most frequently used language is the matrix language).
Myers-Scotton’s ideas are only concerned with code-switching and provide no additional insights into
bilingual production beyond the Speaking model.
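The proportion-based identification of the matrix language mentioned above can be sketched in a few lines. The sketch assumes that every token has already been tagged for language, which is itself a non-trivial analytic step; the Dutch-English utterance and the function name are hypothetical, and ties between languages are not handled.

```python
from collections import Counter


def matrix_language(tagged_tokens):
    """Return the most frequent language and its share of all tokens."""
    counts = Counter(language for _, language in tagged_tokens)
    language, count = counts.most_common(1)[0]
    return language, count / len(tagged_tokens)


# Hypothetical code-switched utterance, tagged token by token.
utterance = [("ik", "NL"), ("heb", "NL"), ("de", "NL"),
             ("ball", "EN"), ("gevangen", "NL")]

print(matrix_language(utterance))  # prints: ('NL', 0.8)
```

Here four of the five tokens are Dutch, so Dutch would be treated as the matrix language and the English noun as an inserted element.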
As we have seen, lexical items from both languages are activated in bilingual production
due to spreading activation. For successful communication, bilinguals have to inhibit ele-
ments from the non-target language to avoid interference. In bilingual production, research
into this process has been linked with language dominance and accounted for by the
Inhibitory Control Model (ICM) proposed by Green (1986, 1993, 1998), which explains
switching cost, that is, the reactivation of a language after inhibition. The model postulates
that language dominance governs the amount of inhibition directed at the non-target lan-
guage: the stronger language (L1) has to be suppressed by a greater magnitude of inhibition
than the weaker language (L2). As a consequence, the reactivation of the language depends
on the strength of the inhibition and the reactivation cost (i.e., switching cost) will be larger
for the L1. The model was later expanded to account for symmetrical switching costs
(Schwieter & Sunderman, 2008; Fink & Goldrick, 2015) when the dominance of the two
languages is similar.
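The reasoning behind asymmetrical and symmetrical switching costs can be made explicit in a toy calculation. The activation values and the assumption that inhibition is a simple linear function of activation are ours, introduced purely for illustration; the ICM does not commit to these numbers.

```python
def inhibition_needed(activation_level, rate=1.0):
    """Assume inhibition is proportional to the activation being suppressed."""
    return rate * activation_level


def switch_cost(target_language, activation):
    """Cost of switching INTO a language: undoing the inhibition it received
    while the other language was in use."""
    return inhibition_needed(activation[target_language])


# Unbalanced bilingual: the L1 is more highly activated than the L2.
unbalanced = {"L1": 0.9, "L2": 0.4}
# Balanced bilingual: both languages are similarly activated.
balanced = {"L1": 0.8, "L2": 0.8}

print(switch_cost("L1", unbalanced) > switch_cost("L2", unbalanced))  # prints: True
print(switch_cost("L1", balanced) == switch_cost("L2", balanced))     # prints: True
```

For the unbalanced speaker, returning to the L1 is the more costly switch because the L1 had to be suppressed more strongly; when dominance is similar, the two costs coincide.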
3 Critical Issues and Topics

Unlike comprehension, speech planning and production is an intentional process and as such
it should allow the speaker to decide which language to activate and use, but this is not the case
(Kroll et al., 2012). There is general agreement on the existence of the three components
proposed by Levelt (1989): conceptualizer, formulator, and articulator. Following de Bot
(1992, 2004), Grosjean (2013) notes that in bilingual speech production these components are
also present; however, in a bilingual model some modifications are necessary due
to the involvement of more than one language. As outlined in the previous part,
bilingual and monolingual speech production have a lot in common, but in the former
several factors have been identified to play a role in the process, which are now suggested to
be used in monolingual production models too [e.g., language selection in bilinguals is
compared to selection between registers in monolinguals (La Heij, 2005)]. The main ques-
tions of bilingual speech production concern language selection, the locus of language se-
lection in the planning process, and factors that influence bilingual production. These issues
will be briefly addressed in the following.
One of the questions that has led to considerable research is whether language production
is selective or non-selective (Colomé, 2001; Costa et al., 1999; Grosjean, 2013; Hermans et al.,
1998; Jared & Kroll, 2001). Language selection seems to be the most pronounced question of
bilingual speech production models (these models explain lexical selection), and two general
alternatives have been proposed: language selective models claim that bilinguals are able to
speak one language alone and prevent or ignore the activation of lexical items from the other
language, while competition for selection models assume that candidates from both languages
compete for selection. In the early days of research on the bilingual lexicon, it was assumed
that there was an input switch and an output switch which allowed for the use of either
language at will productively and perceptually (Macnamara & Kushnir, 1971). The idea was
that there is a mechanism that acts as a sieve allowing only elements from one language to
pass through. Similarly, in production only elements from one language are selected and used.
Although the question is still a source of inspiration for many researchers, Kroll et al. note
that “bilinguals cannot switch off one of the two languages at will. When they listen to
speech, read, or prepare to speak in only one of their two languages, information about the
language not in use is also active and influences performance” (2012, pp. 231–232).
According to La Heij (2005), a supporter of the language selective process, language selection
in bilingual production is either viewed as “complex access, simple selection,” or “simple ac-
cess, complex selection.” He (along with Poulisse & Bongaerts, 1994) claims that “Access is
complex in the sense that the preverbal message contains all the relevant information, including
the intended language” and “Lexical selection is a simple, local process that is only based on
the activation levels of words” (2005, p. 302). This suggests that language selection happens
early in the speech planning process, more specifically at the conceptual level. Furthermore, in
bilinguals the outcome of conceptualization is similar to the content of the preverbal message
of the monolingual speaker as modelled by Levelt, with the difference that besides information
about which register to use, the language (L1 or L2) is also selected (de Bot, 1992; La Heij,
2005). Due to the co-activation of semantic neighbours, it is “reasonable to assume that during
lexical access words that also appear in the nonresponse language are activated to some extent”
(La Heij, 2005, p. 301). Other language selective models assume a more complex selection
process, for example, Costa et al. (1999) assume the presence of language tags in lexical access
indicating whether a word belongs to L1 or L2.
Language tags also feature as components of competition for selection models, but are
accompanied by an inhibition mechanism. Green’s Inhibitory Control Model (ICM, 1986,
1993, 1998) assumes that each individual lexical representation has a language tag (L1 or L2)
and non-target lexical nodes can be suppressed in a particular communicative context.
According to the ICM, selection is mediated via inhibitory processes at the lemma level
(contrary to La Heij’s suggestion) and the amount of inhibition is proportional to the acti-
vation level of the non-target language items (Finkbeiner et al., 2006). L2, in general, receives
less inhibition because it is usually less highly activated than the L1. Empirical evidence
supporting the ICM comes from studies investigating switching cost. It has been found that
switching cost varies according to proficiency: less-proficient bilinguals experience greater
switching cost in the L2–L1 direction (Meuter & Allport, 1999), while switching cost is symmetric
for highly proficient and close to balanced bilinguals (Costa & Santesteban, 2004). Hermans
et al. (1998), in two picture–word interference tasks, demonstrated that bilinguals cannot
prevent interference from their L1 at the initial stages of lexical access in their L2: the L1
lemma also becomes activated.
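The switching-cost measure discussed in this paragraph can be made concrete with a small worked example. The sketch below is an illustration under assumptions only: the trial records, the field names (`language`, `switch`, `rt_ms`), and the reaction times are invented, and the cost is computed as the mean reaction time on switch trials minus the mean on repeat trials, which is one common operationalization rather than the procedure of any study cited here.

```python
from statistics import mean


def switch_cost(trials, language):
    """Mean RT on switch trials minus mean RT on repeat trials for one response language."""
    switch = [t["rt_ms"] for t in trials if t["language"] == language and t["switch"]]
    repeat = [t["rt_ms"] for t in trials if t["language"] == language and not t["switch"]]
    return mean(switch) - mean(repeat)


# Invented reaction times in milliseconds (hypothetical, not data from any study).
# "language" is the language of the response; "switch" marks trials whose
# response language differs from that of the preceding trial.
trials = [
    {"language": "L1", "switch": True, "rt_ms": 820},
    {"language": "L1", "switch": True, "rt_ms": 800},
    {"language": "L1", "switch": False, "rt_ms": 700},
    {"language": "L1", "switch": False, "rt_ms": 680},
    {"language": "L2", "switch": True, "rt_ms": 860},
    {"language": "L2", "switch": True, "rt_ms": 840},
    {"language": "L2", "switch": False, "rt_ms": 800},
    {"language": "L2", "switch": False, "rt_ms": 780},
]

cost_into_l1 = switch_cost(trials, "L1")  # cost of switching from L2 into L1
cost_into_l2 = switch_cost(trials, "L2")  # cost of switching from L1 into L2
asymmetry = cost_into_l1 - cost_into_l2   # positive: larger cost when returning to L1

print(cost_into_l1, cost_into_l2, asymmetry)
```

With these invented values the cost of switching into the L1 (120 ms) exceeds the cost of switching into the L2 (60 ms), which is the asymmetric pattern reported for less-proficient bilinguals; a symmetric pattern would yield an asymmetry close to zero.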
As opposed to previous views on the locus of language selection [at the conceptual level
(La Heij, 2005), at the lexical level (Costa et al., 1999), and at the semantic level (Green,
1998)], Kroll et al. (2006) suggest that parallel activation of both languages can persist into
the execution of speech, which makes the system “generally nonselective and open to these
cross-language interactions” (p. 127). They acknowledge that “although there are circum-
stances that allow bilinguals to plan spoken utterances exclusively in one language without
the influence of the other language, those circumstances are the exception, not the rule,
particularly when speaking the L2” (p. 127).
Several factors have been proposed to influence the activation level of the languages and
the place of language selection in the bilingual speech production process. Grosjean (2001)
introduced the concept of language mode, which relates to the context of language use and
is conceived as a continuum with two endpoints: the monolingual mode, in which one language is
active and used, and the bilingual mode, in which both languages are highly active. He
proposed that the state of activation of the bilingual’s languages depends on where they find
themselves on the continuum (Grosjean, 2013). Numerous factors determine movement on
the continuum and thus the activation of languages (e.g., languages involved, interlocutors,
topic, stimuli, and experimental task). Some studies have attempted to experimentally vali-
date the language mode model but arrived at mixed results (Jared & Kroll, 2001; Van Hell &
Dijkstra, 2002; Navracsics, 2004).
Kroll et al. (2006) identified several factors that can influence the locus of language selection,
though the empirical evidence is still scarce and needs further research. These factors are the
following: language proficiency, language dominance, context of acquisition, processing demands
associated with the (experimental) task, the nature of concepts to be expressed, and activation of
the two languages. In addition to these, qualitative studies add such affective factors as emotions.
Navracsics (2014) conducted a longitudinal process study with English–Hungarian–Persian chil-
dren and found that CLI was often caused by the least frequently used language (Persian), which
was also the language of emotions in parent–child communication.
Bilingual production models rely on existing evidence in the domains of lexical processing
and sentence processing. Most studies on production focus on the former, while research on
the latter is scarcer and often yields contradictory results. Hartsuiker and Pickering (2007)
review the evidence from bilingual sentence production studies to test the predictions of three
models: de Bot’s bilingual production model (1992) (strong and weak version), Ullman’s
declarative/procedural model (2001), and Hartsuiker et al.’s integrated model (2004). The
general question of the study is “To what extent are processes used in sentence production
integrated between the different languages of a bilingual and to what extent are they kept
separate?” (p. 479). All three models (we consider the weak version of de Bot’s model)
propose CLI, but there is no consensus on the determining factors. De Bot assumes that the
degree of interaction could be a function of linguistic distance and that between-language effects
should be stronger with closely related languages. Ullman is unclear about the effect of
linguistic distance but proposes a positive relationship between proficiency and the extent of
CLI, which contradicts de Bot’s assumption.
The authors review recent behavioural evidence from language production experiments,
more specifically on conceptual number effects, syntactic transfer, syntactic priming (across
languages, strength of within- and between-language, the effect of linguistic distance and
proficiency), and they conclude that findings support the predictions of Hartsuiker et al.’s
model that neither proficiency nor linguistic distance has an effect on CLI and there is robust
between-language priming.
4 Current Contributions and Research

Recent research on bilingualism has not attempted to modify the speaking
model; therefore, no overarching bilingual speaking model has been proposed since Kormos
(2006). Current research on bilingual and multilingual processing is partly or fully related to
speech production, so we discuss only those contributions that
are directly relevant to bilingual production models. These topics include code-switching and
the role of individual differences.
Code-Switching
One of the most prominent features of bilingual production is code-switching (CS), the
change of language during speaking. The most frequent type is switching between utterances,
but it happens at all linguistic levels (phonological, morphosyntactic, and lexical-semantic), as well as
between sign languages and even between sign and non-sign languages (Meier et al., 2002). Code-switching
has been studied extensively and there is a large body of publications on the subject. Current
code-switching research is interested in the role of cognates in code-switching, in the cognitive
mechanisms involved (in particular, whether switching between languages takes time
and effort), and in switching between modalities.
Broersma and colleagues (Broersma & de Bot, 2006; Broersma et al., 2020) carried out a
number of studies on triggered code-switching – code-switching facilitated by the occurrence
of cognates – and found a strong effect of cognates in conversations from a large corpus of
Welsh-English conversational speech. The data showed that producing cognates facilitated
code-switching, and that speakers who use more cognates tend to switch more. Interestingly,
hearing rather than producing cognates did not facilitate code-switching. In terms of a
production model, this suggests that lexical activation can have an impact on language
choice and vice versa.
Mosca and de Bot (2017) studied Dutch-English bilinguals with small differences in
dominance and found that while in recognition, the switching cost was associated with
language dominance, in production, no such pattern was found. On the contrary, they found
a paradoxical language effect (faster responses in the L2 than in the L1) in production.
They concluded that “language control is a much more flexible mechanism than previously
believed and that because of its malleable nature it is difficult to circumscribe it within a
specific model” (p. 16).
Declerck et al. (2019) looked at control mechanisms in code-switching between registers
(formal/informal speech in French) versus code-switching between languages (French and
English). Similar switching costs were found for register/language switching. Making the cue-
to-stimulus interval longer led to a reduction of switching costs for the language switches but
not for the register switches. This suggests that the control mechanisms for the two types of
switching were not completely identical but partially shared.
There is a growing literature on switching between modalities (i.e., sign/non-sign; Tang &
Sze, 2018). Bimodal bilinguals are individuals proficient in both spoken and signed language(s).
Code-blending, the production of spoken and signed elements simultaneously, is more frequent
in this community than code-switching and it has attracted more research interest (see
Emmorey et al., 2008). Research in this domain is still scarce but it seems that the cognitive
control exercised by unimodal bilinguals (Bialystok et al., 2009) does not occur for bimodal
bilinguals since the spoken and the signed language are fully or partially active when they are
producing speech, that is, no inhibition occurs (Pichler et al., 2019; Emmorey et al., 2008).
However, CLI has been observed in several studies (Emmorey et al., 2008; Morford et al.,
2011), which calls into question the assumption that CLI is the consequence of speaking two
phonologically similar languages.
The Role of Individual Differences

Until recently, sociolinguistic and psycholinguistic factors were seen as belonging to
different research areas. This has changed in the past decade with more attention to individual
differences and sociocognitive factors that might influence language processing and,
as such, speech production. Individual differences (IDs) are defined by Dörnyei (2009) as
follows: “Individual differences are characteristics or traits in respect of which individuals
may be shown to differ from each other” (p. 1). They usually include age, gender, aptitude,
personality, motivation, working memory (WM), attentional control, and anxiety. For the
discussion of production models, the main questions are as follows: which individual differences
have an impact, through which processes, and at what level and in which component of
the model do they come into play? Research answering these questions is very limited in both
the monolingual and the bilingual literature.
Linck et al. (2014) reported a meta-analysis of data from 79 samples involving 3,707
participants looking at how WM is associated with L2 processing and proficiency outcomes.
Based on the positive relationship between the variables they suggested revisiting bilingual
models of speech production and comprehension, specifically Green’s ICM. In this model,
lexical competition is resolved by a control mechanism which is operated by a supervisory
attentional system. Individuals with greater WM resolve interference between competing
representations faster. WM is closely associated with executive functions; however, our
knowledge of how these constructs correlate is very limited, especially when it comes to the
domains of L1 and L2 (Linck et al., 2014). Matters are complicated further by the fact that
the most influential framework, that of Miyake et al. (2000), distinguishes three executive functions
(inhibition, updating, and shifting). Linck et al. (2014)
propose incorporating Miyake’s three related but separable executive functions into
Green’s ICM, “such that the current goal of the speaker (updating) directly informs the
supervisory attentional system’s functioning (shifting), which then translates that goal into a
specific task schema that exerts control over the language system (inhibition)” (p. 43).
Kormos (2015) reviewed a number of studies focusing on the relationship between IDs (WM,
anxiety, and willingness to communicate) and speech production, but the association between
them seems to be weak or absent. She concluded that the discrepancies in the results might be
attributable to the use of overly complex memory tests or tests of phonological short-term memory,
and she also proposed testing the different functions of the central executive.
Another ID that has recently been included in several studies in association with speech
production is anxiety (Kormos, 2015; Sun & Zhang, 2020). It has been suggested that anxiety
[especially communication apprehension (Horwitz et al., 1986)] affects the speech production
process in the following ways: (1) it may cause difficulties for the speaker to retrieve words
from the mental lexicon; (2) the inhibition of activated L1 items can be more effortful; and (3)
difficulties can arise in orchestrating speech production processes: conceptualization, for-
mulation, and monitoring (Kormos, 2015).
Willingness to communicate, motivation to communicate, learning styles, and L2
speaking self-efficacy (Sun & Zhang, 2020) have been identified as factors contributing to
speech production; however, it is unclear at what level and in which component of the model
they exert their effect.
5 Main Research Methods

Bilingual and multilingual comprehension has been studied more extensively than production.
One of the reasons identified by many researchers is that it is quite difficult, in a
laboratory design, to create an instrument that makes participants produce the same
utterances (Kroll et al., 2008). In early research, spontaneous speech corpora served as a
resource for studying the speech errors of bilinguals. Speech errors are not random, and the type and
pattern of these errors can reveal the mechanism of the production system, but they are not
informative enough about the planning process.
While the main questions in monolingual word production concern the separation of
various processes involved (e.g., evidence from brain imaging studies, Indefrey, 2007) and the
time allocated to each of these processes (Indefrey & Levelt, 2004), in bilingual production
the main question is whether and to what extent the other language of the bilingual is active
when they intend to speak in one language only. The main method used to detect the
presence of cross-linguistic interference and to identify the affected stages of the production
process is the picture naming task in its different variants. One of the strategies has been to use
pictures whose names share similar features in the bilingual’s two languages. In this research
paradigm, pictures whose names are cognates [translation equivalents with similar orthography
and/or phonology in both languages (e.g., the word hotel in English and Hungarian,
although the pronunciation is slightly different)] are used together with pictures whose names
are non-cognate translations (e.g., Bobb & Wodniecka, 2013; Santesteban & Costa, 2016).
The advantage of this task is that distractors from the other language are not present, so the
activation of the other language is not task induced. Findings from studies which use this
paradigm demonstrate cognate facilitation – pictures with cognate translation equivalents
are named faster than non-cognate pictures. This effect is interpreted as reflecting shared phonology
across the bilingual’s two languages; it is also found in language pairs in which the orthography
is different [e.g., Japanese–English (Hoshino & Kroll, 2008)] and is taken as evidence
that the language to be ignored is active during word production.
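The cognate facilitation effect described here is, at bottom, a difference between two mean naming latencies. The following minimal sketch shows that arithmetic; the latencies, the dictionary keys, and the function name `cognate_facilitation` are assumptions introduced purely for illustration and do not come from any of the studies cited.

```python
from statistics import mean

# Invented picture naming latencies in milliseconds (hypothetical values).
naming_latencies_ms = {
    "cognate": [640, 660, 650, 670],
    "non_cognate": [700, 720, 690, 730],
}


def cognate_facilitation(latencies):
    """Mean non-cognate latency minus mean cognate latency.

    A positive value means that pictures with cognate names were named faster.
    """
    return mean(latencies["non_cognate"]) - mean(latencies["cognate"])


effect_ms = cognate_facilitation(naming_latencies_ms)
print(f"Cognate facilitation: {effect_ms} ms")
```

In this invented data set the effect is 55 ms. An actual study would additionally test whether such a difference is statistically reliable across participants and items.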
Picture–word Stroop tasks are a variant of the classical picture naming task in which the
participant is presented with a picture and at a certain point with a distractor word. Different
variables are manipulated during this experiment: the number of languages spoken (Van
Heuven et al., 2011 with trilinguals), the language in which the picture is to be named, the
language of the distractor, the relationship between the target word (i.e., the name of the
picture) and the distractors (phonologically or semantically related), and the stimulus onset
asynchrony (SOA), which is the interval between the presentation of the picture and the
onset of the distractor (Kroll et al., 2008). The distractor can be presented at the same
time as the picture, a short time after the picture, or a longer time after the picture. Two
main findings have emerged from studies applying this paradigm (Hermans et al., 1998): (1)
semantic interference is greater with shorter SOA, while the phonological effect is greater with longer
SOA; and (2) bilinguals experience interference when the distractor is semantically related to
the target, regardless of the language of the distractor, whereas phonologically related distractors
facilitate production.
6 Future Directions
As the earlier discussion of bilingual speaking models and of questions
related to their components and mechanisms makes clear, there are countless directions for future research.
There is much more to be done in terms of research and understanding of what factors
influence the activation level of the languages and the place of language selection in the
bilingual speech production process. There is a dire need to conduct more empirical work to
establish the role of the factors in language activation identified by Grosjean (2013) (e.g., languages
involved, interlocutors, topic, stimuli, and experimental task) and Kroll et al. (2006) (language
proficiency, dominance, context of acquisition, processing demands, etc.). In addition
to these factors, models of bilingual speech production could benefit immensely from the
integration of the role of individual differences. Research in this area is very limited and
apart from WM and anxiety, none of the IDs could be associated directly with any levels or
mechanisms.
This chapter has relied heavily on lexical processing, as most research on bilingual production
has focused on this level and there is considerably less work on bilingual syntactic processing.
Further research is needed to understand sentence production in a second language
and to see to what extent syntactic representations and processes are shared in bilingual
production (e.g., in the case of translators and simultaneous interpreters).
A promising direction could be the role of gestures, which has become a research field in
itself, and there is a growing awareness that non-verbal parts of language use are at least as
important as verbal ones. In the original blueprint, only verbal and linguistic information
was included. Through the work of De Ruyter (2000), non-verbal aspects were included
too. He argues for the addition of a gestuary, the collection of gestures a speaker uses.
There is no simple one-to-one match between certain gestures and meaning. Nor is there a
grammar of gesture use. Modelling gestures is very complex and the transcription of gestures
is tedious and labour intensive. At the same time, it is obvious that non-verbal behaviour is
an essential part of language production and that gestures have an impact on meaning
conveyance. Gullberg (2012) has suggested a link between gestures and intonation, because
both extend over longer parts of utterances and they carry meaning by themselves, often
related to the verbal content of the sentence. The role of gestures and other non-verbal
information in language production is still to be studied.
Further Reading
Grosjean, F. (2013). Speech production. In F. Grosjean & P. Li (Eds.), The psycholinguistics of
bilingualism (pp. 50–69). Malden, MA & Oxford: Wiley-Blackwell.
Kormos, J. (2006). Speech production and second language acquisition. Mahwah: Lawrence Erlbaum.
La Heij, W. (2005). Selection processes in monolingual and bilingual lexical access. In J. F. Kroll &
A. de Groot (Eds.), Handbook of bilingualism (pp. 289–307). New York: Oxford University Press.
References
Auer, P. (2005). A postscript: Code-switching and social identity. Journal of Pragmatics, 37(3), 403–410.
Bialystok, E., Craik, F. I. M., Green, D. W., & Gollan, T. H. (2009). Bilingual minds. Psychological
Science in the Public Interest, 10(3), 89–129.
Bobb, S., & Wodniecka, Z. (2013). Language switching in picture naming: What asymmetric switch
costs (do not) tell us about inhibition in bilingual speech planning. Journal of Cognitive Psychology,
25, 568–585.
Broersma, M., & de Bot, K. (2006). Triggered codeswitching: A corpus-based evaluation of the original
triggering hypothesis and a new alternative. Bilingualism: Language and Cognition, 9, 1–13.
Broersma, M., Carter, D., Donnelly, K., & Konopka, A. (2020). Triggered codeswitching: Lexical
processing and conversational dynamics. Bilingualism: Language and Cognition, 23(2), 295–308.
Colomé, A. (2001). Lexical activation in bilinguals’ speech production: Language-specific or language-
independent? Journal of Memory and Language, 45, 721–736.
Costa, A., & Santesteban, M. (2004). Lexical access in bilingual speech production: Evidence from
language switching in highly proficient bilinguals and L2 learners. Journal of Memory and Language,
50, 491–511.
Costa, A., Miozzo, M., & Caramazza, A. (1999). Lexical selection in bilinguals: Do words in the
bilingual’s two lexicons compete for selection? Journal of Memory and Language, 41, 365–397.
de Bot, K. (1992). A bilingual production model: Levelt’s speaking model adapted. Applied Linguistics,
13, 1–24.
de Bot, K. (2004). The multilingual lexicon: Modeling selection and control. The International Journal
of Multilingualism, 1(1), 17–32.
de Bot, K., Lowie, W. M., & Verspoor, M. H. (2007). A dynamic systems theory approach to second
language acquisition. Bilingualism: Language and Cognition, 10(1), 7–21. doi: 10.1017/S1366728906002732
de Bot, K., & Schreuder, R. (1993). Word production and the bilingual lexicon. In R. Schreuder &
B. Weltens (Eds.), The bilingual lexicon (pp. 191–214). Amsterdam: Benjamins.
De Ruyter, J. P. (2000). The production of gesture and speech. In D. McNeill (Ed.), Language and
gesture (pp. 284–311). Cambridge: Cambridge University Press.
Declerck, M., Ivanova, I., Grainger, J., & Duñabeitia, J. A. (2019). Are similar control processes
implemented during single and dual language production? Evidence from switching between speech
registers and languages. Bilingualism: Language and Cognition, 23(3), 694–701. doi: 10.1017/S1366728919000695
Dörnyei, Z. (2009). The psychology of second language acquisition. Oxford: Oxford University Press.
Emmorey, K., Borinstein, H. B., Thompson, R., & Gollan, T. H. (2008). Bimodal bilingualism.
Bilingualism: Language and Cognition, 11(1), 43–61.
Fink, A., & Goldrick, M. (2015). Pervasive benefits of preparation in language switching. Psychonomic
Bulletin & Review, 22, 808–814.
Finkbeiner, M., Almeida, J., Jansen, N., & Caramazza, A. (2006). Lexical selection in bilingual speech
production does not involve language suppression. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 32, 1075–1089.
Green, D. (1986). Control, activation and resource. Brain and Language, 27(2), 210–223.
Green, D. (1993). Towards a model of L2 comprehension and production. In R. Schreuder &
B. Weltens (Eds.), The bilingual lexicon (pp. 249–277). Amsterdam: Benjamins.
Green, D. (1998). Mental control of the bilingual lexico-semantic system. Bilingualism: Language and
Cognition, 1, 67–81.
Grosjean, F. (2001). The bilingual’s language modes. In J. L. Nicol (Ed.), One mind, two languages:
Bilingual language processing (pp. 1–22). Oxford, England: Blackwell.
Grosjean, F. (2008). Studying bilinguals. Oxford: Oxford University Press.
Grosjean, F. (2010). Bilingual: Life and reality. Cambridge, MA: Harvard University Press.
Grosjean, F. (2013). Speech production. In F. Grosjean & P. Li (Eds.), The psycholinguistics of bi-
lingualism (pp. 50–69). Malden, MA & Oxford: Wiley-Blackwell.
Gullberg, M. (2012). Bilingualism and gesture. In T. K. Bhatia & W. C. Ritchie (Eds.), The handbook of
bilingualism and multilingualism (2nd edn, pp. 417–437). Malden, MA: Wiley-Blackwell.
Hartsuiker, R., & Pickering, M. (2007). Language integration in bilingual sentence production. Acta
Psychologica, 128(3), 479–489.
Hartsuiker, R., Pickering, M., & Veltkamp, E. (2004). Is syntax separate or shared between languages:
Cross-linguistic syntactic priming in Spanish-English bilinguals. Psychological Science, 15, 409–414.
Hermans, D., Bongaerts, T., de Bot, K., & Schreuder, R. (1998). Producing words in a foreign language:
Can speakers prevent interference from their first language? Bilingualism: Language and
Cognition.
Horwitz, E. K., Horwitz, M. B., & Cope, J. A. (1986). Foreign language classroom anxiety. Modern
Language Journal, 70(2), 125–132.
Hoshino, N., & Kroll, J. F. (2008). Cognate effects in picture naming: Does cross-language activation
survive a change of script? Cognition, 106, 501–511.
Indefrey, P. (2007). Brain imaging studies of language production. In G. Gaskell (Ed.), Oxford handbook
of psycholinguistics (pp. 547–564). Oxford: Oxford University Press.
Indefrey, P., & Levelt, W. J. M. (2004). The spatial and temporal signatures of word production
components. Cognition, 92, 101–144.
Jared, D., & Kroll, J. F. (2001). Do bilinguals activate phonological representations in one or both of
their languages when naming words? Journal of Memory and Language, 44, 2–31.
Kormos, J. (2006). Speech production and second language acquisition. Mahwah, NJ: Erlbaum.
Kormos, J. (2015). Individual differences in second language speech production. In J. W. Schwieter
(Ed.), The Cambridge handbook of bilingual processing (pp. 369–388). Cambridge: Cambridge
University Press.
Kroll, J. F., Dussias, P. E., Bogulski, C. A., & Valdes-Kroff, J. (2012). Juggling two languages in one
mind: What bilinguals tell us about language processing and its consequences for cognition. In B.
Ross (Ed.), The psychology of learning and motivation (Vol. 56, pp. 229–262). San Diego: Academic
Press.
Kroll, J. F., Gerfen, C., & Dussias, P. E. (2008). Laboratory designs and paradigms: Words, sounds,
and sentences. In L. Wei & M. G. Moyer (Eds.), The Blackwell guide to research methods in bi-
lingualism and multilingualism (pp. 108–131). Oxford: Blackwell Publishing Ltd.
Kroll, J., Bobb, S., & Wodniecka, Z. (2006). Language selectivity is the exception, not the rule:
Arguments against a fixed locus of language selection in bilingual speech. Bilingualism: Language
and Cognition, 9(2), 119–135.
La Heij, W. (2005). Selection processes in monolingual and bilingual lexical access. In J. F. Kroll &
A. M. B. de Groot (Eds.), Handbook of bilingualism: Psycholinguistic approaches (pp. 289–307). New York:
Oxford University Press.
Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Levelt, W. J. M. (1992). Accessing words in speech production: Stages, processes and representations.
Levelt, W. J. M. (1993). Language use in normal speakers and its disorders. In G. Blanken, J.
Dittmann, H. Grimm, J. C. Marshall & C-W. Wallesch (Eds.), Linguistic disorders and pathologies
(pp. 1–15). Berlin: de Gruyter.
Levelt, W. J. M. (1995). The ability to speak: From intentions to spoken words. European Review,
3, 13–23.
Levelt, W. J. M. (1999). Language production: A blueprint of the speaker. In C. Brown & P. Hagoort
(Eds.), Neurocognition of language (pp. 83–122). Oxford, England: Oxford University Press.
Linck, J. A., Osthus, P., Koeth, J. T., & Bunting, M. F. (2014). Working memory and second language
comprehension and production: A meta-analysis. Psychonomic Bulletin & Review, 21(4), 861–883.
Macnamara, J., & Kushnir, S. L. (1971). Linguistic independence of bilinguals: The input switch.
Journal of Verbal Learning and Verbal Behavior, 10(5), 480–487.
Meier, R., Cormier, K., & Quinto-Pozos, D. (Eds.). (2002). Modality and structure in signed and spoken
languages. Cambridge: Cambridge University Press.
Meuter, R. F. I., & Allport, A. (1999). Bilingual language switching in naming: Asymmetrical costs of
language selection. Journal of Memory and Language, 40, 25–40.
Miyake, A., Friedman, N. P., Emerson, M. J., Witzki, A. H., Howerter, A., & Wager, T. D. (2000). The
unity and diversity of executive functions and their contributions to complex “frontal lobe” tasks: A
latent variable analysis. Cognitive Psychology, 41, 49–100.
Morford, J. P., Wilkinson, E., Villwock, A., Piñar, P., & Kroll, J. F. (2011). When deaf signers read
English: Do written words activate their sign translations? Cognition, 118(2), 286–292.
Mosca, M., & de Bot, K. (2017). Bilingual language switching: Production vs. recognition. Frontiers in
Psychology, 8, 934. doi: 10.3389/fpsyg.2017.00934
Myers-Scotton, C. (2002). Contact linguistics, bilingual encounters and grammatical outcomes.
Oxford: OUP.
Navracsics, J. (2004). The question of control in bilingual speech production in different language
modes. Grazer Linguistische Studien, 62, 95–111.
Navracsics, J. (2014). Input or intimacy. Studies in Second Language Learning and Teaching, 4(3),
485–506.
Pichler, C. D., Reynolds, W., & Palmer, L. J. (2019). Multilingualism in signing communities. In S.
Montanari & S. Quay (Eds.), Multidisciplinary perspectives on multilingualism (pp. 175–202). Berlin,
Boston: De Gruyter Mouton.
Poplack, S. (1980). ‘Sometimes I’ll start a sentence in English y termino en español’: Toward a typology
of code-switching. In J. Amastae & L. Elías-Olivares (Eds.), Spanish in the United States:
Sociolinguistic aspects (pp. 230–263). Cambridge: Cambridge University Press.
Poulisse, N., & Bongaerts, T. (1994). First language use in second language production. Applied
Linguistics, 15, 36–57.
Quinto-Pozos, D., & Robert A. (2015). Sign languages in contact. In A. C. Schembri & C. Lucas (Eds.),
Sociolinguistics and deaf communities (pp. 29–60). Cambridge: Cambridge University Press.
Santesteban, M., & Costa, A. (2016). Are cognate words “special”? On the role of cognate words in
language switching performance. In J. W. Schwieter (Ed.), Bilingual processing and acquisition:
Vol. 2. Cognitive control and consequences of multilingualism (pp. 97–125). Amsterdam: John
Benjamins Publishing Company.
Schwieter, J. W., & Sunderman, G. (2008). Language switching in bilingual speech production.
Mental Lexicon, 3, 214–238.
Stavans, A., & Swisher, V. M. (2006). Language switching as a window on trilingual acquisition.
International Journal of Multilingualism, 3(3), 193–220.
Sun, P. P., & Zhang, L. J. (2020). A multidimensional perspective on individual differences in multilingual
learners’ L2 Chinese speech production. Frontiers in Psychology, 11, 59. doi: 10.3389/fpsyg.2020.00059
Tang, G., & Sze, F. (2018). Bilingualism and sign language research. In A. De Houwer & M. Ortega
(Eds.), Handbook of bilingualism (pp. 483–509). Cambridge: Cambridge University Press.
Ullman, M. T. (2001). The neural basis of lexicon and grammar in first and second language: The
declarative/procedural model. Bilingualism: Language and Cognition, 4, 105–112.
Van Hell, J. G., & Dijkstra, T. (2002). Foreign language knowledge can influence native language
performance in exclusively native contexts. Psychonomic Bulletin & Review, 9(4), 780–789.
Van Heuven, W. J., Conklin, K., Coderre, E. L., Guo, T., & Dijkstra, T. (2011). The influence of cross-
language similarity on within- and between-language Stroop effects in trilinguals. Frontiers in
Psychology, 2, 374. doi: 10.3389/fpsyg.2011.00374
2
PSYCHOLINGUISTIC PROCESSES
IN L2 ORAL PRODUCTION
Daphnée Simard
1 Introduction/Definitions
Oral production (OP), defined here as the oral expression of language encompassing any form
of language production, from the most spontaneous, which occur during informal discussions,
to entirely planned ones, such as lectures or reading papers out loud, has often been examined
from the point of view of communicative skills in second language (L2; first language, L1)
research (e.g., Bygate, 2008). In such cases, it is commonly referred to as speaking. It has also
been researched as a psycholinguistic process, which is the object of this chapter offering a
description of the linguistic and cognitive processes involved in L2 OP. Generally speaking, OP
entails that L2 speakers mobilize simultaneously and in real time their cognitive resources
along with their linguistic knowledge (pragmatic, semantic, morphosyntactic, and phonological),
mastered to varying degrees. A distinction will be made between psycholinguistic processes,
that is, the language processing mechanisms such as grammatical and phonological
encodings, and cognitive resources, specifically, memory and attention.
Since the psycholinguistic processes underlying OP, whether in L1 or L2, are considered to
be largely the same, despite some differences (e.g., Kormos, 2006, 2011), the discussion will
focus on OP in general psycholinguistic terms, beginning with Levelt’s speech production model
(e.g., 1989, 1999a, 2000, 2001), believed to represent a “consensus view of the linguistic, psy-
cholinguistic, and cognitive issues underlying the act of speaking” (Segalowitz, 2010, p. 8).
Levelt’s model, which was originally formulated for L1 production, has been widely used in L2
OP research (e.g., Dörnyei & Kormos, 1998; Izumi, 2003; Kormos, 1999a, 2006; de Bot &
Batyi, this volume). Additionally, the roles played by working memory and attention in the
model are addressed, as they constitute an important source of variation observed between L1
and L2 performance. Studies examining the mediating role of working memory and attention in
the psycholinguistic processes involved in L2 OP are then addressed. Finally, future directions
for research and classroom implications are presented.
Over the past 40 years, several models have been proposed to explain how language is pro-
duced. Two broad types of models exist: modular and non-modular, with modularity referring
to encapsulated, specialized, and independent (i.e., that do not interact amongst themselves)
DOI: 10.4324/9781003022497-4
modules (Fodor, 1983). On the one hand, modular models put forth the idea of existing
modules through which OP proceeds (e.g., Garrett, 1984; Levelt, 1989, 1993; Levelt et al.,
1999). In this context, modules are considered to be functionally driven domain-specific pro-
cessors which operate on linguistic information (e.g., semantic, lexical, and phonetic). On the
other hand, non-modular models describe OP as a process during which linguistic information
is activated at different levels simultaneously (Dell, 1986; Trueswell et al., 1994).
According to Kormos (2006), to date, the most detailed account of OP and arguably the
most influential model is Levelt’s Blueprint of the Speaker (e.g., 1989, 1999). Levelt’s
modular model describes the psychological processing components in operation during
production and comprehension; in his view, they are closely intertwined, as speakers do both
when interacting orally (Levelt, 2000, p. 154). Therefore, the production and comprehension
systems will be addressed in the presentation of the model.
3 Critical Issues and Topics: Blueprint of the Speaker

According to Levelt’s model (1989, 1999a, 2000, 2001), OP involves processes in five mod-
ules, with three modules for production proper, and two additional ones for comprehending,
as depicted in Figure 2.1. The model has gone through a series of changes over the years. An
integrated view of the different versions is presented in Figure 2.1.
Production System
In all versions of the model, the first stage of OP is the conceptualization of what will be said.
However, in a more recent version, this first stage is fed by the speaker’s intention of
communication (Levelt, 2000), rather than being part of the conceptualizer itself (e.g., Levelt,
1989). Therefore, the first step of message conceptualization is to consider the speaker’s
intentions for communicating. To do so, speakers rely on their knowledge of discourse models
and Theory of Mind, which allows for the creation of “complex knowledge of structures of
social environment” (Levelt, 1999a, p. 89). Next, the conceptual generation of the message
takes place through two processes, that is, macroplanning and microplanning (Levelt,
1999a). Macroplanning is the process by which the speaker decides what to say next by
managing the discourse focus, that is, directing attention to the object of the production, and
shifting from one object to another, as the production evolves. Next, through microplanning,
the speaker determines which concepts to include in the emerging utterance, and how to
spatially and temporally represent them. The selection of lexical concepts in the mental
lexicon is referred to as perspective taking (Levelt, 1996). In the case of more complex
communicative intents, such as a narration, decisions about the order of events must be
made. This process is called linearization (Levelt, 1996). Citing Slobin’s (1987) work on
“Thinking-for-Speaking,” Levelt (1999a) states that microplanning processes are language
dependent, contrary to macroplanning processes, which are language independent (p. 94).
Figure 2.1 Adapted version of Levelt’s model (1983, 1989, 1999a, 2000). [Diagram not
reproduced; its labeled components are the conceptualizer, formulator, mental lexicon,
syllabary, articulator, acoustic processor, and parser.]
As they become available, segments of the propositional form of the message, also called
preverbal plans, are passed onto the formulator, for lexically driven encoding. The lexical
information has to be retrieved from the speaker’s mental lexicon – a structured network in
which lexical information is stored – originally in the form of lemmas and lexemes. Lemmas
were first defined by Kempen and Hoenkamp (1987) as the smallest units of grammatical
encoding containing semantic and syntactic properties of lexical items. In more recent
versions, Levelt (1999a, 1999b; Levelt et al., 1999) attributes to lemmas only the syntactic
properties of lexical units, since an additional conceptual level has been added in the mental
lexicon (see Figure 2.1). Therefore, after having selected concepts in the mental lexicon for
the preverbal plan, according to the intended message, the syntactic properties of the lemmas
associated with the concepts are activated in the form of syntactic procedures. For instance,
if a noun is selected, a noun phrase will be initiated. This process corresponds to the
grammatical encoding, and generates the surface structure, that is a sequence of lemmas
organized into phrases (Levelt, 1989, p. 11). Then, this surface structure is further processed
through phonological encoding by accessing the information stored in the lexemes (i.e.,
morphological and phonological properties of lexical items) and transformed into a phono-
logical score, that is the syllabified and prosodified words (and groups of words) (Levelt,
2001). The phonological score can remain in the form of internal speech or be transmitted
further in the production process to be articulated (Levelt, 1999a; Levelt et al., 1999).
A word about internal speech is necessary here. Although its nature is not entirely un-
derstood, it is thought to be phonological (Jackendoff, 1987; Levelt, 2000). This is the po-
sition taken by Levelt in his latest work (Levelt, 2000). In earlier versions of the model, it was
believed to be the result of phonetic encoding (e.g., Levelt, 1989). In any case, according to
him “whatever it is, we can attend to it and parse it just as we parse what is said to us by
others” (Levelt, 2000, p. 156).
The last phase of phonological encoding is phonetic encoding, which corresponds to the
retrieval, from the mental syllabary, of “articulatory gestural scores” or “motor instructions,”
allowing for their articulation (Levelt & Wheeldon, 1994). This mental syllabary, which is
believed to contain articulatory information for all syllables in a given language, accounts for
the various pronunciations of a given lexical item, depending on its use (e.g., Levelt, 1992,
1995, 1999a; Levelt & Wheeldon, 1994). The output of the phonetic encoding is then called the
articulatory score. The articulatory score, also known as the phonetic plan, is turned into actual
speech in the articulator. Articulation, during which phonological and phonetic processes
translate the emerging message into overt speech, is the last phase of speech production in
Levelt’s model (1989).
Comprehension System
The comprehension system allows for the analysis of internal speech, following phonological
encoding, and overtly produced speech, after it has gone through the acoustic processor. The
information initially analyzed by the acoustic processor creates a prelexical representation
(probably built of contrastive information on vowels and consonants or even syllables; how-
ever, still of unclear nature) to be processed by the parser (Levelt, 2000). In both these sce-
narios, speech, internal or overt, ends up being analyzed by the comprehension system, more
recently called the parser (e.g., Levelt, 2000). The parser contains all the “procedures available
to a language user for understanding spoken language” (Levelt, 1983, p. 49), and has access to
the information contained in the mental lexicon. Finally, to interpret the intended message, the
listener relies on pragmatic and discourse processing located in the conceptualizer, which in-
teracts with grammatical decoding in the parser (Levelt, 2000). After being parsed, the message
is sent back to the conceptualizer to be self-monitored for possible feedback.
To explain self-monitoring, Levelt adopts what he calls a perceptual theory (as opposed to
a production theory) (1983, p. 46) according to which a monitor is located in the con-
ceptualizer, and is fed from the parser to allow self-monitoring (Özdemir et al., 2007). This
self-monitoring occurs through three monitors, or feedback loops (Levelt, 1983). The first
monitor loop verifies the conformity of the preverbal plan with the intended message, the
second, also called covert monitoring, checks the conformity of the internal speech, and the
last loop verifies the overt speech produced in the articulation phase. Attention must be
deployed at each stage of production to detect mismatches between speakers’ intentions and
the production outcomes.
The manifestation of speakers’ self-monitoring of their own OPs is self-repairs (Levelt,
1983, 1989). Self-repairs correspond to a modification (or an intention to modify) of what is
perceived as being a problem in one’s speech, as observed by an interruption of speech. More
specifically, a self-repair sequence consists of a reparandum (i.e., the element being the object
of a modification), an editing phase and the repair proper (i.e., the new formulation), also
called reparatum, as depicted in Figure 2.2.
Original utterance | Editing phase | Repair
The man opened the sui | /euh/ | his suitcase
Labels in the figure: moment of interruption, reparandum, delay, editing term, reparatum.
Figure 2.2 Conceptualization of self-repair adapted from Levelt (1983). Example from Simard
et al. (2017).
In the example provided in Figure 2.2, the reparandum corresponds to “the sui” and the
reparatum, to “his suitcase”, separated by an editing term represented by /euh/, which is not
necessarily present in self-repair sequences.
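The parts of a self-repair sequence can be made concrete with a small data structure. The following Python sketch is illustrative only: the class name `SelfRepair`, its field names, and its two methods are assumptions introduced here and are not part of Levelt’s (1983) account. The segmentation of the example is entered by hand, not detected automatically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SelfRepair:
    """One hand-segmented self-repair sequence (hypothetical structure)."""
    before: str                  # speech preceding the reparandum
    reparandum: str              # element that is the object of the modification
    editing_term: Optional[str]  # e.g. "/euh/"; None when no editing term occurs
    reparatum: str               # the repair proper (the new formulation)

    def as_produced(self) -> str:
        """Return the utterance as it was actually spoken."""
        parts = [self.before, self.reparandum, self.editing_term, self.reparatum]
        return " ".join(part for part in parts if part)

    def repaired_utterance(self) -> str:
        """Return the utterance with the reparandum replaced by the reparatum."""
        return " ".join(part for part in (self.before, self.reparatum) if part)


# Example from Figure 2.2 (Simard et al., 2017)
example = SelfRepair(
    before="The man opened",
    reparandum="the sui",
    editing_term="/euh/",
    reparatum="his suitcase",
)
```

Here `example.as_produced()` returns the full spoken sequence, and `example.repaired_utterance()` returns "The man opened his suitcase". Making `editing_term` optional reflects the point that an editing term is not necessarily present.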
The perceived problems that trigger self-repairs can vary. They include speech errors,
syntactic flaws, or conceptual mismatches between the original intention and the emerging message
(see Levelt, 1983, for self-repair categories; see also Postma, 2000). In this sense, self-repairs do not
always target errors, and when they do, the repair proper is not necessarily correct (Levelt, 1983).
Furthermore, Levelt (1983, 1989) identifies two broad categories of self-repairs: Covert self-repairs,
which consist of false starts, hesitations, and pauses and occur when the message is checked before
being articulated; and overt self-repairs, that is, verbalized reformulations, which occur when
learners perceive an element that they wish to change in their productions.
Levelt’s model accounts for various aspects of OP, from the formulation of commu-
nicative intentions to self-regulation, all of which rely on working memory and attention.
Working Memory and Attention in the Blueprint of the Speaker

Oral production is incremental in that the processing at each phase of production can occur
simultaneously, in a parallel manner (Kempen & Hoenkamp, 1987). As soon as a segment or
chunk of preverbal plan is generated in the conceptualizer, the processes associated with the
formulation phase start, and as soon as the grammatical and phonological encoding are
translated into a phonetic plan, the articulation of that part of the message occurs. Therefore,
the articulation of a message can start well before the conceptualization of the full message is
completed. For this to happen smoothly, a working memory is necessary for storing parts of
messages generated at different phases of the OP process for availability for subsequent phases.
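The incremental character of production can be illustrated with a toy pipeline. The Python sketch below is a deliberate simplification under stated assumptions: the three function names mirror the modules of the model, but their bodies are placeholders (no real encoding takes place), and generators only interleave the stages chunk by chunk; they do not run them in true parallel. The shared `log` list is a hypothetical device for recording the order of events.

```python
from typing import Iterable, Iterator, List


def conceptualizer(chunks: Iterable[str], log: List[str]) -> Iterator[str]:
    """Yield preverbal plan segments one at a time (placeholder stage)."""
    for chunk in chunks:
        log.append(f"conceptualize: {chunk}")
        yield chunk


def formulator(preverbal: Iterable[str], log: List[str]) -> Iterator[str]:
    """Turn each segment into a 'phonetic plan' as soon as it arrives."""
    for chunk in preverbal:
        log.append(f"formulate: {chunk}")
        yield chunk  # placeholder: no grammatical or phonological encoding


def articulator(plans: Iterable[str], log: List[str]) -> List[str]:
    """Articulate each plan as soon as the formulator delivers it."""
    spoken = []
    for plan in plans:
        log.append(f"articulate: {plan}")
        spoken.append(plan)
    return spoken


def produce(chunks: Iterable[str]) -> List[str]:
    """Run the three stages incrementally and return the event log."""
    log: List[str] = []
    articulator(formulator(conceptualizer(chunks, log), log), log)
    return log


events = produce(["the man", "opened", "his suitcase"])
```

In the resulting `events` list, "articulate: the man" appears before "conceptualize: his suitcase", which mirrors the claim that articulation can start before the full message has been conceptualized. Each segment is handed directly from one stage to the next here; in the model, that handover is the job attributed to working memory.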
To define working memory, Levelt refers to Baddeley’s (1986) multicomponential model.
According to this model, working memory, which involves the temporary storage and
manipulation of information, is composed of a control system (the central executive) and three
storage subsystems: the visuospatial sketchpad, the phonological loop and, more recently,
the episodic buffer (Baddeley, 2003). The central executive corresponds to a higher-level
control system (Baddeley, 1986, 1996; Baddeley & Hitch, 1974) whose function is to coordinate
the transfer of information from the subsystems to long-term memory (Baddeley, 2010, 2015).
The phonological loop consists of verbal storage and articulatory recapitulation. It is believed
to be responsible for phonological memory, that is, “the ability to recognize and remember
phonological sequences in the order in which they occur” (Baddeley et al., 1998). The visuospatial
sketchpad is responsible for the short-term storage of visuospatial information. Finally, the
episodic buffer (Baddeley, 2000) regulates short-term storage of information from the other
two subsystems and the creation of multimodal representations by binding both visuospatial
and verbal information (Baddeley, 2003). It also allows interaction between the subsystems
and long-term memory (Baddeley, 2000, 2010).
The conceptualization of the message, including the generation of its communicative in-
tention, exerts the greatest demand on the speaker’s working memory as:
Speakers do not have a small, fixed set of intentions that they have learned to realize
in speech. Communicative intentions can vary in infinite ways, and for each of these
ways the speaker will have to find new means of expression. This requires much
attention. (Levelt, 1989, p. 21)
In other words, conceptualization relies on controlled processes which are slower, involve
attention and are constrained by working memory limits (Shiffrin & Schneider, 1977).
Conversely, the processes involved in the formulator and articulator are highly automatized
and consequently put lower demands on working memory, explaining the speed with which
language can be produced (Levelt, 1989). Highly automatized processes operate quickly and
without conscious control (Shiffrin & Schneider, 1977).
As the aforementioned quote indicates, attention is fundamental in the Blueprint of the
Speaker and is assumed to be closely related to working memory. Indeed, in Baddeley’s
working memory model the central executive is similar to Norman and Shallice’s (1986)
supervisory attentional system (Baddeley, 1986, 1996). Levelt (1989, 1999a) uses two types of
representation when talking specifically about attention, one that emphasizes characteristics
of attention and another that focuses on the functions of attention. More specifically, at-
tention is considered to be selective, to shift from one object to another and to fluctuate
during OP (Levelt, 1989, p. 498). Selectivity is a characteristic of attention (e.g., Filter
Theory, Broadbent, 1958; Treisman, 1960), which is seen as limited, selective,
and effortful. The shifting and fluctuating aspects of attention are functions: attention
shifting refers to a change in the focus of attention, whereas fluctuation refers to a change in the
intensity of attentional focus (Levelt, 1989). A few years later, Levelt (1999a) equated
attention shift with attention management, specifying that it is mainly solicited during mes-
sage planning, when speakers must shift their attentional resources from one process to
another (e.g., between macroplanning and microplanning), and that attention has to be
selective during self-monitoring. Therefore, two aspects of attention are utilized during OP.
Since Levelt’s model of L1 production, researchers have compared non-native speakers’ pro-
duction with that of native speakers. This led to the adaptation of the model to look more spe-
cifically for similarities and differences between L1 and L2 productions and to characterize in
psycholinguistic terms the various aspects of L2 OP. Similarities and differences are accounted for
to different extents in adaptations of the model for speakers using more than one language in their
OPs (e.g., de Bot, 1992; de Bot & Batyi, this volume; Kormos, 2006, 2011; Segalowitz, 2010).
4 Current Contributions and Research: Blueprint of the Speaker
and L2 Oral Production
Levelt’s Blueprint of the Speaker was adapted for L2 OP (see among others, de Bot, 1992,
1996; de Bot & Schreuder, 1993; Kormos, 2006; Poulisse, 1997; Poulisse & Bongaerts, 1994;
Segalowitz, 2010). An often-cited adaptation is the Bilingual Production Model (de Bot,
1992), which shows how one language will be selected over others, among speakers with
balanced or non-balanced levels of proficiency. De Bot argues that since the knowledge of a
situation necessary to convey a message is only available in the conceptualizer, the language
used for expressing that message has to be selected during that OP phase. More specifically,
the language of a given utterance is selected during macroplanning on the basis of in-
formation derived from the discourse model. However, it is only during microplanning that
the encoding specific to the language selected occurs (p. 8). This ensures that the preverbal
message will contain the necessary information for appropriate lexicalization in the
formulator.
The formulator is entirely language specific in the Bilingual Production Model. The
grammatical and phonological encodings specific to the selected language are triggered by
selecting lexical items in the mental lexicon subset (de Bot & Schreuder, 1993). Drawing on
the work of researchers such as Paradis (1981) and Green (1986), de Bot (1992) and de Bot
and Schreuder (1993) argued that lexical items are organized in subsets according to the
languages activated. For Poulisse and Bongaerts (1994), lexical items are tagged with in-
formation about the language to which they belong (e.g., Green, 1986), instead of being part
of subsets. Lexical items are organized in a large network which can be partly activated.
Therefore, as soon as a language is selected in the conceptualizer, all lexical items tagged as
belonging to that language subset can be activated. This activation is mediated by word
frequency. Even though an L2 has been selected by the speaker, if the lemma chosen is more
frequently used in the L1, it might be produced in the L1. Finally, with regard to the
articulator, de Bot (1992) suggests that there is only one, independent of the number of
languages known to the speaker, as exemplified by the persistent foreign accents observed
among L2 speakers.
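The idea that language-tagged lexical items compete, with activation mediated by frequency, can be sketched as a toy selection rule. The Python example below is a minimal illustration under loud assumptions: the additive activation rule, the `language_boost` value, the class and function names, and all frequency figures are invented for the example and are not taken from Poulisse and Bongaerts (1994) or de Bot (1992).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LexicalItem:
    concept: str      # concept the item expresses
    form: str         # word form
    language: str     # language tag, e.g. "L1" or "L2"
    frequency: float  # hypothetical relative frequency of use (0 to 1)


def select_item(lexicon: List[LexicalItem], concept: str,
                selected_language: str, language_boost: float = 0.5) -> LexicalItem:
    """Pick the most activated item for a concept.

    Activation = frequency + a boost for items tagged with the selected
    language. Both the additive rule and the boost value are assumptions.
    """
    candidates = [item for item in lexicon if item.concept == concept]
    if not candidates:
        raise ValueError(f"no lexical item for concept {concept!r}")

    def activation(item: LexicalItem) -> float:
        boost = language_boost if item.language == selected_language else 0.0
        return item.frequency + boost

    return max(candidates, key=activation)


# Hypothetical French L1 / English L2 speaker; all frequencies are invented.
LEXICON = [
    LexicalItem("MAN", "homme", "L1", 0.8),
    LexicalItem("MAN", "man", "L2", 0.6),
    LexicalItem("SUITCASE", "valise", "L1", 0.9),
    LexicalItem("SUITCASE", "suitcase", "L2", 0.2),
]
```

With the L2 selected, `select_item(LEXICON, "MAN", "L2")` returns the L2 item "man", whereas `select_item(LEXICON, "SUITCASE", "L2")` returns the L1 item "valise", because its much higher frequency outweighs the language boost. This corresponds to the case in which an L1 word is produced although the L2 has been selected.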
In her adaptation of Levelt’s model, Kormos (2006, 2011) proposed that the knowledge
necessary for L1 or L2 OP in long-term memory is divided into episodic memory, semantic
memory, and declarative memory. Episodic memory is responsible for storing temporally
organized events or episodes experienced by the speaker (Kormos, 2011, p. 41, see also
Tulving, 1972). Semantic memory contains the mental lexicon, with conceptual information,
lemmas and lexemes. Finally, the declarative memory added by Kormos in her adaptation of
Levelt’s model, accounts for L2 syntactic and phonological rules known to the speaker that
are not proceduralized to the extent they are in L1. L2 vocabulary knowledge could also be
added to the declarative memory as defined by Kormos. In this regard, Segalowitz, in his
adaptation of de Bot (1992) and Levelt (1999a), for the explanation of L2 fluency, made a
distinction between the lexicon and vocabulary. Citing Paradis’ work, he distinguished the
lexicon which corresponds to the implicit knowledge of the meaning of words, their use, and
their syntactic properties, from vocabulary knowledge that consists of the explicit knowledge
of words. Therefore, in a model for L2 OP, there should be a possible storage for vocabulary
knowledge because it is likely to be an important part of the L2 speaker’s source of
knowledge for production.
Although there seems to be agreement that Levelt’s model can be used to describe L2
production (e.g., de Bot, 1992; Poulisse, 1997; Segalowitz, 2010), several differences between
L1 and L2 productions have been suggested. Crucially, L1 OP is considered mainly sub-
conscious and automatized (e.g., Levelt, 1989), unlike L2 OP which relies heavily on working
memory and attention (e.g., Dörnyei & Kormos, 1998; Kormos, 2006; Segalowitz, 2010).
The next part presents a discussion of their mediating role in the psycholinguistic processes
involved in L2 OP.
5 Main Research Methods: Working Memory and Attention
During L2 Oral Production
As mentioned previously, working memory and attention maintain an intimate and complex
relationship, whether it is in the form of the central executive in Baddeley’s working memory
model or in the form of their supporting role in each other’s performance. In that respect,
attention is considered necessary for keeping language information active in working
memory (Robinson, 2003); conversely, an overstretched working memory leads to defi-
ciencies in attention management (Towell & Dewaele, 2005). To better understand the dis-
tinctive mediating role each plays in the psycholinguistic processes involved in L2 OP, the
way they are respectively investigated in L2 research, in general, will be addressed.
First, regarding the study of working memory in L2 research, its components, as devel-
oped by Baddeley, have been investigated as separate entities (e.g., Martin & Ellis, 2012;
Miyake & Shah, 1999; Wen, 2016). On the one hand, the central executive is measured
through complex memory tasks (Wen, 2016) that require participants to store and manipulate
information at the same time, for example, a reading span test in which participants read
sentences and judge them for acceptability while keeping the last word of each one in mind
until prompted to repeat it. On the other hand, the storage subcomponents of working
memory, such as the phonological loop and visuospatial subcomponents, are measured
through simple memory tasks, in which participants must retain information for short
periods of time (Wen, 2016). An example of such a task measuring the phonological loop is the
non-word repetition task, in which participants listen to a string of unfamiliar sounds and
repeat them in the sequence in which they were heard. Figure 2.3 depicts the relationship between the
components of working memory and the instruments used to measure them.
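The difference between complex and simple memory tasks can be illustrated by how a single trial of each might be scored. The Python sketch below is an assumption-laden illustration: the function names, the proportion-based scores, and the strict serial-position criterion are choices made for this example, and published studies use a variety of scoring procedures.

```python
from typing import Dict, Sequence


def score_complex_trial(final_words: Sequence[str], recalled: Sequence[str],
                        judgments: Sequence[bool],
                        expected: Sequence[bool]) -> Dict[str, float]:
    """Score one reading span trial on storage and on processing.

    storage: proportion of sentence-final words recalled in their position.
    processing: proportion of acceptability judgments that were correct.
    """
    n = len(final_words)
    if n == 0 or len(judgments) != n or len(expected) != n:
        raise ValueError("one judgment and one expected value per sentence required")
    in_position = sum(
        1 for i, word in enumerate(final_words)
        if i < len(recalled) and recalled[i] == word
    )
    correct = sum(1 for given, target in zip(judgments, expected) if given == target)
    return {"storage": in_position / n, "processing": correct / n}


def score_simple_trial(presented: Sequence[str], repeated: Sequence[str]) -> bool:
    """Score one non-word repetition trial: correct only if the order matches."""
    return list(presented) == list(repeated)


# Invented data for a three-sentence reading span trial
trial = score_complex_trial(
    final_words=["door", "river", "hat"],
    recalled=["door", "hat"],
    judgments=[True, False, True],
    expected=[True, True, True],
)
```

The complex task yields two scores because it requires storing and processing at the same time, whereas the simple task yields a single storage score.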
Figure 2.3 Working Memory Measurement. Reprinted from Simard et al. (2020) with permission.
[Diagram labels: working memory components divided into a control system (central executive)
and storage sub-systems (visuo-spatial sketchpad, episodic buffer, phonological loop);
storage-and-processing functions are measured by complex memory tasks, temporary-storage-only
functions by simple memory tasks.]
Next, the term “attention” is used in many different ways, including attentional control,
attentional management, selective attention, and attention focus. L2 research on attention
has been conducted from two broad angles (Kormos, 2011; Tomlin & Villa, 1994; Simard &
Wong, 2001; see Zuniga, 2015 for a detailed discussion). Attention’s characteristics, including
its limited capacity, effortfulness, and selectiveness have been examined. In particular,
selective attention, defined as “the volitional control over choosing relevant stimuli and
ignoring irrelevant ones” (Allport, 1987, p. 50) is deemed important for aspects of L2
development (e.g., Kormos, 2006, 2011; Robinson, 2003). Attention has also been studied in
terms of the functions it carries out, such as alertness, orientation, detection, and more
recently, attentional shift and attentional control (e.g., Segalowitz, 2007; Simard & Wong,
2001; Tomlin & Villa, 1994). A plethora of instruments, such as the Trail Making Test
(Reitan, 1958), the D2 test of attention (Brickenkamp & Zillmer, 1998), and the
eye-tracking paradigm, have been used to measure attention in L2 research. This variety
of tests, each measuring one or more aspects of attention, reflects the lack of consensus
that still prevails around the nature of attention (Kormos, 2011; see Robinson,
2003, for an in-depth discussion).
That being said, the role played by working memory and attention in L2 OP has been
extensively studied. Research reveals that both the executive aspect of working memory (and,
by extension, attentional control) and the storage subsystems enhance L2 production and
comprehension success (see Linck et al., 2014 for a meta-analysis; Wen et al., 2013).
However, their role in the different psycholinguistic processes, as described in Levelt’s
model – conceptualizer, formulator, articulator, and self-monitoring – during L2 OP has
been somewhat neglected (e.g., Fortkamp, 1999; Kormos, 2006). Nevertheless, the psycho-
linguistic processes involved in OP should be closely considered in regard to variation in
working memory and attention, as they are automatized in L1 but not in L2, especially
among speakers with lower levels of proficiency (de Bot, 1992; Dörnyei & Kormos, 1998;
Kormos, 2011).
Despite the difficulties in investigating each psycholinguistic process individually during
OP, some approaches have been proposed. Fortkamp (1999) examined the relationship between
the executive aspect of working memory, as measured by a speaking span task (administered
in Portuguese L1 and English L2), and two kinds of fluency: fluency in conceptualizing a
message, measured through a speech generation task (picture description), and fluency in
articulation, measured through an oral reading task and an oral slip task (aimed at eliciting
spoonerisms, i.e., speech errors involving phoneme exchanges). She found that individuals with
greater working memory spans (measured in English, the participants’ L2) were better at the
speech generation task in English L2, confirming, according to the author, the reliance of the
conceptualizer on working memory. However, no association was found between either of the
articulation tasks and the speaking span test. This study’s results provide insights into the
differential impact of working memory on psycholinguistic processes: as Levelt stated,
conceptualization seems to rely more heavily on working memory than articulation does.
Instead of focusing on tasks targeting each psycholinguistic process, one might manipulate
task conditions, to ease cognitive demand at different phases of OP (see Skehan, 2015, for a
detailed discussion). These manipulations include presence or absence of preplanning time,
presence or absence of on-line planning time and repetition of the same task (Skehan, 2015).
For instance, preplanning time, along with as much time as needed to produce language (i.e.,
on-line planning) should ease conceptualization, and consequently, allow the allocation of
more resources to formulation, which is believed to be taxing because of the grammatical and
phonological encoding that are based on partial L2 knowledge (e.g., de Bot, 1992). Conversely,
if L2 speakers are given as much time as needed to produce, but are not given preplanning
time, it is assumed that the formulator will be eased, but not the conceptualizer. Finally, one
might assume that the repetition of the same oral task would prime knowledge in semantic and
episodic memory (see Kormos, 2006, 2011), and therefore lessen demands on both working
memory and attention, facilitating grammatical and phonological encoding in the formulator.
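The manipulations just described amount to a small factorial design. The Python sketch below enumerates its cells and attaches the prediction suggested by the preceding paragraph. It is only a reading aid: the function names are hypothetical, the one-to-one mapping from conditions to eased processes is a simplification of the text, and the prediction for preplanning without on-line planning is an assumption, since that cell is not discussed explicitly.

```python
from itertools import product
from typing import Dict, FrozenSet, Tuple

Condition = Tuple[bool, bool, bool]  # (preplanning, online_planning, repetition)


def eased_processes(preplanning: bool, online_planning: bool,
                    repetition: bool) -> FrozenSet[str]:
    """Return the OP phases assumed to be eased under a task condition."""
    eased = set()
    if preplanning:
        # Preplanning time is assumed to ease conceptualization.
        eased.add("conceptualization")
    if online_planning or repetition:
        # Unlimited production time and task repetition are both assumed
        # to facilitate encoding in the formulator.
        eased.add("formulation")
    return frozenset(eased)


def build_design() -> Dict[Condition, FrozenSet[str]]:
    """Enumerate the 2 x 2 x 2 design with its predicted eased phases."""
    return {
        condition: eased_processes(*condition)
        for condition in product([False, True], repeat=3)
    }


design = build_design()
```

For instance, `design[(False, True, False)]` contains only "formulation", which matches the case of speakers who are given as much time as needed to produce but no preplanning time.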
Many L2 studies have focused exclusively on self-monitoring during L2 OP. Typically, self-
monitoring has been investigated by looking at its observable manifestation, that is, self-repairs.1
Some L2 studies categorized self-repairs according to the classification originally proposed by
Levelt (1983), and others created their own, based on Levelt (e.g., Bange & Kern, 1996; Kormos,
1999a), making comparisons across studies difficult. However, in general, two broad categories
representing two general levels of psycholinguistic processing are present in these classifications.
Self-repairs either target (1) discourse-level elements (at the conceptualizer level) by changing
words or groups of words, to modify the informational structure of the message, or (2) form-level
elements (at the formulator level), by modifying a form perceived as inaccurate (Simard et al.,
2011; Simard et al., 2017; Zuniga, 2015; Zuniga & Simard, 2019). Interestingly, results from
factorial analyses show that self-repairs in the two categories load onto different factors,
confirming their independence in the production process (see Simard et al., 2016).
Some of the self-repair studies specifically examined their relationship with the central
executive aspect of working memory, the phonological loop, and various aspects of attention.
In most cases, a relationship was found between the central executive, as measured by complex
memory tasks, and the production of self-repairs: a larger working memory span was
associated with fewer self-repairs targeting choice of words or groups of words2 (e.g.,
Ahmadian, 2015; Mojavezi & Ahmadian, 2013; Simard et al., 2020). One study, however, did not
find any such relationship (Georgiadou & Roehr-Brackin, 2017). It could be argued that their
results differed from those of other studies because the complex memory task used in
Georgiadou and Roehr-Brackin (2017) was administered in the participants’ L2, whereas in the
other studies the working memory tasks were administered in the participants’ L1. This result
is contrary to that of Fortkamp (1999), who observed a significant correlation only between
the working memory task administered in the participants’ L2 and the L2 speech generation
task. However, it must be noted that her participants’ OPs were coded subjectively by two
judges for fluency on a scale from 1 to 5. More studies are
needed to clarify the role of the executive aspect of working memory in self-monitoring.
Regarding working memory subcomponents, only the phonological loop has been in-
vestigated in relation to self-repairs. Simard and her colleagues (2016) found that a better
score on the non-word repetition task was associated with fewer self-repairs targeting lan-
guage forms among their intermediate proficiency participants. The phonological loop was
not associated with self-repairs targeting discourse-level elements.
Different aspects of the relationship between self-repairs and attention have been in-
vestigated (e.g., Simard et al., 2011; Zuniga, 2015; Zuniga & Simard, 2019). For instance,
Simard and her colleagues (2011) examined attentional capacity, that is, the capacity to
maintain concentration of attention across time. No relationship between self-repairs and the
limited-capacity characteristic of attention, as measured by the d2 test of attentional capacity,
was found. The authors argued that attention shift is necessary for monitoring. This claim was
later verified and supported (Simard et al., 2016; Zuniga, 2015; Zuniga & Simard, 2019).
A closer look at the results obtained from the studies presented earlier reveals that the
complex relationship between monitoring, attention shift, phonological memory, and the ex-
ecutive aspect of working memory is a function of the language elements targeted for mon-
itoring. Indeed, phonological memory was exclusively and positively associated with self-
repairs targeting forms (Simard et al., 2016), while the executive aspect of working memory
and attention shift were negatively associated with self-repairs targeting discourse-level con-
ceptual elements (Ahmadian, 2015; Mojavezi & Ahmadian, 2013; Simard et al., 2020). Given
that attention must be controlled to shift it from one stimulus to another, and the executive
aspect of working memory is an attention–control system, it is no surprise that both measures
led to similar results (Simard et al., 2016).
6 Future Directions
Cognitive resources (i.e., working memory and attention) interact in a distinct manner with
the psycholinguistic processes (i.e., the language processing mechanisms such as grammatical
and phonological encodings) involved during L2 OP. Although we have some information
Daphnée Simard
regarding how working memory and attentional resources interact with self-monitoring
during L2 OP, much less information is available regarding their interaction with con-
ceptualization, formulation, and articulation. More research is needed. For instance, a re-
search programme should examine the interaction between working memory (executive and
storage aspects), attention (characteristics and functions), and a systematic manipulation of
task conditions among various L2 speaker populations (e.g., different age groups, levels of
proficiency, combinations of languages spoken). Additionally, the idea that OP tasks
themselves can target specific psycholinguistic processes should be further investigated (see
Fortkamp, 1999), and more specifically whether a given task can really isolate formulation.
Van Moere (2012) suggested that elicited repetition provides information regarding pho-
nological and grammatical encoding accuracy, which are both psycholinguistic processes
occurring during the formulation phase. This is an interesting path to examine in relation to
other measures of OP and cognitive resources.
Finally, proposals formulated in the adaptations of Levelt’s model for L2 OP should be
tested. Among others, L2 research on memory has focused on Baddeley’s model oper-
ationalization. However, it would be interesting to investigate the capacity to access the
additional storage of information in long-term memory (see Kormos, 2006) using episodic
and semantic memory tasks (e.g., Vallet et al., 2017) to verify how these measures interact
with the psycholinguistic processes involved during L2 production. The same could be done
for the relationship with a declarative memory measure.
7 Recommendations for Practice

The psycholinguistic processes described earlier place heavy demands on memory and attention
in L2 OP, especially in spontaneous production. Giving L2 speakers preplanning time alleviates
the demand on the conceptualizer by easing the processes involved, such as forming
communicative intents and retrieving concepts. As for the formulator, ample on-line planning time
should help L2 speakers retrieve L2 vocabulary and carry out grammatical and phonological
encoding, provided they have the prior knowledge. Repeating the same task increases the
possibility of automatization, given the priming effect on the necessary knowledge. The same goes for
articulation, which would benefit from ample production time and task repetition.
All that being said, as Pawlak (2011) put it:
it is necessary to equip learners with requisite systemic knowledge in terms of
grammar, lexis, multiword units, pronunciation, pragmatic routines and paralinguistic
means; sensitize them to the distinctive features of the spoken language
and to make them cognizant of how such resources can be employed to convey
intended meanings in a range of situations. (p. 15)
Notes
1 In the field of L2 acquisition research, self-repairs have been analyzed from different angles. Among
other things, their nature, frequency, and distribution in L2 OP (i.e., characteristics of self-repairs)
have been described in relation to the language features being repaired (e.g., Kormos, 1998), the
level of proficiency of the L2 speakers (e.g., Gilabert, 2007; Kormos, 1999b, 2000a, 2000b), or the
development of their language proficiency (e.g., Griggs, 1997; Kormos, 1999a, 2000a, 2000b), and
with the degree of complexity of different narrative tasks (i.e., contextual characteristics of OP) (e.g.,
Gilabert, 2007; Kormos, 1999b).
2 In their work, Mojavezi and Ahmadian (2013) and Ahmadian (2015) refer to discourse change,
which is similar in definition to what Simard and colleagues call choice of words or groups of words.
Further Reading
Kormos, J. (2006). Speech production and second language acquisition. Mahwah: Lawrence Erlbaum
Associates.
Segalowitz, N. (2010). Cognitive bases of second language fluency. New York: Routledge.
References
Ahmadian, M. J. (2015). Working memory, online planning and L2 self-repair behaviour. In Z. Wen,
M. B. Mota & A. McNeill (Eds.), Working memory in second language acquisition and processing
(pp. 160–174). Bristol, UK: Multilingual Matters.
Allport, D. A. (1987). Selection for action: Some behavioral and neurophysiological considerations of
attention and action. In H. Heuer & A. F. Sanders (Eds.), Perspectives on perception and action
(pp. 395–419). Hillsdale, NJ: Lawrence Erlbaum Associates.
Baddeley, A. D. (2015). Working memory in second language learning. In Z. Wen, M. B. Mota & A.
McNeill (Eds.), Working memory in second language acquisition and processing (pp. 17–28). Bristol:
Multilingual Matters.
Baddeley, A. D. (2010). Long-term and working memory: How do they interact? In L. Bäckman & L.
Nyberg (Eds.), Memory, aging and the brain: A festschrift in honour of Lars-Göran Nilsson
(pp. 18–30). Hove, UK: Psychology Press.
Baddeley, A. D. (2003). Working memory and language: An overview. Journal of Communication
Disorders, 36, 189–208.
Baddeley, A. D. (2000). The episodic buffer: A new component of working memory? Trends in
Cognitive Sciences, 4, 417–423.
Baddeley, A. D. (1996). Exploring the central executive. The Quarterly Journal of Experimental
Psychology: Section A, 49, 5–28.
Baddeley, A. D. (1986). Working memory. Oxford: Oxford University Press.
Baddeley, A. D., Gathercole, S., & Papagno, C. (1998). The phonological loop as a language learning
device. Psychological Review, 105, 158–173.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G. H. Bower (Ed.), The psychology of
learning and motivation: Advances in research and theory (Vol. 8, pp. 47–89). New York: Academic
Press.
Bange, P., & Kern, S. (1996). La régulation du discours en L1 et en L2. Études Romanes, 35, 69–103.
Brickenkamp, R., & Zillmer, E. (1998). The d2 Test of Attention. Seattle, WA: Hogrefe & Huber.
Broadbent, D. E. (1958). Perception and communication. New York: Pergamon.
Bygate, M. (2008). Oral second language abilities as expertise. In K. Johnson (Ed.), Expertise in second
language learning and teaching (pp. 104–127). New York: Palgrave Macmillan.
de Bot, K. (1992). A bilingual production model: Levelt’s “speaking” model adapted. Applied
de Bot, K., & Schreuder, R. (1993). Word production and the bilingual lexicon. In R. Schreuder & B.
Weltens (Eds.), The bilingual lexicon (pp. 191–214). Amsterdam: John Benjamins.
Dell, G. (1986). A spreading-activation theory of retrieval in sentence production. Psychological
Review, 93, 283–321.
Dörnyei, Z., & Kormos, J. (1998). Problem-solving mechanisms in L2 communication: A psycho-
linguistic perspective. Studies in Second Language Acquisition, 20, 349–385.
Fodor, J. A. (1983). The modularity of mind. Cambridge, MA: MIT Press.
Fortkamp, M. B. M. (1999). Working memory capacity and aspects of L2 speech production.
Communication and Cognition, 32, 259–296.
Garrett, M. F. (1984). The organization of processing structure for language production. Applications
to aphasic speech. In D. Caplan, A. R. Lecours & A. Smith (Eds.), Biological perspectives on lan-
guage (pp. 172–193). Cambridge, MA: MIT Press.
Georgiadou, E., & Roehr-Brackin, K. (2017). Investigating executive working memory and phonolo-
gical short-term memory in relation to fluency and self-repair behaviour in L2 speech. Journal of
Psycholinguistic Research, 46, 877–895.
Gilabert, R. (2007). Effects of manipulating task complexity on self-repairs during L2 OP. International
Review of Applied Linguistics in Language Teaching, 45, 215–240.
Golonka, E. (2006). Predictors revised: Linguistic knowledge and metalinguistic awareness in second
language gain in Russian. Modern Language Journal, 90, 496–505.
Green, D. W. (1986). Control, activation and resource: A framework and a model for the control of
speech in bilinguals. Brain and Language, 27, 210–223.
Griggs, P. (1997). Metalinguistic work and the development of language use in communicative pair-work
activities involving second language learners. In L. Diaz & C. Pérez (Eds.), Views on the acquisition and
the use of second languages (pp. 403–415). Barcelona, Spain: Universitat Pompeu Fabra.
Izumi, S. (2003). Comprehension and production processes in second language learning: In search of
the psycholinguistic rationale for the output hypothesis. Applied Linguistics, 24, 168–196.
Jackendoff, R. (1987). Consciousness and the computational mind. Cambridge, MA: MIT Press.
Kempen, G., & Hoenkamp, E. (1987). An incremental procedural grammar for sentence formulation.
Cognitive Science, 11, 201–258.
Kormos, J. (2011). Speech production and the cognition hypothesis. In P. Robinson (Ed.), Second
language task complexity: Researching the Cognition Hypothesis of language learning and
performance (pp. 39–60). Philadelphia: John Benjamins.
Kormos, J. (2006). Speech production and second language acquisition. Mahwah, NJ: Lawrence Erlbaum
Associates.
Kormos, J. (2000a). The timing of self-repairs in second language speech production. Studies in Second
Language Acquisition, 22, 145–167.
Kormos, J. (2000b). The role of attention in monitoring second language speech production. Language
Learning, 50, 343–384.
Kormos, J. (1999a). Monitoring and self-repair in L2. Language Learning, 49, 303–342.
Kormos, J. (1999b). The effect of speaker variables on the self-correction behaviour of L2 learners.
System, 27, 207–221.
Kormos, J. (1998). A new psycholinguistic taxonomy of self-repairs in L2: A qualitative analysis with
retrospection. Even Yearbook, ELITE SEAS Working Papers in Linguistics, 3, 43–68.
Levelt, W. J. M. (2001). Relations between speech production and speech perception: Some behavioral
and neurological observations. In E. Dupoux (Ed.), Language, brain and cognitive development:
Essays in honour of Jacques Mehler (pp. 241–256). Cambridge, MA: MIT Press.
Levelt, W. J. M. (2000). Psychology of language. In K. Pawlik & M. R. Rosenzweig (Eds.),
International handbook of psychology (pp. 151–167). London: SAGE publications.
Levelt, W. J. M. (1999a). Language production: A blueprint of the speaker. In C. Brown & P. Hagoort
(Eds.), Neurocognition of language (pp. 83–122). Oxford, England: Oxford University Press.
Levelt, W. J. M. (1999b). Models of word production. Trends in Cognitive Sciences, 3, 223–232.
Levelt, W. J. M. (1996). Perspective taking and ellipsis in spatial descriptions. In P. Bloom, M. A.
Peterson, M. F. Garrett & L. Nadel (Eds.), Language and space (pp. 77–107). Cambridge, MA: MIT
Press.
Levelt, W. J. M. (1995). The ability to speak: From intentions to spoken words. European Review,
3, 13–23.
Levelt, W. J. M. (1993). Psycholinguistics. In A. Colman (Ed.), Companion encyclopedia of psychology
(Vol. 1, pp. 319–337). London: Routledge.
Levelt, W. J. M. (1992). Accessing words in speech production: Stages, processes and representations.
Levelt, W. J. M. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Levelt, W. J. M. (1983). Monitoring and self-repair in speech. Cognition, 14, 41–104.
Levelt, W. J. M., & Wheeldon, L. (1994). Do speakers have access to a mental syllabary? Cognition,
50(1–3), 239–269.
Levelt, W. J. M., Roelofs, A., & Meyer, A. S. (1999). A theory of lexical access in speech production.
Behavioral and Brain Sciences, 22, 1–38.
Linck, J. A., Osthus, P., Koeth, J. T., & Bunting, M. F. (2014). Working memory and second language
comprehension and production: A meta-analysis. Psychonomic Bulletin & Review, 21, 861–883.
Martin, K. I., & Ellis, N. C. (2012). The roles of phonological short-term memory and working memory
in L2 grammar and vocabulary learning. Studies in Second Language Acquisition, 34, 379–413.
Mojavezi, A., & Ahmadian, M. J. (2013). Working memory capacity and self-repair behaviour in first
and second language oral production. Journal of Psycholinguistic Research, 43, 289–297.
Miyake, A., & Shah, P. (Eds.). (1999). Models of working memory: Mechanisms of active maintenance
and executive control. New York: Cambridge University Press.
Norman, D. A., & Shallice, T. (1986). Attention to action: Willed and automatic control of behaviour.
In R. J. Davidson, G. E. Schwartz & D. Shapiro (Eds.), Consciousness and self-regulation. Advances
in research and theory (pp. 1–18). New York: Plenum Press.
Özdemir, R., Roelofs, A., & Levelt, W. J. M. (2007). Perceptual uniqueness point effects in monitoring
internal speech. Cognition, 105, 457–465.
Paradis, M. (1981). Neurolinguistic organization of a bilingual’s two languages. In J. E. Copeland & P.
W. Davis (Eds.), The seventh LACUS forum (pp. 486–494). Columbia, SC: Hornbeam Press.
Pawlak, M. (2011). Instructed acquisition of speaking: Reconciling theory and practice. In M. Pawlak,
E. Waniek-Klimczak & J. Majer (Eds.), Speaking and instructed foreign language acquisition (pp. 3–23).
Toronto, Canada: Multilingual Matters.
Postma, A. (2000). Detection of errors during speech production: A review of speech monitoring
models. Cognition, 77, 97–131.
Poulisse, N. (1997). Language production in bilinguals. In A. de Groot & J. Kroll (Eds.), Tutorials in
bilingualism: Psycholinguistic perspectives (pp. 201–224). Hillsdale, NJ: Lawrence Erlbaum.
Poulisse, N., & Bongaerts, T. (1994). First language use in second language production. Applied
Reitan, R. (1958). Validity of the Trail Making Test as an indicator of organic brain damage.
Perceptual and Motor Skills, 8, 271–276.
Robinson, P. (2003). Attention and memory during SLA. In C. J. Doughty & M. H. Long (Eds.), The
handbook of second language acquisition (pp. 631–678). London: Blackwell Publishing.
Segalowitz, N. (2007). Access fluidity, attention control, and the acquisition of fluency in a second
language. TESOL Quarterly, 41, 181–186.
Shiffrin, R. M., & Schneider, W. (1977). Controlled and automatic human information processing: II.
Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127–190.
Skehan, P. (2015). Working memory and second language performance. In Z. Wen, M. Mota & A.
McNeill (Eds.), Working memory in second language acquisition and processing (pp. 189–201).
Bristol: Multilingual Matters.
Simard, D., Bergeron, A., Liu, Y.-G., Nader, M., & Redmond, L. (2016). Production d’autor-
eformulations autoamorcées en langue seconde: rôle de l’attention et de la mémoire phonologique.
Revue canadienne des langues vivantes/Canadian Modern Language Review, 72, 183–210.
Simard, D., Fortier, V., & Zuniga, M. (2011). Attention et production d’autoreformulations
autoamorcées en français langue seconde, quelle relation? Journal of French Language Studies, 21,
417–436.
Simard, D., French, L., & Zuniga, M. (2017). Evolution of L2 self-repair behavior over time among
adult learners of French. Revue canadienne de linguistique appliquée/Canadian Journal of Applied
Simard, D., Molokopeeva, T., & Zhang, Q. Y. (2020). The contribution of working memory to L2
French pronunciation among adult language learners. Canadian Modern Language Review/Revue
canadienne des langues vivantes, 76, 50–69.
Simard, D., & Wong, W. (2001). Alertness, orientation, and detection: The conceptualization of at-
tentional functions in SLA. Studies in Second Language Acquisition, 23, 103–124.
Slobin, D. I. (1987). Thinking for speaking. Proceedings of the thirteenth annual meeting of the Berkeley
linguistics society (pp. 435–444). Berkeley, CA: Berkeley Linguistics Society.
Tomlin, R., & Villa, V. (1994). Attention in cognitive science and second language acquisition. Studies
in Second Language Acquisition, 16, 183–203.
Towell, R., & Dewaele, J.-M. (2005). The role of psycholinguistic factors in the development of fluency
amongst advanced learners of French. In J.-M. Dewaele (Ed.), Focus on French as a foreign lan-
guage: Multidisciplinary approaches (pp. 210–239). Toronto, Canada: Multilingual Matters.
Treisman, A. (1960). Contextual cues in selective listening. Quarterly Journal of Experimental
Psychology, 12, 242–248.
Trueswell, J. C., Tanenhaus, M. K., & Garnsey, S. M. (1994). Semantic influences on parsing: Use of
thematic role information in syntactic ambiguity resolution. Journal of Memory and Language, 33,
285–318.
Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization
of memory (pp. 381–403). New York: Academic Press.
Vallet, G. T., Hudon, C., Bier, N., Macoir, J., Versace, R., & Simard, M. (2017). A SEMantic and
EPisodic Memory Test (SEMEP) developed within the embodied cognition framework: Application
to normal aging, Alzheimer’s disease and semantic dementia. Frontiers in Psychology, 8, 1493.
van Hest, E. (1996). Self-repair in L1 and L2 production. Tilburg, Netherlands: Tilburg University Press.
Van Moere, A. (2012). A psycholinguistic approach to oral language assessment. Language Testing, 29,
325–344.
Wen, Z. (2016). Working memory and second language learning: An integrated approach. Bristol, UK:
Wen, Z., Mota, M. B., & McNeill, A. (2013). Working memory and SLA: Towards an integrated
theory. Asian Journal of English Language Teaching, 23, 1–18.
Zuniga, M. (2015). The role of attention in L2 speech production. Thèse de doctorat inédite, Québec,
Canada: Université du Québec à Montréal.
Zuniga, M., & Simard, D. (2019). Factors influencing L2 self-repair behavior: The role of L2 profi-
ciency, attentional control and L1. Journal of Psycholinguistic Research, 48, 43–59.
3
A COMPLEX DYNAMIC SYSTEMS
THEORY PERSPECTIVE ON
SPEAKING IN SECOND LANGUAGE
DEVELOPMENT
Complex Dynamic Systems Theory (CDST) addresses the process of language development
over time, rather than the outcomes of a process. The process is commonly described in terms
of patterns of change, which include stages of development in variability, stabilization, and
destabilization. The general goal of CDST-inspired studies is to understand
the way in which the complex interaction of numerous forces leads to behavioural changes, that is, to
understand how development comes about. If there is one conclusion that we can safely draw
from about 20 years of research into second language development from a CDST perspective,
it is that language development is a highly individual developmental process; language de-
velopment is not predetermined, but emerges from the interaction and coordination of sub-
systems, also referred to as “self-organization” (Smith & Thelen, 2003).
The term “systems” is central in systems theories like CDST. A system is the conglom-
eration of connected elements that form a coherent whole (Bertalanffy, 1995). The elements
in CDST are referred to as subsystems, which are also systems that may again consist of
subsystems. For instance, the language system is embedded in the larger system of cognition,
which in turn is embedded in the larger system of the human being, which is embedded in the
larger system of a speech community. The language system itself consists of several em-
bedded subsystems, like phonology and vocabulary, with language-specific subsystems for
multilinguals.
Subsystems are open and all these systems are connected regardless of the degree of
embeddedness. The changes in coordinated and interdependent subsystems form the foun-
dation of the dynamic and non-linear nature of development. As changes in any of the
subsystems may lead to changes in other subsystems, development is characterized as an
iterative process in which each stage in development is based on the system’s preceding state.
And since the combination of subsystems and the nature and timing of their interaction is
essentially unique for each person, this leads to an iterative developmental process that is
strongly individual and cannot be predetermined. The logical consequence of this is that
CDST-inspired studies tend to be longitudinal case studies that focus on the process of
development.
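The iterative character of development described above can be illustrated with a toy simulation. The sketch below is not taken from this chapter or from any cited study: the logistic update rule, the function name `simulate`, and all parameter values are hypothetical choices made only to show how each state is computed from the preceding one, and how two learners with the same starting point but different interaction parameters follow different paths.

```python
def simulate(growth_rates, support, initial, steps=50):
    """Iterate two coupled subsystems (levels between 0 and 1).

    Every new state is computed only from the preceding state.
    The update rule and all values are illustrative assumptions.
    """
    a, b = initial
    rate_a, rate_b = growth_rates
    trajectory = [(a, b)]
    for _ in range(steps):
        # Logistic growth of each subsystem plus a small supportive
        # contribution from the other subsystem (hypothetical coupling).
        next_a = a + rate_a * a * (1 - a) + support * b * (1 - a)
        next_b = b + rate_b * b * (1 - b) + support * a * (1 - b)
        # Keep both levels inside the 0..1 range.
        a = min(max(next_a, 0.0), 1.0)
        b = min(max(next_b, 0.0), 1.0)
        trajectory.append((a, b))
    return trajectory


# Two hypothetical learners: same starting state, different growth rates.
learner_1 = simulate((0.30, 0.10), support=0.05, initial=(0.05, 0.05))
learner_2 = simulate((0.10, 0.30), support=0.05, initial=(0.05, 0.05))

for step in (0, 5, 10, 20):
    print(step, learner_1[step], learner_2[step])
```

Printing a few time points side by side shows the two trajectories diverging early even though the learners start from the same state. This mirrors, in a very simplified way, the argument that dense longitudinal observation of individuals is needed to see the process of development.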
DOI: 10.4324/9781003022497-5
A growing number of CDST-inspired studies have shown that although the steps in
language development may be globally similar among learners, the timing and the magnitude
of the development strongly depend on individual differences and on changes in the
interacting factors that contribute to the learner’s development, including the learner’s context
or environment. CDST studies are characterized by longitudinal observations of individual
learners with dense measurements, allowing for reflections on the individual process of de-
velopment rather than focusing on generalized products of learning for groups of learners at
one moment in time.
So far, most CDST studies in second language development have focused on the devel-
opment of writing for different levels of learners in various contexts, but a small number of
studies have also focused on speaking, which is quite complex. The number of interacting
subsystems relevant for speaking is relatively large, as speaking is generally less controlled
than writing, and the contexts in which speaking is used are generally natural and ecological.
For oral production, the operation of the skills is coupled with context, including the dyad.
This is clearly shown by the occurrence of alignment and convergence during speech. People
have a tendency to adjust to the context, and likewise the context will then adjust to the
speaker, leading to an active form of coordination of the relevant subsystems. The short-term
developmental process of the speaker is coupled with that of other speakers and can be found
in all or several of the relevant subsystems, from timing and articulation in the speaker’s
pronunciation to the use of non-verbal gestures.
Another typical characteristic of dynamic systems is the self-similar nature of embedded
subsystems, also referred to as fractals. Each time we zoom in to a dynamic structure,
similar patterns can be perceived. The repeated patterns are clearly illustrated in
Mandelbrot sets, but can also be seen in many naturally occurring phenomena like cau-
liflowers and trees. The pattern of the skeleton of the tree is repeated in increasingly smaller
structures, from branches to leaves. Also, in the time domain fractals have been identified
in dynamic systems, when a certain pattern of variability is repeated at smaller timescales
(Rhea et al., 2014). Most evidence for fractal structures in the time domain has been
found at the short timescale of language processing. During speech comprehension, a
hierarchy of linguistic structures has been identified in neural tracking at different time-
scales (Zhang & Ding, 2017). Moreover, a fractal dimension has been found in the
variability of speech production during simple naming tasks in the L1 (Holden et al., 2009),
as well as in second language naming tasks (Plat et al., 2018). A fractal structure has also
been identified in the diachronic development of syntactic complexity (Evans & Larsen-
Freeman, 2020).
In the following parts, we will contextualize CDST research in a historical framework and
discuss critical issues. We will then discuss the strongly developing methods for process-
based research relevant for speaking and will mention some challenges and future directions
for the CDST framework.
CDST is founded in well-accepted dynamic theories of physics, mathematics, and demo-
graphy. Over the past three decades, applications of CDST to cognition and psychology have
been very influential (Thelen & Smith, 1994) and applications to language development and
second language development have caused a major turn in applied linguistics. Two of the
most recent turns that have influenced our thinking about second language development
today occurred in the late 1990s, when psycholinguistic and neurolinguistic experimental
methods became an accepted line of research. The two shifts have been referred to as the
Social Turn and the Dynamic Turn.
The Social Turn (see Block, 2003 for a detailed critical review), initiated by the seminal
and controversial paper by Firth and Wagner (1997), was a strong negative reaction to the
idea that language learning can be investigated through controlled experiments. A theoretical
perspective closely associated with the social turn is Sociocultural Theory (SCT), linking
society to individual development. For second language learning, the central premise of SCT
is that any form of human cognitive development is essentially mediated by cultural artefacts
(Lantolf & Thorne, 2006). Consequently, language development cannot be studied outside its
authentic communicative context. Levine (2020) argues that SCT and CDST are commen-
surable and complementary frameworks.
The second major paradigm shift, the Dynamic Turn (de Bot, 2015), shares several of its
assumptions with the social turn and its theoretical impacts. Like SCT, dynamic
theories do not consider language development an isolated activity in the cognitive domain.
Unlike SCT, however, dynamic theories do not emphasize the opposition between cognition
and sociocultural artefacts, but stress their integration. Both theories consider language
development as the ongoing, emerging process of an integrated holistic system, which in-
cludes a wide range of connected, embedded and embodied subsystems. The onset of this
development dates back to 1997, when Diane Larsen-Freeman published a groundbreaking
paper on Complex Adaptive Systems in Applied Linguistics (Larsen-Freeman, 1997).
The paper emphasized the dynamic nature of language development, which is described as a
journey with no end state. Years later, the dynamic turn was reinforced by a number of
papers and books, such as Herdina and Jessner (2002), de Bot et al. (2005, 2007), Larsen-
Freeman and Cameron (2008), Dörnyei (2009), and Verspoor et al. (2011). Some authors
used the term Dynamic Systems Theory, while others used the term Complex System, but the
main theoretical implications were the same. Therefore, it was decided to use the combined
term Complex Dynamic Systems Theory (CDST) (de Bot, 2017).
Today, an increasing number of scholars are doing CDST-inspired research into second
language development. The focus of their studies is diverse, ranging from very fundamental studies
on the self-organizing nature of language use in real time (Plat et al., 2018) and theoretical
considerations about CDST research (Hiver & Al-Hoorie, 2016), through studies that
identify stages of development as they emerge over time (Baba & Nitta, 2014) and
process-based research that focuses on the development of accuracy and complexity by
studying variability in second language development (Spoelman & Verspoor, 2010),
to pedagogical implications of a CDST framework (Levine, 2020). For practical reasons,
most studies focus on writing, though recently some work on different aspects of the dy-
namic development of speaking has been published (Hepford, 2017; Lowie et al., 2018;
Polat & Kim, 2014; Roehr-Brackin, 2014; Yu & Lowie, 2019). We will elaborate on these
contributions in Part 4.
The dynamic nature of acquisition has also been addressed from a theoretical angle.
Following Browman and Goldstein (1992), Lima Júnior (2013) proposes that a child’s first
words may not be stored and accessed as separate phonemes, but as “holistic patterns of
articulatory routines” (Browman & Goldstein, 1992, p. 39). The frequent repetition of mi-
crolevel elements leads to the emergence of macrolevel patterns. That is, the pre-linguistic units
gradually develop into gestural units of contrast. During acquisition, the child distinguishes
and adjusts his/her emerging gestures and, simultaneously, learns how to coordinate them, as
the development of the ability to produce all the gestures of a word requires their coordination.
Lima Júnior (2013) concludes that such a CDST and experiential perspective on acquisition
may not only alter our view of first language acquisition, but also have implications for L2
acquisition. Because of the iterative processes and eventual entrenchment of patterns (or
attractor states in CDST terms), L2 learners associate L1 sound patterns with unknown patterns of
the L2, which reminds us of Flege’s Speech Learning Model (1995, 1999), in which he argues
that since adult L2 learners often fail to distinguish a certain L2 sound from a close L1 sound,
they may classify the L2 sound under a (prototypical) phonological category of their en-
trenched L1 categories.

CDST approaches to second language development focus on the continuous process of
development rather than products of development. They have revealed the complex inter-
action of subsystems over time and have shown that second language development, espe-
cially within the domain of speaking, is a highly individual process. Despite the valuable
deliverables of this approach in a relatively short time span, there are several limitations and
challenges that the application of CDST faces.
It is important to realize that CDST is not a model of second language speech. Rather, it
is a frame of reference within which we can analyze and understand development, an “all-
encompassing, and multidimensional view on reality” (Lowie & Verspoor, 2015), or a meta-
theory (Larsen-Freeman, 2015). Critics of CDST have often pointed out that unlike the
classical definition of scientific theory, a meta-theory such as CDST does not lead to specific
predictions and falsifiable hypotheses. Hulstijn (2020) argues that, similar to other
meta-theories like Darwin’s theory of evolution, “Language as a Complex Adaptive System is
falsifiable in principle but [is] not likely to be falsified” (p. 8).
Related to this is the observation that findings from longitudinal case studies cannot be
generalized to populations of second language learners. As we explain in Part 5, not only is
it difficult or even impossible to generalize longitudinal observations from (multiple) case
studies, it is also undesirable to make statements about the interaction of variables over
time for groups of language learners. While group studies can answer research questions
about the general interaction of variables at one moment in time, CDST studies can answer
research questions about the process of development over time. For instance, product-
based studies can show the effect of starting age on the quality of L2 pronunciation after a
fixed number of years of exposure by comparing a group of early starters to late starters, or
the effect of the learner’s L1 on the quality of L2 pronunciation by comparing re-
presentative groups of learners (Derwing & Munro, 2013). Conversely, a process-based
study can investigate how the development evolves over time, and can, for instance, explore
how the vowel production of an L2 learner changes over time, as it initially switches be-
tween L1 and L2 realizations, and gradually develops in the direction of the target reali-
zations after overshooting the values of matched native speaker controls for some time
(Verspoor et al., 2021).
Another critical issue that logically follows from the dynamic, time-dependent focus of
CDST studies is that the timescale or temporal window is a fundamental choice. The focus
on particular subsystems and the timescale selected for a study represent the level of granularity
of the study (Hiver & Al-Hoorie, 2016). While some studies report on macrolevel
development over a period of 3 years (Spoelman & Verspoor, 2010), other studies focus on
microlevel development over a time span of 10 minutes (Plat et al., 2018). The choice for the
timescale depends on the development expected at that timescale for the subsystem(s) under
investigation. This choice can be pragmatic, but should also be based on theoretical starting
points. Unfortunately, the relevance of timescales and the fractal nature of language
comprehension and production, especially for the second language, have been largely
underexplored.
Finally, a critical issue is the choice of measures of (written or spoken) language production
in time series. When repeated measurements of language production are taken every hour, day,
week, month, or year, depending on the timescale, we must be sure that the
measures are representative of the production at that moment. Measures that have typically
been used in CDST research are holistic assessment by trained raters or analytic assessment
of language using measures of Complexity, Accuracy, and Fluency (CAF). The introduction
of analytic measures of CAF has been a step toward the objective evaluation of language
development (Ortega, 2003), and analyses can conveniently be run on (transcribed) samples
using Natural Language Processing (NLP) tools (https://www.linguisticanalysistools.org) for
linguistic complexity and lexical sophistication. Suitable Praat scripts are available for the
automatic analysis of aspects of Fluency (De Jong & Wempe, 2009). However, the consistent
application of suitable measures, especially for clausal, phrasal and lexical complexity has
been a point of concern (Norris & Ortega, 2009). Different measures tend to be sensitive to
specific levels and types of development. The substantial literature on this topic is still
growing, and the choice of suitable measures is becoming increasingly sophisticated (see for
instance Housen et al., 2019; Kyle et al., 2020). An important concern from a CDST
perspective is that probably no single measure can adequately represent an L2 proficiency level
(for a discussion see Lowie et al., 2017).

The number of publications that take a CDST perspective on second language development
has been growing in the past 15 years and is still increasing. Although most issues discussed
in these papers are relevant for second language development in general, including speaking,
the number of studies that specifically focus on speaking is limited. The reason for this may
be that spoken data usually take an additional step of transcription before analyses can be
done. Another likely reason is that the number of factors that affect speaking, and in par-
ticular spoken interaction, leads to high levels of inter- and intra-individual variation. And
yet, speaking may also be the most interesting focus, as it represents genuine implicit lan-
guage use (Dykstra-Pruim, 2003) and is less affected by conventions. Speaking is complex
and interactive, and tends to be less monitored and controlled than writing. In a comparison
of spoken language with and without pre-task planning time, Yuan and Ellis (2003) found
that planning time positively affected language complexity. In most speaking situations, no
extra time is available for planning.
An important claim of CDST research is that second language development is “in-
dividually owned” (Lowie et al., 2018). For example, the study by Chan et al. (2015) explores
intra-learner variability in a comparison of speaking and writing data of monozygotic twins.
The twins were extremely similar in virtually all respects, including the school environment
and the amount and type of language contact out of school. In a longitudinal study spanning
a period of 8 months, the twins performed 100 writing tasks and 100 speaking tasks. The
results of this study showed that the development of speaking over time was highly variable
for both learners. Furthermore, the study also showed that even these identical twins showed
meaningful differences in their patterns of language development. An analysis of three
syntactic complexity measures over time showed that the complexity of spoken language for
both learners was higher than written language in the beginning, which corroborates the
[Figure: plot titled "MLA & EFA" with panels A and B]
Figure 3.1 Moving correlations between syntactic complexity and accuracy for participants (A) and
(B). Reprinted from Yu and Lowie (2020) with permission
findings reported in Dykstra-Pruim (2003). However, for one of the learners this shifted to a
reverse effect over time and the complexity in that twin’s writing was higher than in spoken
language. These data show that second language development is an iterative process that
results from the dynamic interaction of changing variables. Even minute
differences at one point in time may lead to large differences over time in a non-linear
developmental trajectory.
A recent study by Yu and Lowie (2019) describes the dynamic paths of the development of
speaking skills of Chinese speakers of English in a longitudinal case study focusing on
complexity and accuracy over a period of 4 months. The study shows much variability and
tentatively points to a relationship between the amount of variability and the degree of
development: increased variability tends to coincide with developmental jumps in speaking
(as measured by lexical diversity as well as accuracy). However, the most notable
finding in this study was that complexity and accuracy show a strong dynamic
relationship. Although a competitive relationship was found between complexity and accuracy
during the early stages of development, this shifted toward a supportive relationship in the
course of the data collection period, as illustrated in Figure 3.1. The figure shows a moving
window of correlations, illustrating how the correlation between these variables changes
over time. Initially, the learners may have been slowed down by a
limited availability of resources (particularly apparent in participant B), while at later stages
of development they manage to find a balance among different linguistic subsystems. This
finding is in contrast to findings from a single case study by Polat and Kim (2014), who found
that their participant made progress on complexity, but not on accuracy. A possible ex-
planation of the difference between the studies could well be that Polat and Kim investigated
an untutored learner, while the participants in the study by Yu and Lowie received formal
instruction.
In addition to complexity and accuracy, Hepford (2017) included fluency in a single case
study of oral L2 development in a naturalistic setting over a period of 15 months. Eliciting
language production using a rich variety of speaking tasks, she investigated the interaction of
the CAF dimensions in combination with global proficiency measures and motivation. She found clear
non-linear and self-organizing development as a result of interconnected subsystems like the
Figure 3.2 Vowel diagram for English sounds, with the estimated Dutch rounded front vowel added in
the top left corner (When pronounced in context most productions by Dutch speakers will
be somewhat more central.)
learner’s motivation. While the relationship among most complexity measures remained
relatively constant over time, the fluency measures varied as an effect of the amount of
cognitive strain the learner experienced. This detailed case study, including a wide variety of
changing factors over a considerable time, shows that for oral production, fluency tends to be
the dimension of CAF that is most sensitive to contextual changes.
A practical longitudinal study of L2 phonological development (Verspoor et al., forth-
coming) illustrates the significance of variability in language production. This example fo-
cuses on the development of the phonological system of a 5-year-old American boy (B) who
learned Dutch in a naturalistic setting. His Dutch pronunciation was traced for about a year
(Lowie, 2013) by means of weekly measurements of speech production. The current example
concentrates on the development of the Dutch closed front rounded vowel /y/. In English, all
rounded vowels (/u/, /ʊ/, /o/, /ɔ/) are back vowels, and there are no rounded front vowels. In
Dutch, most rounded vowels are also back vowels, but /y/ is one of the exceptions. For
English learners of Dutch, the production of /y/ provides a major challenge, as it requires a
new combination of entrenched English articulators (see Flege, 1999). See Figure 3.2 for an
illustration of these options.
We observed that there was a seemingly random variation between variants of /i/ and
variants of /u/, with the occasional combination of the two as a diphthong /ui/. The development
is far from linear and shows a highly variable developmental trajectory (see
Figure 3.3). In the first few weeks, B varies between the rounded /u/ and non-rounded /i/,
with the occasional diphthong /iu/, but later on, his productions tend towards the target /y/
much more often. We argue that this type of variability is not intentional and not caused by
any factors, but shows that the learner is aiming for a target sound and is constantly trying it
out until he approaches native-like productions. The variability is functional in that without
trying and experimenting to aim for a target form, there would be no change.

Since CDST studies focus on the emergence of language over time, the inclusion of a time
dimension is typical for this line of research. And even though change over time can be
inferred from measurements or observations at two moments, typical for CDST studies is
[Figure: line plot of Prod_F1, Prod_F2, and Prod_F3; vertical axis 0 to 4000, horizontal axis 0 to 14]
Figure 3.3 Longitudinal measurement of the production of the participant’s Dutch front rounded
vowel /y/, represented by the first three formants: F1 (darkest shade), F2 (medium shade)
and F3 (lightest shade)
that the process of development is investigated by following learners on a large number of
occasions. The number and density of the observations are determined by the magnitude and
speed of development expected. At a relatively early stage of development during which a
learner has intensive language contact, more change can be expected than at a later stage of
development with limited language contact (for a discussion see Lowie, 2017). For longitudinal
case studies, the number and density of observations are comparable to statistical
power in a group study (Murakami, 2020), and they form important considerations for
the design of a CDST study. The timescale of a study is at least equally important and may
vary from the lifespan to weeks (Spoelman & Verspoor, 2010), or from minutes in the
observation of classroom interaction (Smit et al., forthcoming) and changing motivation
(Waninge et al., 2014) to milliseconds in reaction time studies of spoken performance in a
naming task (Plat et al., 2018).
The necessity for the use of case studies is reinforced by the limitations of studying the
development of groups of people over time. Studying the development of groups of people is
only valid under the assumption that the changes of variables over time are identical
for all the members of the group. This is referred to as the requirement of ergodicity (Lowie &
Verspoor, 2019; Molenaar, 2015). Given the non-linear nature of development, groups
of people are not likely to fulfil this requirement. This implies that for longitudinal studies
with several observations over time, case studies are the most appropriate option. This
observation leads to an important distinction between dimensions of research. While group studies
can evaluate effects of variables and their interaction at one moment in time, case studies can
reveal aspects of the process of development over time. Process-based studies that monitor
non-linear development over time and product-oriented studies at one point in time are
therefore complementary and can address different research questions. For instance,
product-based studies can show the effect of starting age on the quality of L2 pronunciation
after a fixed number of years of exposure by comparing a group of early starters to late
starters, or the effect of the learner’s L1 on the quality of L2 pronunciation by comparing
representative groups of learners (Derwing & Munro, 2013). Conversely, a process-based
study can investigate how the development evolves over time, and can, for instance, explore
how the vowel production of an L2 learner changes over time, as it initially switches
between L1 and L2 realizations, and gradually develops in the direction of the target rea-
lizations after overshooting the values of matched native speaker controls for some time
(Verspoor et al., 2021).
In CDST studies of second language development, several methods have been used to
explore the development in longitudinal case studies. Two methods have been dominant,
each related to different types of research questions: the analysis of variability over time and
the analysis of relationships of subsystems over time. Both analyses use moving-window
techniques to reveal stepwise change over time while reducing the effect of local
peaks. Although many different techniques are available to answer a variety of research
questions, we will mention only one to illustrate the type of analysis.
For the analysis of variability over time especially, graphical tools are used to visualize
development. One such instrument is that of min–max graphs, in which a moving window of
minimum, mean, and maximum is used to identify changes in the amount of variability
(Van Dijk et al., 2011). This is illustrated in Figure 3.4. In these analyses it has frequently
been observed that increased variability tends to coincide with a jump in development (see
for instance Yu & Lowie, 2019). The significance of the jumps and the changes in the amount
of variability are commonly tested by using Monte Carlo simulations. These are permutation
tests in which the data are resampled a large number of times (for instance 10,000) to
determine how likely the observed pattern is to arise by chance. Similar simulation analyses to determine
the significance of changes over time have been done using Change Point Analyses (Baba &
Nitta, 2014).
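The two steps described above, a moving min–max window and a resampling test for a jump, can be
sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not
the procedure of Van Dijk et al. (2011): the function names, the window size, the test statistic
(the largest rise in the mean between two adjacent windows), and the scores are all invented for
the example.

```python
import random


def moving_min_max(scores, window=5):
    """Return one (minimum, mean, maximum) tuple per moving window."""
    stats = []
    for start in range(len(scores) - window + 1):
        chunk = scores[start:start + window]
        stats.append((min(chunk), sum(chunk) / window, max(chunk)))
    return stats


def largest_jump(scores, window=5):
    """Largest rise in the mean between two adjacent, non-overlapping windows.

    Assumed test statistic for this sketch; needs at least 2 * window scores.
    """
    jumps = []
    for start in range(len(scores) - 2 * window + 1):
        before = sum(scores[start:start + window]) / window
        after = sum(scores[start + window:start + 2 * window]) / window
        jumps.append(after - before)
    return max(jumps)


def permutation_p_value(scores, window=5, n_iter=10_000, seed=1):
    """Share of shuffled series with a jump at least as large as the observed one."""
    rng = random.Random(seed)
    observed = largest_jump(scores, window)
    shuffled = list(scores)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(shuffled)  # destroys the time order, keeps the values
        if largest_jump(shuffled, window) >= observed:
            hits += 1
    return hits / n_iter


# Invented scores: ten low observations followed by ten high ones.
series = [10] * 10 + [20] * 10
bands = moving_min_max(series, window=5)
p_value = permutation_p_value(series, window=5)
print(bands[0], bands[-1], p_value)
```

Plotting the three values in each tuple against time gives a band of the kind shown in a min–max
graph. A small p-value indicates that a jump of the observed size rarely appears once the time
order of the scores is shuffled.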
Several methods have been used for the analysis of relationships of subsystems over time.
After smoothing the data to eliminate extreme peaks and after detrending the data, the
relationship among the subsystems is analyzed using growth models like precursor models
[Figure: "Spatial prepositions" plotted against observation date, 24-Feb-98 to 9-Feb-99; series MIN, MAX, and average score]
Figure 3.4 Moving min–max graph illustrating changes in the amount of variability in a child’s use of
spatial prepositions. Reprinted from Van Geert and Van Dijk (2002) with permission
(Lowie et al., 2011). Using these models, the complex dynamic interrelationship of several
subsystems has been analyzed. For instance, Caspi (2010) analyzed the relationship of four
dimensions of receptive and productive vocabulary in second language use. Whereas growth
models tend to be rather advanced techniques, more straightforward analyses have also been
used, like moving correlations (Verspoor & Van Dijk, 2011). In these analyses, a moving
window of correlation between two subsystems is used. Moving correlations may
show that even when the overall correlation between two subsystems is low, this may be due
to a change over time from a negative to a positive correlation, as is shown for the
relationship between the finite form ratio (words/FV) and sentence structure (simple/compound)
in Figure 3.5 (from Verspoor & Van Dijk, 2011).
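A moving correlation can be sketched as follows. The example is only an illustration under
stated assumptions: the function names, the window size, and the two invented series are not
taken from Verspoor and Van Dijk (2011), and the smoothing and detrending steps mentioned above
are left out.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (nan if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def moving_correlation(x, y, window=4):
    """Correlation of x and y inside each moving window of the given size."""
    return [
        pearson(x[start:start + window], y[start:start + window])
        for start in range(len(x) - window + 1)
    ]


# Invented data: the two measures compete early on and support each other later.
complexity = [1, 2, 3, 4, 1, 2, 3, 4]
accuracy = [4, 3, 2, 1, 1, 2, 3, 4]

overall = pearson(complexity, accuracy)
windows = moving_correlation(complexity, accuracy, window=4)
print(round(overall, 2), [round(r, 2) for r in windows])
```

In these invented data the overall correlation is zero, while the first window shows a perfectly
negative correlation and the last window a perfectly positive one. The overall coefficient
therefore hides exactly the kind of shift that the moving window makes visible.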
The techniques mentioned here are only a small portion of the available analyses for
longitudinal data with dense measurements. After an initial discussion of CDST methods by
Larsen-Freeman and Cameron (2008), a practical guide for CDST methodologies by
Verspoor et al. (2011), and guidelines for data collection by Lowie (2017) and by Murakami
(2020), a very comprehensive overview of techniques was compiled recently by Hiver and Al-Hoorie (2019).
Although most CDST-inspired papers have used quantitative analyses, several valuable
longitudinal case studies have used qualitative analyses (Lesonen et al., 2017; Roehr-Brackin,
2014). Both studies provide detailed qualitative analyses of the development over time of a
specific linguistic construction or way to express meaning in the L2. Since CDST is a rela-
tively young line of second language research, the methods of analysis can be expected to
advance in the years to come.

All longitudinal studies thus far from a CDST perspective that trace learners over a longer
period of time show variation (differences among similar learners) and variability
[Figure: plot titled "Moving correlation between Words/FV and Simple/compound"]
Figure 3.5 Moving window of correlation. Reprinted from Verspoor and Van Dijk (2011) with
permission
(differences over time within a learner), sometimes with strong ups and downs, sometimes
with subsystems becoming more stable. This has important implications for researchers,
teachers, and learners.
For researchers, it means that if we want to investigate the process of development, we
must collect longitudinal data from individual learners, whether a single learner or a small
group of learners in a similar situation. We must know what we are looking for and why: a
theoretical motivation should come first, so that we know the time needed and the measures
we might trace. For example, if an instructional intervention provides feedback on particular
L2 sounds because cross-sectional studies have found positive effects, then we need to trace
the effects of the intervention not only during the intervention, but also at several
measuring moments afterwards. In such cases, we could schedule several longitudinal
post-test sessions, for example, for 1 week about a month after the intervention and another
week 2 months after the intervention. In making these decisions, the researcher should
estimate how long it takes to acquire a certain skill and when it may be assumed to become
rather stable.
For teachers, it means that they need to recognize that learning is a process of trial and
error and that language use and language development cannot be distinguished. Also,
some subsystems may need to be in place before others can develop. The conclusion that
language learning is strongly individually determined may not be good news for teachers
or school administrators, but explains the need for personalized learning. The variable
nature of the individual learning trajectory also illustrates the need for different approaches
to assessment, in which monitoring development over time using portfolios
may be more suitable than summative assessment at one moment. Moreover, it is important
to realize that there is no monocausality in (language) development. When applied
consistently to teaching, CDST requires a strongly ecological and holistic
framework of second language pedagogy. A fully worked out application of CDST
pedagogy in an ecological framework is found in Glenn Levine’s recent MLJ Monograph
(Levine, 2020).
7 Future Directions
Despite the growing number of CDST-inspired studies of second language development and
despite the advancement of methods and analyses, additional methodological innovations are
required for further development of the field. One of the challenges is the paradox of the research
dimensions. There is a continuous desire for generalizations about the process of language de-
velopment. However, on the one hand, it is impossible to generalize individual data, while on the
other hand, groups of learners cannot be followed over time due to ergodicity constraints. One
possible way around this problem is to use cluster analyses to identify ergodic ensembles of
learners showing similar behaviour over time. The first steps in this direction have been made
(Peng et al., 2020), but there is still a long way to go. The recent book by Hiver and Al-Hoorie
(2019) mentions several other promising methods for future development.
The missing link for speaking research is the CDST analysis of interaction over time
during conversation. Promising developments have shown the application of GridWare
(Hollenstein, 2013) to create dynamic state space grids that analyze the attractor states in
interaction. The work of Smit et al. (2017) on student–teacher interaction in the classroom
setting is a promising step in this direction.
Further Reading
Hiver, P., & Al-Hoorie, A. H. (2016). A dynamic ensemble for second language research: Putting
complexity theory into practice. The Modern Language Journal, 100(4), 741–756.
In this contribution, Hiver and Al-Hoorie review CDST research and sketch new directions for
investigating second language development within this framework. The authors provide a
template for methodological considerations for scholars who aspire to carry out CDST-inspired
research.
Larsen-Freeman, D., & Cameron, L. (2008). Complex systems and applied linguistics. Oxford: Oxford
University Press.
This is a comprehensive and very accessible overview of all aspects of CDST applications to research
into second language development. A must-read for people interested in this framework.
Levine, G. (2020). A human ecological language pedagogy. The Modern Language Journal, 104(S).
Levine has written a very comprehensive monograph on ecological language pedagogy using the CDST
framework as a starting point. He works out all implications of complexity in an up-to-date discussion
of language teaching in the ecological context of world readiness.
Lowie, W. M., Verspoor, M. H., & Van Dijk, M. (2018). The acquisition of L2 speaking: A dynamic
perspective. In R. Alonso Alonso (Ed.), Speaking in a second language (pp. 106–125). Amsterdam/
Philadelphia: John Benjamins.
This is a study that specifically focuses on studying oral skills from a CDST-perspective, partly in
contrast to writing skills.
References
Baba, K., & Nitta, R. (2014). Phase transitions in development of writing fluency from a complex
dynamic systems perspective. Language Learning, 64(1), 1–35. doi: 10.1111/lang.12033
Bertalanffy, L. von. (1995). General system theory: foundations, development, applications (Rev. ed.).
Braziller. https://rug.on.worldcat.org/oclc/36200371
Block, D. (2003). The social turn in second language acquisition. Edinburgh: Edinburgh University
Press.
Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49(3–4),
155–180. doi: 10.1159/000261913
Caspi, T. (2010). A dynamic perspective on second language development (PhD dissertation). Groningen:
University of Groningen.
Chan, H., Verspoor, M. H., & Vahtrick, L. (2015). Dynamic development in speaking versus writing in
identical twins. Language Learning, 65(2), 298–325. doi: 10.1111/lang.12107
de Bot, K. (2017). Complexity theory and dynamic systems theory: Same or different? In Complexity
theory and language development: In celebration of Diane Larsen-Freeman (pp. 51–58). Amsterdam:
John Benjamins.
de Bot, K. (2015). A history of applied linguistics: From 1980 to the present. London:
Routledge. http://public.ebookcentral.proquest.com/choice/publicfullrecord.aspx?p=1983433
de Bot, K., Lowie, W. M., & Verspoor, M. H. (2005). Second language acquisition, an advanced resource
book. London: Routledge.
de Bot, K., Lowie, W. M., & Verspoor, M. H. (2007). A dynamic systems theory approach to second
language acquisition. Bilingualism: Language and Cognition, 10(1), 7–21.
doi: 10.1017/S1366728906002732
De Jong, N. H., & Wempe, T. (2009). Praat script to detect syllable nuclei and measure speech rate
automatically. Behavior Research Methods, 41(2), 385–390. doi: 10.3758/BRM.41.2.385
Derwing, T. M., & Munro, M. J. (2013). The development of L2 oral language skills in two L1 groups:
A 7-year study. Language Learning, 63(2), 163–185. doi: 10.1111/lang.12000
Dörnyei, Z. (2009). Individual differences: Interplay of learner characteristics and learning environ-
ment. Language Learning, 59(SUPPL. 1), 230–248. http://www.scopus.com/inward/record.url?eid=
2-s2.0-73149118934&partnerID=40&md5=e9f751871baa290aa2b63306b9f8468d
Dykstra-Pruim, P. (2003). Speaking, writing, and explicit-rule knowledge: Toward an understanding of
how they interrelate. Foreign Language Annals, 36(1), 66–76.
Evans, D. R., & Larsen-Freeman, D. (2020). Bifurcations and the emergence of L2 syntactic structures
in a complex dynamic system. Frontiers in Psychology, 11, 2823. doi: 10.3389/fpsyg.2020.574603
Firth, A., & Wagner, J. (1997). SLA property: No trespassing! Modern Language Journal, 82(1), 91–94.
Flege, J. E. (1995). Second language speech learning. Theory, findings and problems. In W. Strange
(Ed.), Speech perception and linguistic experience (pp. 233–277). Baltimore: York Press.
Flege, J. E. (1999). Age of learning and second language speech. In D. Birdsong (Ed.), Second language
acquisition and the critical period hypothesis (pp. 101–131). Mahwah, NJ: Lawrence Erlbaum.
Hepford, E. A. (2017). Dynamic second language development: the interaction of complexity, accuracy,
and fluency in a naturalistic learning context. Philadelphia, PA: Temple University.
Herdina, P., & Jessner, U. (2002). A dynamic model of multilingualism. Perspective of change in psy-
cholinguistics. Clevedon: Multilingual Matters.
Hiver, P., & Al-Hoorie, A. H. (2016). A dynamic ensemble for second language research: Putting
complexity theory into practice. The Modern Language Journal, 100(4), 741–756.
doi: 10.1111/modl.12347
Hiver, P., & Al-Hoorie, A. H. (2019). Research methods for complexity theory in applied linguistics.
Clevedon: Multilingual Matters. doi: 10.21832/9781788925754.
https://rug.on.worldcat.org/oclc/1138500858
Holden, J. G., Van Orden, G. C., & Turvey, M. T. (2009). Dispersion of response times reveals cog-
nitive dynamics. Psychological Review, 116(2), 318–342. doi: 10.1037/a0014849
Hollenstein, T. (2013). State space grids: depicting dynamics across development. New York:
Springer. doi: 10.1007/978-1-4614-5007-8. https://rug.on.worldcat.org/oclc/858887575
Housen, A., De Clercq, B., Kuiken, F., & Vedder, I. (2019). Multiple approaches to complexity in
second language research. Second Language Research, 35(1), 3–21. doi: 10.1177/0267658318809765
Hulstijn, J. (2020). Proximate and ultimate explanations of individual differences in language use and
language acquisition. Dutch Journal of Applied Linguistics. doi: 10.1075/dujal.19027.hul
Kyle, K., Crossley, S., & Verspoor, M. (2020). Measuring longitudinal writing development using
indices of syntactic complexity and sophistication. Studies in Second Language Acquisition, 1–32.
doi: 10.1017/s0272263120000546
Lantolf, J., & Thorne, S. (2006). Sociocultural theory and the genesis of second language development.
Oxford: Oxford University Press.
Larsen-Freeman, D., & Cameron, L. (2008). Complex systems and applied linguistics. Oxford University
Press.
Larsen-Freeman, D. (1997). Chaos/complexity science and second language acquisition. Applied
Linguistics, 18(2), 141–165. http://www.scopus.com/inward/record.url?eid=2-s2.0-0040151244&
partnerID=40&md5=68465fc5cd8f0bc3db4bf803ce3ce4ce
Larsen-Freeman, D. (2015). Ten ‘lessons’ from complex dynamic systems theory: What is on offer. In
Z. Dörnyei, P. D. MacIntyre, & A. Henry (Eds.), Motivational dynamics in language learning
(pp. 11–19). Clevedon: Multilingual Matters. doi: 10.21832/9781783092574-004
Larsen-Freeman, D., & Cameron, L. (2008). Research methodology on language development from a
complex systems perspective. The Modern Language Journal, 92(2), 200–213.
Lesonen, S., Suni, M., Steinkrauss, R., & Verspoor, M. (2017). From conceptualization to construc-
tions in Finnish as an L2. Pragmatics & Cognition, 24(2), 212–262. doi: 10.1075/pc.17016.les
Levine, G. (2020). A human ecological language pedagogy. The Modern Language Journal, 104(S).
Lima Júnior, R. M. (2013). Complexity in second language phonology acquisition. Revista Brasileira de
Linguística Aplicada, 13(2), 549–576. doi: 10.1590/s1984-63982013005000006
Lowie, W. M. (2013). L2 phonological development: a plea for a dynamic, process-based methodology.
Presentation at the international symposium on the acquisition of second language speech (New Sounds
2013), Montreal, Canada, 17–19 May 2013.
Lowie, W. M. (2017). Lost in state space? Methodological considerations in Complex Dynamic Theory
approaches to second language development research. In L. Ortega & Z. Han (Eds.), Complexity
theory and language development in celebration of Diane Larsen-Freeman (pp. 123–141). Amsterdam:
John Benjamins Publishing Company. doi: 10.1075/lllt.48.07low
Lowie, W. M., Caspi, T., Van Geert, P., & Steenbeek, H. (2011). Modeling development and change. In
M. H. Verspoor, K. De Bot, & W. Lowie (Eds.), A dynamic approach to second language develop-
ment: methods and techniques (pp. 22–122). Amsterdam: John Benjamins.
Lowie, W. M., Van Dijk, M., Chan, H., & Verspoor, M. H. (2017). Finding the key to successful L2
learning in groups and individuals. Studies in Second Language Learning and Teaching, 7(1), 127–148.
doi: 10.14746/ssllt.2017.7.1.7
Lowie, W. M., & Verspoor, M. H. (2015). Variability and variation in second language acquisition
orders: A dynamic reevaluation. Language Learning, 65(1), 63–88. doi: 10.1111/lang.12093
Lowie, W. M., Verspoor, M. H., & Van Dijk, M. (2018). The acquisition of L2 speaking: A dynamic
perspective. In R. Alonso Alonso (Ed.), Speaking in a second language (pp. 106–125). Amsterdam:
John Benjamins.
Lowie, W. M., & Verspoor, M. H. (2019). Individual differences and the ergodicity problem. Language
Learning, 69(S1), 184–206. doi: 10.1111/lang.12324
Molenaar, P. C. M. (2015). On the relation between person-oriented and subject-specific approaches.
Journal for Person-Oriented Research, 1(1–2), 34–41. doi: 10.17505/jpor.2015.04
Murakami, A. (2020). On the sample size required to identify the longitudinal L2 development of
complexity and accuracy indices. In W. M. Lowie, M. Michel, A. Rousse-Malpat, M. Keijzer, & R.
Steinkrauss (Eds.), Usage-based dynamics in second language development (pp. 20–49). Clevedon:
Norris, J. M., & Ortega, L. (2009). Towards an organic approach to investigating CAF in instructed
SLA: The case of complexity. Applied Linguistics, 30(4), 555–578. doi: 10.1093/applin/amp044
Ortega, L. (2003). Syntactic complexity measures and their relation to L2 proficiency: a research
synthesis of college-level L2 writing. Applied Linguistics, 24(4), 492–518.
Peng, H., Jager, S., Thorne, S. L., & Lowie, W. (2020). A holistic person-centred approach to mobile-
assisted language learning. In W. M. Lowie, M. Michel, A. Rousse-Malpat, M. Keijzer, & R.
Steinkrauss (Eds.), Usage-based dynamics in second language development (pp. 87–106). Clevedon:
Multilingual Matters. doi: 10.21832/9781788925259-007
Plat, R., Lowie, W., & de Bot, K. (2018). Word naming in the L1 and L2: A dynamic perspective on
automatization and the degree of semantic involvement in naming. Frontiers in Psychology, 8, 2256.
doi: 10.3389/fpsyg.2017.02256
Polat, B., & Kim, Y. (2014). Dynamics of complexity and accuracy: A longitudinal case study of
advanced untutored development. Applied Linguistics, 35(2), 184–207. doi: 10.1093/applin/amt013
Rhea, C. K., Kiefer, A. W., Wittstein, M. W., Leonard, K. B., MacPherson, R. P., Wright, W. G., &
Haran, F. J. (2014). Fractal gait patterns are retained after entrainment to a fractal stimulus. PLoS
One, 9(9), e106755. doi: 10.1371/journal.pone.0106755
Roehr-Brackin, K. (2014). Explicit knowledge and processes from a usage-based perspective: The de-
velopmental trajectory of an instructed L2 learner. Language Learning, 64(4), 771–808. doi: 10.1111/
lang.12081. https://rug.on.worldcat.org/oclc/5694457243
Smit, N., van de Grift, W., de Bot, K., & Jansen, E. (2017). A classroom observation tool for scaf-
folding reading comprehension. System, 65, 117–129.
Smit, N., Van Dijk, M., De Bot, K., & Lowie, W. M. (in press). The complex dynamics of adaptive
teaching. International Review of Applied Linguistics.
Smith, L. B., & Thelen, E. (2003). Development as a dynamic system. Trends in Cognitive Sciences, 7(8),
343–348.
Spoelman, M., & Verspoor, M. H. (2010). Dynamic patterns in the development of accuracy and
complexity: a longitudinal case study on the acquisition of Finnish. Applied Linguistics, 31(4),
532–553.
Thelen, E., & Smith, L. B. (1994). A dynamic systems approach to the development of cognition and
action. Cambridge, MA: MIT Press.
Van Dijk, M., Verspoor, M. H., & Lowie, W. M. (2011). Variability and DST. In M. Verspoor, K. De
Bot, & W. Lowie (Eds.), A dynamic approach to second language development: methods and techni-
ques (pp. 55–84). Amsterdam: John Benjamins.
Van Geert, P., & Van Dijk, M. (2002). Focus on variability: New tools to study intra-individual
variability in developmental data. Infant Behavior and Development, 25(4), 340–375.
Verspoor, M. H., De Bot, K., & Lowie, W. M. (2011). A dynamic systems approach to second language
development: methods and techniques. In N. Spada & N. Van Deusen-Scholl (Eds.), Language
learning & language teaching 29. Amsterdam: John Benjamins.
Verspoor, M. H., Lowie, W. M., & de Bot, K. (2021). Variability as normal as apple pie. Linguistics
Vanguard, 7(s2).
Verspoor, M. H., & Van Dijk, M. (2011). Visualizing interactions between variables. In M. H.
Verspoor, K. De Bot, & W. Lowie (Eds.), A dynamic approach to second language development:
methods and techniques (pp. 85–98). Amsterdam: John Benjamins.
Waninge, F., Dörnyei, Z., & de Bot, K. (2014). Motivational dynamics in language learning: Change,
stability, and context. The Modern Language Journal, 98(3), 704–723.
Yu, H., & Lowie, W. (2019). Dynamic paths of complexity and accuracy in second language speech: A
longitudinal case study of Chinese learners. Applied Linguistics, 41(6), 855–877. doi: 10.1093/applin/
amz040
Yuan, F., & Ellis, R. (2003). The effects of pre-task planning and on-line planning on fluency, com-
plexity and accuracy in L2 monologic oral production. Applied Linguistics, 24(1). doi: 10.1093/
applin/24.1.1
Zhang, W., & Ding, N. (2017). Time-domain analysis of neural tracking of hierarchical linguistic
structures. NeuroImage, 146, 333–340. doi: 10.1016/j.neuroimage.2016.11.016
4
SOCIOCULTURAL APPROACHES
TO SPEAKING IN SLA
Speaking is a powerful mode of human communication, learning, and sociality. It has
therefore been central to much sociocultural and sociolinguistic research on first and second
language (L2) development, as well as in studies of bilingualism, multilingualism, and other
kinds of learning. Many second language acquisition (SLA) scholars interested in oral lan-
guage development (e.g., see other chapters in this volume) view speaking through a psy-
chological, cognitive, or linguistic lens, breaking L2 oral production down into measurable
components such as pronunciation, fluency, accuracy, and comprehensibility. From a so-
ciocultural stance, speaking is seen as one observable (audible, performed) aspect of inter-
action through which meanings are constructed. In addition, speaking is viewed as a means
of potential socialization into linguistic, cultural, and other perspectives and practices as
speakers convey aspects of their identities and interests or positionalities. From this per-
spective, features of speech, such as prosody, lexical choice, or interaction (e.g., questioning
or turn-taking patterns), must be understood in terms of participants’ ability to achieve
mutual understanding, index their social meanings and identities, accomplish goals, and
participate effectively in recognizable and legitimate activities within a community.
Sociocultural theorizing in SLA is broad, multifaceted, and interdisciplinary. For that
reason, we use the plural form when describing sociocultural “theories.” Some sociocultural
work has been deeply informed by Vygotskian theory, which examines the development of
cognition or mental processes through experiences of mediated social interaction (often
spoken interaction) and intersubjectivity (for an overview of Vygotskian-informed socio-
cultural approaches, see Lantolf et al., 2018). By intersubjectivity, we mean coming to shared
understandings and alignments with one another’s views and ways of speaking, often within
the context of a particular activity.
Other sociocultural work, including our own research, is informed by language sociali-
zation theory, which tends to be less explicitly Vygotskian and instead draws on sociology,
sociolinguistics, cultural psychology, and linguistic anthropology to a greater extent (e.g.,
Duff, 2007). Increasingly foregrounded in language socialization are social constructs such as
identity, power relations, and intersections among race, gender, and other social categories
that may position language learners and their ways of speaking (or their silences) in ad-
vantageous or disadvantageous ways (Duff, 2019). In addition, this sociocultural work
DOI: 10.4324/9781003022497-6
(unlike most Vygotskian L2 studies) often examines language ideologies (e.g., perceived
status of languages, language varieties, or “accents,” and speakers of those languages),
communities or networks of practice, forms of capital (economic, social, and cultural) at
speakers’ disposal, and relationships between social structures and human agency (e.g.,
Bourdieu, 1991; Darvin & Norton, 2015). Sociocultural approaches may also examine fac-
tors affecting individuals’ and groups’ access to opportunities to learn and use languages, to
take turns in interactions, and to receive meaningful feedback on their contributions. Finally,
the research often examines learners’ trajectories and forms of participation in their various
communities over time.
Situated sociocultural processes work in tandem with, but clearly at different levels or
through different systems from, cognitive processes of perceiving, internalizing, mastering,
and producing particular linguistic constructions. These processes also occur across different
scales of time and space, and can be realized in, or carry traces of those settings and histories in,
even a single utterance, linguistic form, or interaction, such as the exclamation He’s so woke!
(i.e., sociopolitically aware or sensitized). Similarly, the correction of certain forms of speech
by others (e.g., speakers, teachers) conveys their identities and language ideologies and not
simply phonological accuracy – for example, when a particular phonetic form common in a
non-standard dialect is disallowed in classrooms where pronunciation conforming to a more
standard variety is required (Duff, 2019; Friedman, 2010). In summary, in this chapter we
describe sociocultural approaches to the study of speaking in SLA that analyze not only the
production of L2 speech by learners, but also their emerging communicative repertoires,
networks, and communities in transnational, intercultural contexts.
Early sociolinguistic research on the ethnography of communication and L2 pragmatics (see
Yates, this volume) emphasized that speaking – or participating in particular speech com-
munities and their oral cultural practices – requires learning how to communicate by using
particular linguistic forms and genres while taking into account social and situational vari-
ables. Such variables might include registers, social status, roles, power differentials, social
distance from speakers, and ways of displaying affective stances such as excitement or
gravitas (Hymes, 1964). These forms of knowledge or competence, such as ways of making
situation-appropriate requests, develop over time through observation, experience, and
mediation (i.e., socialization) by others. One example is Li’s (2000) case study of women
learning ESL/skills in workplace programmes in the United States, where workers developed
strategies to successfully make requests in English through explicit and implicit socialization
via coursework and other social encounters. This socialization generally occurs through
meaningful, scaffolded social interaction and sometimes explicit instruction. However, not all
modelling provided by “expert” speakers is taken up by learners or novices, who may de-
velop their own strategies or preferences (Duff & Talmy, 2011).
Another important theoretical framework alluded to earlier, one that gives prominence to the
role of social interaction in learning, draws on Vygotsky and Activity Theory (Lantolf &
Thorne, 2006; Lantolf et al., 2018). Vygotskian theory and research in the area of L2
speaking first appeared in the 1990s (e.g., Lantolf, 1994) and has gained considerable ground
since. This work focuses on aspects of mediation, scaffolding, inner speech, and inter-
subjectivity in learning, and how social, embodied experience (including the use of gestures)
facilitates the internalization of knowledge. It also investigates the development of mental
processes and concepts, among which are linguistic ones. Through these mediated or scaf-
folded learning experiences, many of which involve speech, individuals become better able,
over time, to regulate their own learning and use of language according to their own pur-
poses (Lantolf et al., 2018). Two intersecting areas of current Vygotskian sociocultural re-
search on L2 speaking focus on the utility of (1) concept-based instruction (e.g., around such
linguistic concepts as tense, aspect, mood, and voice, often taught explicitly through sche-
matic diagrams representing the concepts and relationships among them) and (2) meta-
linguistic talk about language (also known as languaging) that learners engage in while
gaining a deeper understanding of linguistic concepts and usage (see examples in Lantolf
et al., 2018).
In recent years, a greater emphasis across sociocultural approaches has been placed not
only on speech and one’s primary speech communities – important though those are – but also
on the kinds of multimodal communicative or semiotic repertoires that accompany or
sometimes replace speech (e.g., Early et al., 2015). Therefore, included in contemporary
theorization within sociocultural SLA are embodied expressions of meaning, such as gestures,
facial expressions, gaze, images, and written texts (e.g., Martin-Beltrán, 2010), and the role of
silence in addition to speech (e.g., Morita, 2004). Furthermore, multilingual resources that
facilitate the comprehension and production of meanings have become more central to SLA
theory, research, and pedagogies drawing from sociocultural approaches to SLA. The focus
on repertoires and multilingual resources is also salient in the Douglas Fir Group’s (2016)
article on transdisciplinary, multiscalar approaches to SLA and is found
in a burgeoning area of research dealing with the development of interactional competence in
SLA (e.g., Hall et al., 2011). The Douglas Fir Group (DFG) (2016) proposal could be broadly
construed as a “dynamic sociocultural/sociocognitive systems” framework that seeks to
contextualize and interpret the development and use of “speaking” abilities in another language
across both local and wider social contexts and scales (see Duff, 2019). The DFG framework
and sociocultural principles contained within it encourage the examination of speaking
(input or exposure, opportunities, interactions, experience, performance, impact, feedback,
change) to the degree possible or relevant within and across macro (societal), meso (in-
stitutional), and micro (social–interactional, linguistic–indexical) levels and the larger spec-
trum of valued social–semiotic repertoires referred to earlier.
Thus, when conducting research on people engaged in a classroom L2 learning task or on
study-abroad students’ L2 interactions outside of class, sociocultural researchers consider
elements of the activity, the social context, the materials and spaces (virtual, face-to-face,
coffee shop, lab) involved, the division of labour among participants and activity objectives,
the emotions involved that may mitigate learning or performance, and the negotiation of
particular meanings in discourse and how these change over time. Not all of these areas of
potential enquiry will be relevant in all socioculturally oriented studies of L2 speaking de-
velopment. However, there has been a growing recognition over the past two decades that
various extralinguistic contextual factors combined with linguistic ones contribute to lear-
ners’ experiences, motivations, and performance in important ways.
Critical Issues and Topics: Beyond “Speaking”

Studies of a “sociocultural” nature range from those interested primarily in macro-
sociological or sociolinguistic processes to those examining micro-interactions involved in
learning and using another language (e.g., those featured in Lantolf et al., 2018). However, it
can be exceedingly difficult in a single study to concisely capture interactions along multiple
scales (e.g., both macro- and micro-level phenomena, and temporal, developmental, and
attitudinal ones). Furthermore, some sociocultural SLA researchers are interested in parti-
cular kinds of linguistic structures (e.g., formulaic utterances, affect-marking sentence
particles, tense-aspect marking) or interactional sequences (e.g., turn-taking, corrective
feedback), or event structures (e.g., an oral presentation, a conversational exchange, a
transactional service encounter). It can therefore be challenging to strike a balance between
and among the following: (1) breadth and depth of analysis, (2) language use and devel-
opment, and (3) sociological and linguistic analyses.
Moreover, the relevant linguistic encounters may not concern “speaking” exclusively. In
view of the importance of face-to-face social interaction for participating and belonging in
communities as well as for language learning, oral and visual (gestural) modes and events
often enjoy a privileged place in sociocultural SLA research. However, given that socio-
cultural understandings of language view meaning-making as a process necessarily mediated
by various modes and tools (including speech itself), sociocultural studies now are seldom
exclusively about speaking or about face-to-face communication. Indeed, the global cor-
onavirus pandemic moved teaching, learning, and formal events (e.g., graduations, thesis
defenses, and board meetings) as well as casual social interaction to virtual communication
platforms such as Zoom. There, the interplay of (written) “chat,” shared visual documents
and slides, polling functions, and two-dimensional or disembodied interlocutors represented
visually by name, avatar, or a head and upper torso, with other people or pets entering and
leaving scenes in the background, incommensurate with formal speech events, reminds us how
multidimensional oral communication can be. Similarly, in-person interactions between in-
terlocutors wearing masks for public health reasons display compromised sound quality and
obscured non-verbal forms of expression such as smiling. Participants become acutely aware
of this multidimensionality when speech is mediated by such tools and technologies that
enable or prevent particular kinds of gestures (both physical and symbolic through “hand-
up” or “thumbs-up” signs), turn-taking, backchanneling (discouraged by “mute” buttons),
and other natural features of oral discourse.
However, even in classroom-based face-to-face learning, we must recognize how funda-
mentally multimodal and embodied educational interactions are in L2 learning, and find
ways of accounting for those in SLA theory, research, and pedagogy. For example, Martin-
Beltrán’s (2010) sociocultural study of Grade 5 students in a dual-language English-Spanish
immersion classroom focused primarily on the affordances of oral interaction for resolving
vocabulary and pragmatic issues. Thus, the analyses Martin-Beltrán described in the article
are indeed related to “speaking” in that they examine students’ joint oral production in a
learning context. However, the author indicates that the students’ discussions about lan-
guage were directly mediated by the collaborative written texts they produced as part of the
day’s lesson. In addition, the author highlights how students used gestures to clarify
meanings when they lacked linguistic resources.
Like Martin-Beltrán’s study, much sociocultural work offers detailed accounts of how
previous experiences and multiple modes of meaning-making (including conceptual dia-
grams) converge and overlap to produce learning, reproduce ideologies, and shape identity
development and expression. Further, as noted earlier, speaking is increasingly examined in
L2 research both within and outside of classrooms in terms of participants’ multilingual
resources and repertoires in addition to multimodal ones. As stated in Douglas Fir Group
(2016), sociocultural perspectives often contest circulating ideologies held by the public and
some practitioners about such topics as the time required to learn an L2, L2 learning as a
cumulative, linear process, the assumed beneficence of “native speakers” or other
highly proficient interlocutors as facilitators of L2 learning, and L2 “native speaker” norms
as the target. Learners’ agency and their desire to achieve or comply with prescribed
norms and goals (or not) may be important factors in understanding their engagement with
learning activities and SLA outcomes. Even if learners are highly agentive, however, they
may have limited access to opportunities to use the L2 or to receive timely and valuable
assistance supporting their learning. Thus, sociocultural research on speaking requires
attention to the broader social contexts of learning and using language, and then to actual
practice.

Over the past two decades, socioculturally oriented research on L2 learning, especially
qualitative case studies and ethnographic/anthropological work, has investigated oral
practices across age groups, settings, and L2s. This research has provided valuable insights
into under-researched language learning sites such as Qur’anic schools in Cameroon (Moore,
2013), Jewish day schools in the United States (Avni, 2012), workplaces and immigrant
language programmes (Li, 2000), K-12 programmes, and the maintenance and learning of
Indigenous languages. For example, Eriks‐Brophy and Crago (2003) compared the forms
and impacts of teacher-talk produced by Inuktitut-speaking teachers and English-speaking
teachers in a school where students were educated exclusively in Inuktitut until Grade 2 and
then switched to English in Grade 3. (Inuktitut is spoken by the Inuit people in the northern
territories of Canada.) However, as with most areas of SLA research, the majority of so-
ciocultural work has studied major world languages such as English, French, Spanish,
Japanese and, to a lesser extent, Mandarin and Arabic.
Given the immense scope of research covered by the term sociocultural, and the space
limitations here, it is impossible to provide a comprehensive overview of the contributions of
this research. Therefore, we will focus on work conducted in three broadly defined contexts:
(1) research on informal interaction in study abroad (i.e., temporary sojourns for educational
and L2 learning purposes), (2) classroom interaction in K-12 schools, and (3) oral partici-
pation and L2 development in university courses.
Study Abroad Research

In the study abroad (SA) context, research has primarily focused on the ways in which
informal interactions with peers and host family members contribute to L2 learning (see
Shively, 2018, for an overview). Given that informal interactions are likely rich in inter-
personal and colloquial language use (i.e., forms of use that may have been absent in stu-
dents’ classroom learning experiences or textbooks), SA studies often focus on the students’
acquisition of sociolinguistic and pragmatic aspects of oral language, such as specific markers
of stance (Diao, 2016), forms of address (Kinginger, 2008), and listener responses (Dings,
2014). Researchers have also examined patterns of interaction in communicative events re-
lated to culture and language learning, such as talk about food (e.g., DuFon, 2006; Kinginger
& Wu, 2018), national differences (Iino, 2006), language (Surtees, 2018; Theodórsdóttir,
2018), and humour (Shively, 2013). This research has produced rich case studies demon-
strating how learners’ spoken L2 repertoires change as a result of their participation in new-
found L2 communities (e.g., through clubs, service encounters, and homestays). The studies
provide empirical evidence of the contingent and contextual nature of language learning,
often by comparing and contrasting learners’ different trajectories and analyzing the means
through which they become sensitized to the value systems that shape the meaning potentials
of the forms they encounter. In some cases, the learners may knowingly or even unwittingly
reject some of those same norms, when particular gendered norms or expectations of highly
stratified politeness expression are incongruent with their own deeply held social values
(Siegal, 1996).
Diao (2016) examined L2 Chinese learners’ use of sentence-final particles in Mandarin (a/
ya, la, me, o, eh/ye), forms associated with “girl talk” among China’s urban youth. She
compared the developmental trajectories of three undergraduate American SA students, Tuzi
and Mac (both male) and Ellen (female). Drawing on recorded data of informal conversa-
tions between roommates over a semester, Diao analyzed patterns in participants’ use of the
particles as well as the thematic content of participants’ conversations. Findings show how
participants’ particle use shifted following conversations with roommates about the link
between femininity and frequently used stance particles. Ellen’s particle use increased to align
more closely with that of her female roommate, while Tuzi’s use decreased as he became
aware that overuse could be interpreted as an “effeminate” speech style. Mac, who did not
explicitly talk about gender, love, or relationships with his roommate, continued to omit the
particles from his speech entirely. Diao’s findings point to the importance of metalinguistic
talk about the social meanings of language forms. Other SA studies have observed a similar
role for explicit discussions of social meanings. For example, Surtees (2018) found that
Japanese SA students in Canada developed a range of discursive strategies related to asking
for language help following their peers’ repeated offers to help with English “when asked.”
K-12 Classroom Research

K-12 research has typically examined classroom interactions and has highlighted the ways in
which overlapping semiotic repertoires (e.g., multilingual, gestural, visual, or graphic) and
inclusionary/exclusionary social practices create or constrain affordances for language learning
(e.g., Burdelski & Howard, 2020; Eriks‐Brophy & Crago, 2003; Martin-Beltrán, 2010;
Talmy, 2008). For example, Talmy (2008) explored how classroom talk in a Hawaiian high
school reflected and reproduced certain identities in the classroom, specifically those related to
newcomer teachers and experienced “local ESL” students. Drawing on conversation analysis
and membership categorization analysis to analyze transcripts from the ESL classroom, Talmy
found that students regularly engaged in direct oral confrontation with teachers as well as
other embodied forms of resistance such as not responding to teacher questions or claiming to
have not brought school materials. Rather than casting these students as “bad,” Talmy de-
monstrates how these oral practices reinforced students’ membership in the (unpopular) “local
ESL” category. He also focuses on how students’ oral practices of resistance served to socialize
newcomer teachers into particular ways of teaching, such as providing alternative assignments,
not all of which served learners’ longer-term linguistic or academic trajectories well. Talmy’s
study provides evidence of the oral practices that constitute the means through which students
and teachers engage in the L2 learning process, rather than focusing on the L2 forms that
students are expected to acquire (as in Diao, 2016). This focus on oral practices as a means of
socialization runs through many studies conducted in K-12 contexts. Collectively, these studies
provide evidence that the concepts of “expert” or “learner” are not fixed but rather that these
identities are negotiated through embodied interaction in the classroom. They also indicate
ways in which learner and teacher identities are shaped through such practices and how those
outcomes impact learner engagement.
University Programme Research

Work conducted in university contexts has primarily examined how students are socialized
into classroom practices such as participation in discussions (Morita, 2004; Ro & Burch,
2020), presentations (Kobayashi, 2016; Zappa-Hollman, 2007), or dialogue tasks (Al
Masaeed, 2016; van Compernolle, 2014). Some of this research focuses on how learning is
scaffolded or mediated through various forms of oral interaction. For example, Al Masaeed
(2016) examines how Arabic learners successfully scaffold their accomplishment of a paired
discussion task in the L2 by using English to resolve linguistic difficulties.
Other work, particularly in English-medium university settings, adopts an academic dis-
course socialization perspective to examine how university students take up and participate
in oral academic tasks (Kobayashi et al., 2017). For example, Kobayashi (2016) described
how one undergraduate international student, Otome, was socialized into delivering oral
academic presentations in English during her one-year sojourn at a Canadian university.
Kobayashi’s findings highlight how the teachers’ criteria for “good presentations” centred
around content organization, such as the inclusion of critical and comparative talk, and
effective use of paralinguistic cues, such as eye contact. He highlighted how Otome became
sensitized to the expectations of the instructor through interactions with the instructor and
her peers. For instance, Otome observed how the instructor used textbooks and research to
support his claims during lectures and decided during her second presentation to use evi-
dence from the textbook. Thus, as with most sociocultural work, speaking development was
conceptualized with reference to local criteria for success – criteria that went beyond issues of
pronunciation or grammatical accuracy to incorporate key features of the genre, including
content selection and audience rapport.

Sociocultural approaches in SLA drawing on anthropology and sociolinguistics seldom
make use of experimental methods, which are fundamentally at odds with a more “ecolo-
gical” perspective on language use and development. In contrast, Vygotskian-inspired L2
studies often use pre-tests, interventions, and post-tests to chart development or change over
time (see studies in Lantolf et al., 2018). Sociocultural work typically seeks to examine
changes in learners’ spoken repertoires in instances of real-world use over a period of time in
a specific context (e.g., while performing an authentic task such as a role play, or within a
classroom or community over weeks, months, or years). Furthermore, to gain insight into
learners’ linguistic choices and the factors that mediate them, (oral) language use is studied in
ways that account for the interactive, cultural, and (in some studies) sociopolitical context of
learning and production. Thus, sociocultural research tends to be longitudinal and
developmental and often solicits participants’ own perspectives on their language learning, use,
and sense of membership and legitimacy as multilingual/L2 users (Anya, 2016).
Qualitative case studies, ethnographic methods, and various linguistic or discourse analyses
are commonly used in this research and aim to create linkages between macro-level phenomena
and micro-level practices (for an overview of case study research methods in applied linguistics,
see Duff, 2014). To investigate macro-level phenomena, researchers may conduct site visits,
write field notes, keep reflective journals, analyze policy documents and media reports, or make
maps and diagrams of physical learning spaces. In addition, researchers usually conduct open-
ended or semi-structured interviews to gain insight into participants’ understandings of their
language use, identities, and experiences, as well as local language policies (e.g., “English-only”
signage in the school). To observe practices at the micro-level, researchers typically gather
(oral) interaction data via audio- or video-recording in “naturalistic” settings (e.g., service
encounters, classrooms, workplaces, or informal settings such as the dinner table). Sometimes,
due to difficulties gaining permission to observe participants’ behaviours in their social con-
texts, analyses may rely more, or exclusively, on participants’ self-reports of their interactions
and development (Norton, 2013; Duff et al., 2000). Alternatively, researchers may use elicited
interaction on purposefully designed research or pedagogical tasks (e.g., Kinginger & Belz,
2005; van Compernolle, 2014). Typically, the researcher is present while recording and taking
field notes as the interaction unfolds. However, adult or university level learners may in some
cases be asked to record their own interactions with mobile phones or handheld recorders and
to submit them to researchers later. This practice of remote recording gives participants more
discretion over the data collected and shared and, at the same time, the process is less intrusive
than having the researcher present. Data are typically interpreted from a participant-relevant
(or emic) perspective to examine how participants themselves orient to and co-construct local
meanings. An advantage of this emic approach is that it resists the use of external standards or
benchmarks to evaluate learners’ oral production and thus allows researchers to capture and
conceptualize the value of non-standard communicative practices, such as translanguaging
(freely intermixing languages or dialects) and language play (e.g., Martin-Beltrán, 2010; Al
Masaeed, 2016) or even transgressive, oppositional behaviours and speech (e.g., Talmy, 2008).
Findings from sociocultural studies are often reported as case studies to provide sufficient
contextualization and depth when describing individuals or small groups of learners.
Typically, cases in sociocultural research are interpretive – they focus on a phenomenon as it
occurs in a specific context. A case can include the experiences and language use of a single
focal learner, or the shared practices of a larger institution such as a school, or the nesting of
cases within cases. Often, researchers will select several focal cases within a research project
to illustrate contrasting findings. For example, in the Kinginger et al. (2014) study on SA
students’ mealtime conversations with Chinese host families, findings were presented for two
participants with significantly differing initial proficiencies, thus providing an example of
contrasting experiences. Alternatively, focal participants may be selected because they re-
present a “typical” developmental trajectory, or conversely, because they are exceptional in
some way. Explaining the rationale for recruitment and selection is important when con-
textualizing cases within the larger sample of learners and within broader societal issues and
theoretical questions. A key potential strength of case study research (and some other ap-
proaches) is that it leads to new understandings, awareness, and (possibly) empathy, and to
the realization that we are dealing with whole people with histories and aspirations, and not
simply data. In this sense, the research is intended to be transformative for the reader as well
as (potentially) the field (Duff, 2014).
Since case studies often involve multiple data sources, sociocultural work may make use
of multiple methods of data analysis including narrative analysis, descriptive and thematic
analysis (often for the purposes of contextualization or examining meso/macro-level phe-
nomena), and some form of linguistic or discourse analysis to investigate the linguistic re-
sources used (lexical, grammatical, pragmatic, etc.). In her study on racialized identities in
SA learning of L2 Portuguese by American students in Brazil, Anya (2016) presents all of the
following analyses: a thematic analysis of participants’ perceptions of race, a descriptive
discourse analysis of students’ overt references to identity categories in talk, and a critical
discourse analysis of participants’ interactions focusing on genres, speech acts, interactive
frames, and stance-taking practices. By employing multiple analytic methods, researchers
such as Anya can make clearer connections between phenomena at the macro and meso
levels (e.g., discourses about race and racial categories) and speaking practices at the mi-
crolevel (e.g., stance-taking practices that index racial identities). While only the micro-level
analyses are about speaking performance in the strict sense, macro-level analyses answer
important questions about factors that shape interlocutors’ linguistic choices, willingness to
engage in particular oral interactions, and the reactions of other interlocutors to learners’
language use.
The design of sociocultural research can vary widely in its epistemological commitments,
and may take up cognitive, interactionist, or critical orientations. Vygotskian-inspired L2
studies that focus on concept-based instruction are quite cognitivist in the sense that they aim
to understand changing conceptions and control over particular linguistic forms (e.g., tense-
aspect marking). In contrast, work that takes a critical perspective, such as De Costa’s work
(2014) which adopts a language socialization framework, focuses on macro-systems of power
and their relationship to micro-level events. De Costa’s research involved a year-long
ethnographic study of high school students studying abroad in Singapore in which he con-
ducted extensive field observations and became highly involved in the everyday life of the
school. His research questions reflect his commitment to critical enquiry and focused on how
macro-ideologies about cosmopolitanism were enacted at the level of interaction and how
those interactions impacted “students’ development of a cosmopolitan outlook and set of
linguistic practices” (p. 13). Informed by the work of post-structural scholars such as
Bourdieu (1991), De Costa collected and analyzed policy documents from the school to
demonstrate macro-level discourses. Then, using insights from interactional sociolinguistics
and conversation analysis, he examined how those ideologies were reproduced and trans-
formed in actual instances of classroom interaction with students and teachers. His findings
demonstrate how changes in students’ oral language use are linked to external societal
structures and values. As this example illustrates, the types of information that a sociocultural
researcher chooses to gather and the ways they are analyzed largely depend on their
epistemological commitments and initial research questions. Critical readers of sociocultural
research should reflect carefully on the ways in which a researcher’s chosen methods align
with their theorizations of the learner, language, learning, power, and society.

Sociocultural SLA research has generated implications for the establishment of target norms,
assessment practices, content selection, and ways of explicating linguistic concepts in L2
classrooms. For example, some sociocultural research drawing on functional grammar and
interactional competence urges practitioners to shift the target of L2 pedagogy from the
emphasis on traditional accuracy or grammar-focused outcomes to a more repertoire-based
approach to L2 teaching (Douglas Fir Group, 2016; Hall, 2019). Repertoire-based ap-
proaches focus less on error correction at the word, phrase or sentence level and instead view
learner development as an ever-expanding ability to participate in new discourse contexts
with new interlocutors. Within these approaches, effective and appropriate communication
and joint negotiation are prioritized over external prescriptive norms for oral language use.
This vision of language development has the potential to de-centre the native speaker norm
by placing value on contexts that learners need and want to interact in, which are often
multilingual and do not necessarily involve native speakers of the target language.
In the area of academic English, the focus on repertoires allows for more curricular at-
tention in the classroom to how different disciplines organize language in different ways (e.g.,
genre/field of speaking tasks). For example, in their investigation of the classroom language
used by undergraduate mathematics instructors, Artemeva and Fox (2011) draw attention to
“chalk talk” as a central multimodal genre of mathematics education. The authors describe
the language, organization, and practice required for instructors from a wide variety of
language backgrounds to become effective users of chalk talk. This work has generated insights
for planning content–language integrated courses and programmes for mathematics in some
university programmes.
Sociocultural understandings of development are also shaping assessment practices in the
field (see Iwashita, this volume). For example, paired speaking tests focusing on interactional
competence – learners’ abilities to jointly negotiate meaning – are increasingly common (e.g.,
Galaczi, 2014). This form of assessment evaluates the range of discursive moves and turn-
taking strategies learners are able to employ in their L2. Other sociocultural researchers such
as Poehner and Lantolf (2005) advocate dynamic assessment, in which examiners interact
with examinees during oral exams to better understand the way learners process language,
and to lead them to new cognitive stages of development (see also Lantolf et al., 2018). This
approach to assessment prioritizes the sociocultural notion that language learning is ne-
cessarily a mediated process and draws on the Vygotskian notion of the zone of proximal
development. Dynamic assessment has been incorporated successfully into assessments of
oral pragmatic knowledge. For example, Van Compernolle (2014) engaged learners of
French in a task in which they judged the appropriateness of the pronominal address forms
tu and vous; however, rather than having learners complete these tasks independently, they
engaged in cooperative dialogue with a tutor and explained their choices and thinking as they
completed the task.
The connections that sociocultural research has shown between language use and identity
also have implications for how practitioners select content and design curricula. Most so-
ciocultural work highlights the importance of teaching material that allows students to
cultivate an L2 identity that they value. For example, Lin and Man (2011) drew on findings
from sociocultural research related to identity to create a rap-based extra-curricular English
programme for youth in Hong Kong. The authors explain that they designed the curriculum
around the identity of the young emcee “with the idea that many students would find in ELT
RAP a space to reconcile their mixed feelings about English” (p. 205). Rap and hip hop were
particularly appealing to the students in the programme as a medium through which youth
worldwide speak out against social injustice. The rhyming elements of rap also afforded
opportunities for phonetic development and writing “good” rap required students to engage
with current events to expand their active vocabulary.
Sociocultural research on speaking does not usually compare pedagogical methods or
evaluate performance based on external criteria; it is conducted in very specific learning
contexts. Whereas some studies are unlikely to provide teachers with explicit guidance about
how to teach specific features of oral language in their classrooms, others have developed
elaborate methods for raising learners’ awareness of linguistic concepts, meanings, and re-
lationships. Implications of this research focus on expanding the awareness of researchers,
teachers, learners, and programmes of the complex contextual and linguistic factors that
influence oral language use and learning. Furthermore, sociocultural approaches urge
practitioners and researchers to not view speaking as a context-independent skill but rather
to consider for what purpose and with whom students will be using oral language. They
encourage practitioners to think critically about the learning processes in their own contexts
by asking questions such as:
• What social meanings are attributed to the language and linguistic forms I am teaching
by my learners, the school/institution, and society?
• How will awareness of those social meanings impact learners’ ability to participate in
their desired communities?
• What messages does my teacher-talk communicate to students about their identities and
abilities as learners or as multilingual speakers and how does it scaffold their learning?
• How do different modes of meaning-making (e.g., visual, gestural, textual; and in-
structional vs. experiential) work together with oral language to produce optimal con-
ditions for language learning in my specific learning space?
This approach advocates for pedagogies that raise students’ awareness of these questions as
well as teaching that provides students with different options for expressing social meanings
in particular contexts. In sum, findings from sociocultural work challenge the input/output
conceptualization of language acquisition, which dominated much cognitive–interactionist
theory and pedagogy in SLA, in favour of dialogic/mediated understandings of speaking that
conceptualize all communication as interaction.
7 Future Directions
Writing this chapter during a unique and tumultuous period of COVID-19 self-isolation, on
the one hand, and anti-racist protests around the world, on the other, we offer some final
remarks on future directions for SLA research. These two acute circumstances are relevant to
sociocultural research on L2 speaking and learning in several ways. First, in relation to
COVID-19 constraints, more speaking and learning is taking place online than ever before,
typically mediated by synchronous video communication platforms (e.g., Zoom). However,
such platforms transform the nature of interaction and learning, due to limited screen size,
bandwidth, lag time or overlap between turns, the potential for concurrent chatting through
sidebar conversations, and other features (e.g., eliminating sound or video images through
muting buttons, or forgetting to unmute when starting to speak, causing unexpected
delays in taking turns). It remains to be seen how this critical epidemiological phenomenon
and rapidly emerging new tools and forms of learning and communicating will change
pedagogical and participatory structures for L2 learning and speaking in either the near
term or longer term. Nor is it clear how SLA research designs and pedagogies in relation to
speaking will adapt accordingly. This is an area with substantial intriguing possibilities for
research, theory, and practice.
Second, the current global surge of social unrest confronting racism and violence,
coupled with urgent calls for decolonization and Indigenous language revitalization, re-
quire a collective reckoning regarding our priorities and practices as applied linguists,
educators, and citizens. We are urged to re-examine the curriculum, modes of access to and
participation in education and in intercultural interactions as well as in civic society. We
must critically examine representations of speakers, contexts, and positionalities in learning
materials, in research, in the range of L2s (and multilingual repertoires) examined, and in
our research practices that reproduce the marginalization of certain people, cultures, and
languages. Although marginalization and oppression are historically-rooted sociological
and political processes, they continue to impact public and private perceptions of the value
of certain populations and experiences, heritage/Indigenous languages, locally and inter-
nationally valued ways of learning and participating, and people’s sense of legitimacy,
safety, and purpose in society. All of these factors affect people’s engagements with
learning and their future trajectories and well-being. Chapters in Burdelski and Howard
(2020) provide examples of how alienation occurs through speech activities (despite di-
rectives by teachers insisting on inclusive, respectful practices) as well as other behaviours,
attitudes, and groupings of participants, such as who will or will not dance or play with
minoritized children in an elementary school based on perceptions of the inferiority of the
Other. These larger discourses are highly relevant in sociocultural studies of SLA as they
may reproduce widespread historical and contemporary processes of exclusion and neglect
rather than foster greater inclusion, social participation, and opportunities for upward
mobility.
Further Reading
Douglas Fir Group. (2016). A transdisciplinary framework for SLA in a multilingual world. The
Modern Language Journal, 100, 19–47.
An interdisciplinary framework and ten principles for considering SLA across macro, meso, and micro
levels of analysis, integrating sociocultural and other compatible theories.
Duff, P., & May, S. (Eds.). (2017). Language socialization. Encyclopedia of language and education
(3rd edn). Cham, Switzerland: Springer.
This edited volume highlights one sociocultural approach to language learning: language socialization.
The authors examine the learning and use of a variety of oral and signed languages across a range of
learning contexts and across the lifespan.
Lantolf, J. P., Poehner, M., & Swain, M. (Eds.). (2018). The Routledge handbook of sociocultural theory
and second language development. New York: Routledge.
An authoritative overview of current Vygotskian-inspired sociocultural theory and concepts in SLA
with pedagogical implications.
References
Al Masaeed, K. (2016). Judicious use of L1 in L2 Arabic speaking practice sessions. Foreign Language
Annals, 49(4), 716–728. doi:10.1111/flan.12223
Anya, U. (2016). Racialized identities in second language learning: Speaking blackness in Brazil.
Artemeva, N., & Fox, J. (2011). The writing’s on the board: The global and the local in teaching
undergraduate mathematics through chalk talk. Written Communication, 28(4), 345–379.
doi:10.1177/0741088311419630
Avni, S. (2012). Translation as a site of language policy negotiation in Jewish day school education.
Current Issues in Language Planning, 13(2), 77–89. doi: 10.1080/14664208.2012.678976
Bourdieu, P. (1991). Language and symbolic power. [G. Raymond & M. Adamson, Trans.]. Cambridge,
MA: Harvard University Press.
Burdelski, M. J., & Howard, K. M. (Eds.). (2020). Language socialization in classrooms: Culture, in-
teraction, and language development. Cambridge: Cambridge University Press.
Darvin, R., & Norton, B. (2015). Identity and a model of investment in applied linguistics. Annual
Review of Applied Linguistics, 35, 36–56. doi:10.1017/S0267190514000191
De Costa, P. (2014). Reconceptualizing cosmopolitanism in language and literacy education: Insights
from a Singapore school. Research in the Teaching of English, 49(1), 9–30. https://www.jstor.org/
stable/24398662
Diao, W. (2016). Peer socialization into gendered L2 Mandarin practices in a study abroad context:
Talk in the dorm. Applied Linguistics, 5(1), 599–620. doi:10.1093/applin/amu053
Dings, A. (2014). Interactional competence and the development of alignment activity. The Modern
Language Journal, 98(3), 742–756. doi:10.1111/j.1540-4781.2014.12120.x
Douglas Fir Group. (2016). A transdisciplinary framework for SLA in a multilingual world. The
Modern Language Journal, 100, 19–47. doi:10.1111/modl.12301
Duff, P. A. (2007). Second language socialization as sociocultural theory: Insights and issues. Language
Teaching, 40(4), 309–319. doi:10.1017/S0261444807004508
Duff, P. A. (2014). Case study research on language learning and use. Annual Review of Applied
Linguistics, 34, 233–255. doi:10.1017/S0267190514000051
Duff, P. A. (2019). Social dimensions and processes in second language acquisition: Multilingual so-
cialization in transnational contexts. The Modern Language Journal, 103, 6–22. doi:10.1111/
modl.12534
Duff, P. A., & Talmy, S. (2011). Language socialization approaches to second language acquisition. In
D. Atkinson (Ed.), Alternative approaches to second language acquisition (pp. 95–116). New York:
Routledge. doi:10.4324/9780203830932
Duff, P. A., Wong, P., & Early, M. (2000). Learning language for work and life: The linguistic so-
cialization of immigrant Canadians seeking careers in healthcare. Canadian Modern Language
Review, 57(1), 9–57. doi:10.3138/cmlr.57.1.9
DuFon, M. A. (2006). Socialization of taste during study abroad in Indonesia. In M. A. DuFon & E.
Churchill (Eds.), Language learners in study abroad contexts (pp. 91–119). Clevedon: Multilingual
Matters.
Early, M., Kendrick, M., & Potts, D. (2015). Multimodality: Out from the margins of English language
teaching. TESOL Quarterly, 49, 447–460. doi:10.1002/tesq.246
Eriks‐Brophy, A., & Crago, M. (2003). Variation in instructional discourse features: Cultural or lin-
guistic? Evidence from Inuit and non‐Inuit teachers of Nunavik. Anthropology & Education
Quarterly, 34(4), 396–419. doi:10.1525/aeq.2003.34.4.396
Friedman, D. (2010). Speaking correctly: Error correction as a language socialization practice in a
Ukrainian classroom. Applied Linguistics, 31, 346–347. doi:10.1093/applin/amp037
Galaczi, E. D. (2014). Interactional competence across proficiency levels: How do learners manage
interaction in paired speaking tests? Applied Linguistics, 35(5), 553–574. doi:10.1093/applin/amt017
Hall, J. K. (2019). Essentials of SLA for L2 teachers: A transdisciplinary framework. New York:
Routledge.
Hall, J. K., Hellermann, J., & Pekarek Doehler, S. (Eds.). (2011). L2 interactional competence and
development. Bristol, UK: Multilingual Matters.
Hymes, D. (1964). Introduction: Toward ethnographies of communication. American Anthropologist,
66(6), 1–34. doi:10.1525/aa.1964.66.suppl_3.02a00010
Iino, M. (2006). Norms of interaction in a Japanese homestay setting: Toward a two-way flow of
linguistic and cultural resources. In M. A. DuFon & E. Churchill (Eds.), Language learners in study
abroad contexts (pp. 151–202). Clevedon: Multilingual Matters.
Kinginger, C. (2008). Language learning in study abroad: Case studies of Americans in France. The
Modern Language Journal, 92, 1–124. doi:10.1111/j.1540-4781.2008.00821.x
Kinginger, C., & Belz, J. A. (2005). Socio-cultural perspectives on pragmatic development in foreign
language learning: Microgenetic case studies from telecollaboration and residence abroad.
Intercultural Pragmatics, 2(4), 369–421. doi:10.1515/iprg.2005.2.4.369
Kinginger, C., Lee, S.-H., Wu, Q., & Tan, D. (2014). Contextualized language practices as sites for
learning: Mealtime talk in short-term Chinese homestays. Applied Linguistics, 1–26. doi:10.1093/
applin/amu061
Kinginger, C., & Wu, Q. (2018). Learning Chinese through contextualized language practices in study
abroad residence halls: Two case studies. Annual Review of Applied Linguistics, 38, 102–121.
doi:10.1017/S0267190518000077
Kobayashi, M. (2016). L2 academic discourse socialization through oral presentations: An under-
graduate student’s learning trajectory in study abroad. Canadian Modern Language Review, 72(1),
95–121. doi:10.3138/cmlr.2494
Kobayashi, M., Zappa-Hollman, S., & Duff, P. (2017). Academic discourse socialization. In P. Duff &
S. May (Eds.), Language socialization. Encyclopedia of language and education (3rd edn, pp. 239–254). Cham, Switzerland:
Springer.
Lantolf, J. P. (1994). Sociocultural theory and second language learning. Modern Language Journal,
78(4), 418–420. doi:10.2307/328580
Lantolf, J. P., Poehner, M., & Swain, M. (Eds.). (2018). The Routledge handbook of sociocultural theory
and second language development. New York: Routledge.
Lantolf, J. P., & Thorne, S. L. (2006). Sociocultural theory and the genesis of second language devel-
opment. Oxford: Oxford University Press.
Li, D. (2000). The pragmatics of making requests in the L2 workplace: A case study of language
socialization. Canadian Modern Language Review, 57(1), 58–87. doi:10.3138/cmlr.57.1.58
Lin, A., & Man, E. (2011). Doing-hip-hop in the transformation of youth identities: Social class, ha-
bitus, and cultural capital. In C. Higgins (Ed.), Negotiating the self in a second language: Identity
formation and cross-cultural adaptation in a globalizing world (pp. 201–220). London: Equinox.
Martin–Beltrán, M. (2010). The two‐way language bridge: Co‐constructing bilingual language learning
opportunities. The Modern Language Journal, 94(2), 254–277. doi:10.1111/j.1540-4781.2010.01020.x
Moore, L. C. (2013). Qur’anic school sermons as a site for sacred and second language socialisation. Journal
of Multilingual and Multicultural Development, 34(5), 445–458. doi:10.1080/01434632.2013.783036
Morita, N. (2004). Negotiating participation and identity in second language academic communities.
TESOL Quarterly, 38(4), 573–603. doi:10.2307/3588281
Norton, B. (2013). Identity and language learning: Extending the conversation. Bristol, UK: Multilingual
Matters.
Poehner, M. E., & Lantolf, J. P. (2005). Dynamic assessment in the language classroom. Language
Teaching Research, 9(3), 233–265. doi:10.1191/1362168805lr166oa
Ro, E., & Burch, A. R. (2020). ‘Willingness to communicate/participate’ in action: A case study of
changes in a recipient’s practices in an L2 book club. Linguistics and Education, 58, 100821.
doi:10.1016/j.linged.2020.100821
Shively, R. L. (2013). Learning to be funny in Spanish during study abroad: L2 humor development.
The Modern Language Journal, 97(4), 930–946. doi:10.1111/j.1540-4781.2013.12043.x
Shively, R. L. (2018). Language socialisation during study abroad: Researching interactions outside the
classroom. In S. Coffey & U. Wingate (Eds.), New directions for research in foreign language edu-
cation (pp. 97–112). New York: Routledge.
Siegal, M. (1996). The role of learner subjectivity in second language sociolinguistic competency:
Western women learning Japanese. Applied Linguistics, 17(3), 356–382. doi:10.1093/applin/17.3.356
Surtees, V. (2018). Peer language socialization in an internationalized study abroad context: Norms for
talking about language (Doctoral dissertation). University of British Columbia.
Talmy, S. (2008). The cultural productions of the ESL student at Tradewinds High: Contingency,
multidirectionality, and identity in L2 socialization. Applied Linguistics, 29(4), 619–644. doi:10.1093/
applin/amn011
Theodórsdóttir, G. (2018). L2 teaching in the wild: A closer look at correction and explanation
practices in everyday L2 interaction. The Modern Language Journal, 102, 30–45. doi:10.1111/
modl.12457
Van Compernolle, R. A. (2014). Sociocultural theory and L2 instructional pragmatics. Bristol:
Van Compernolle, R. A. (2015). Interaction and second language development: A Vygotskian perspective.
Philadelphia: John Benjamins.
Zappa-Hollman, S. (2007). Academic presentations across post-secondary contexts: The discourse
socialization of non-native English speakers. Canadian Modern Language Review, 63(4), 455–485.
doi:10.3138/cmlr.63.4.455
Zuengler, J., & Miller, E. R. (2006). Cognitive and sociocultural perspectives: Two parallel SLA
worlds? TESOL Quarterly, 40(1), 35–58. doi:10.2307/40264510
5
APTITUDE AND INDIVIDUAL
DIFFERENCES
Joan C. Mora
Speaking in a second language is very challenging. Besides difficulties in finding the right
words and using the L2 grammar appropriately, learners often struggle to produce un-
familiar sounds. Their utterances often contain hesitations, inappropriate silences, and
mispronounced words, all of which make speech difficult to understand.
L2 pronunciation research is concerned with learners’ development and acquisition of the
sound system of a second language (L2 phonology), and with pronunciation instruction and
assessment (Derwing & Munro, 2015; this volume). Acquiring an L2 sound system involves
learning to perceive and produce individual sounds (vowels and consonants), knowing how
they combine to form syllables and words (phonotactics), and learning the appropriate
rhythm and intonation patterns (prosody), all of which shape the way utterances are pro-
duced and perceived. Perceptual dimensions of L2 speech, such as accentedness (speech
nativelikeness measured as degree of perceived foreign accent), perceived fluency, compre-
hensibility (ease of understanding), and intelligibility (actual amount of speech understood)
reflect L2 speech learning and development and are therefore also sensitive to sources of
individual differences.
Current L2 speech acquisition models such as the Speech Learning Model (SLM; Flege,
1995) and its revised version (SLM-r; Flege & Bohn, 2021) and the Perceptual Assimilation
Model of L2 speech learning (PAM-L2; Best & Tyler, 2007) aim to account for L2 learners’
speech development in naturalistic second language acquisition (SLA). These models focus
mainly on the perception and production of individual sounds and their mental re-
presentations (phonetic categories), in particular sound contrasts that are functionally im-
portant in language because they distinguish meaning (e.g., the vowels in cat and cut).
L2 fluency research investigates the properties and development of speaking fluency
(smooth delivery of speech), comprising three dimensions of speech production: utterance
fluency, cognitive fluency, and perceived fluency (Kahng, this volume; Segalowitz, 2010,
2016). Utterance fluency can be measured in terms of temporal properties (e.g., speech rate),
breakdowns (e.g., pauses and hesitations) and repairs (e.g., repetitions, false starts and re-
formulations that allow learners to overcome breakdowns). Cognitive fluency is the efficient
use of mechanisms and processes such as fast retrieval of words from memory, while per-
ceived fluency is the listeners’ perception of a speaker’s output. L2 fluency research has
DOI: 10.4324/9781003022497-7
investigated the relationships among utterance fluency measures (e.g., speech rate), listeners’
judgements (Suzuki & Kormos, 2020), and speakers’ cognitive fluency (Kahng, 2020), as well
as the predictability of L2 utterance fluency from parallel L1 measurement (de Jong
et al., 2015).
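The utterance fluency measures described above (speed and breakdown measures) can be illustrated with a short calculation. The following is a minimal sketch, not a procedure taken from the studies cited: the `SpeechSample` structure, the function name, the example figures, and the 0.25-second silent-pause threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeechSample:
    """Hypothetical container for one learner's speech sample."""
    syllables: int                 # total syllables produced
    total_seconds: float           # total response time, pauses included
    pause_durations: List[float]   # silent intervals, in seconds


def utterance_fluency(sample: SpeechSample, min_pause: float = 0.25) -> Dict[str, float]:
    """Compute simple speed and breakdown measures.

    min_pause is an assumed threshold: silences shorter than this
    are not counted as pauses.
    """
    pauses = [p for p in sample.pause_durations if p >= min_pause]
    pause_time = sum(pauses)
    phonation_time = sample.total_seconds - pause_time
    minutes = sample.total_seconds / 60
    return {
        # Speed: syllables per minute of total time (pauses included)
        "speech_rate_spm": sample.syllables / minutes,
        # Speed: syllables per minute of speaking time (pauses excluded)
        "articulation_rate_spm": sample.syllables / (phonation_time / 60),
        # Breakdown: frequency and mean length of silent pauses
        "pauses_per_minute": len(pauses) / minutes,
        "mean_pause_seconds": pause_time / len(pauses) if pauses else 0.0,
    }


# Invented example: 180 syllables in 60 seconds with four silent intervals
sample = SpeechSample(syllables=180, total_seconds=60.0,
                      pause_durations=[0.5, 0.1, 1.0, 0.5])
measures = utterance_fluency(sample)
```

Repair measures (repetitions, false starts, reformulations) are omitted here because they require a coded transcript rather than timing data alone.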
At all levels of proficiency, learners vary in their success in resolving L2 speech production
difficulties and consequently in their comprehensibility. Even for L2 learners with similar L2
learning histories, variability in speech is huge. For example, whereas some advanced L1-
Spanish learners of English may effectively distinguish the English word cat /kæt/ from cut
/kʌt/ in production, many others do not, pronouncing both words with their perceptually
closer L1 equivalent /a/. Similarly, some learners may speak English at near-native rates
(approximately 240 syllables per minute), whereas others are much slower, ranging from 110
to 230 syllables per minute (Mora & Valls-Ferrer, 2012). Thus, some L2 learners’ pro-
nunciation may be easy to understand while that of others may be detrimental to intellig-
ibility; some speak fluently without apparent difficulties, while others pause frequently,
compromising comprehension. Individual differences research aims to understand and de-
scribe the sources of this inter-learner variability and seeks to identify the factors that may
explain how well L2 learners speak, especially at ultimate attainment levels, that is, when the
L2 learner has reached a relatively advanced level of L2 competence and use that is no longer
progressing substantially. Such factors have been investigated from a variety of approaches
and perspectives, and conceptualizations of aptitude and individual differences abound (e.g.,
Doughty, 2019; Dörnyei, 2005; Kormos, 2013; Robinson, 2012).
In this chapter individual differences are understood to comprise a number of predictors
of ultimate attainment in L2 speech that can be categorized as age- and experience-related
factors (age of onset of L2 learning, amount and quality of L2 exposure and use, L1
background, learning contexts), sociopsychological factors (motivation, personality, anxiety,
learning strategies, willingness to communicate), and cognitive and aptitude-related factors
(working memory, acoustic and phonological memory, attention, inhibition, and auditory
processing skills).
L2 Speech Production Mechanisms and Processes and Individual Differences

The speaking performance of even very advanced second language (L2) learners often shows
levels of skill development that fall strikingly short of the levels of use in their first language
(L1). In L1 speaking, the processes of message conceptualization (generating a pre-verbal
message), formulation (selecting words from memory and setting their phonological, mor-
phological and grammatical properties to produce them in grammatically correct sentences),
articulation (generating the articulatory gestures for the production of speech) and speech
monitoring all function efficiently and effortlessly, drawing on robust automatized lexical,
morphosyntactic, and phonological knowledge. For example, producing the phrase “Would
you like a drink?” as /wʊʤə laɪk ə drɪŋk/ does not require any metalinguistic analysis for the
native speaker, but it might involve consciously selecting the appropriate auxiliary verb for
the L2 speaker. In L2 speaking, therefore, formulation processes are typically effortful and
inefficient, drawing on lexical, phonological and grammatical knowledge that may be only
partially automatic and that requires conscious processing, slowing down speech
production (Kormos, 2006; Segalowitz, 2010). Phonological processing differences are also obvious
between L1 and L2. In L1, phonetic and phonological encoding (e.g., determining whether a
sound is to be realized as voiceless in a specific phonetic environment) and the activation of
the articulatory representations that allow speakers to produce sound sequences are auto-
matic. In perception, the language-specific mechanisms of acoustic, phonological, and lexical
Joan C. Mora
analysis also function automatically. However, in the L2 these mechanisms function less
efficiently because representations are less accurately defined and the processes to access
them less automatized.
As a consequence of the processing difficulties outlined earlier, L2 speech is usually less
accurate and less complex grammatically, lexically and phonologically, temporally less
fluent, and much more variable than L1 speech, which makes it less comprehensible and
harder to process for native and non-native listeners (Munro & Derwing, 1995b).
Understanding the sources of this variability is important for pedagogical, scientific, and
sociopsychological reasons. Pedagogically, it will help to improve instruction through tailoring
of classroom tasks to individual learner characteristics, as well as assessment methods.
From a scientific point of view, it will help us to gain a better understanding of the mechanisms
and processes involved in the acquisition of L2 speech, and it will help to develop
and extend current models of L2 speech learning that still do not take individual differences
into account (but see Flege & Bohn, 2021). Finally, it will help us to better understand the
psychological and social dimension of L2 speaking, such as the impact of learners’ speech on
their self-confidence and social integration (Segalowitz, 2016).
The study of language learning aptitude and individual differences has a long tradition in the
field of SLA (Dörnyei & Ryan, 2015; Skehan, 2014) and is primarily motivated by the ob-
servation that language learners vary in their ability to master an L2 and the notion that
identifying learner characteristics leading to successful language learning will benefit L2
learning and teaching.
Early research on individual differences focused on identifying language aptitude
components. Carroll and Sapon’s (1959) Modern Language Aptitude Test (MLAT) identified
four main components of language learning aptitude: phonemic coding ability (sound
coding involving sound retention, retrieval, and recognition), grammatical sensitivity
(capacity to identify grammatical relationships), inductive language learning skill (ability
to extract syntactic and morphological patterns and use them for further language pro-
cessing), and associative memory (ability to establish memory links between L1 and L2
vocabulary items). Of these components, sound sequence recognition (for lexicalization
processes) and phonemic coding ability (for input processing) seem obvious sources of
individual differences. A more recent test battery, LLAMA (Meara, 2005), has been used
extensively to assess language learning aptitude. In particular, LLAMA-E (phonemic
coding ability) has been associated with segmental and suprasegmental pronunciation
accuracy (Saito, 2019a), and LLAMA-D (sound sequence recognition) has been found to
be positively related to L2 learners’ development of comprehensibility and speed and
breakdown fluency (Saito et al., 2019). Another test battery, specially designed to predict
very high levels of L2 proficiency, is the High-Level Language Aptitude Battery (Hi-LAB;
Doughty, 2019). It extends the MLAT by incorporating several working memory com-
ponents, but its speech-related components (auditory perceptual acuity: phonemic dis-
crimination and categorization) have not been investigated extensively yet as predictors of
L2 speech learning. Granena (2019) recently combined a measure of sound recognition
(LLAMA-D) and one of facilitation of lexical processing (Hi-LAB’s ALTM synonym) into
a single implicit memory ability predictor, and found it to predict L2 speech rate
(see Part 4). In general, the amount of variance explained by aptitude sub-tests in outcome
speech measures appears to be very modest (8%–10%; Granena, 2019; Saito, 2019b; Saito
et al., 2019).
Extensive research in the 1980s and 1990s from L2-immersion settings focused on age- and
experience-related factors and demonstrated their role in determining ultimate attainment in
L2 speech learning, often with a focus on L2 pronunciation accuracy and accentedness. Such
factors include the extent to which the L1 and the L2 differ phonetically (L1 background),
the age of onset of L2 learning (AOL) or age of first extensive exposure to the L2, L2
experience (often operationalized as length of residence in an L2 speaking environment
(LoR) or as amount and quality of L2 input received) and frequency and amount of L2 use
(for a review see Bohn & Munro, 2007). However, the large number and variety of factors
examined make it difficult to determine which contribute the most to L2 speech learning.
Whereas some studies concluded that the strongest predictors of L2 pronunciation accuracy
were L1 background and motivation to speak the L2 well, others found AOL, LoR, and
amount of L1 and L2 use to be the most important predictors (e.g., Flege et al., 1995). A
similar picture emerges with regard to L2 speaking fluency. L2 use factors such as amount of
interaction with native speakers and socializing in the L2 appear to be important predictors
of attainment in immigrant populations (Derwing & Munro, 2013), but other studies have
found L2 experience to be less predictive of speaking fluency than AOL (Trofimovich &
Baker, 2006). The outcome of this research, however, suggests that at least in immersion
settings an early start is better, and L2 use and exposure impact L2 speech development
positively. This contrasts sharply with findings from research conducted in foreign language
classroom contexts, where age- and experience-related factors appear to contribute little to
L2 speech acquisition due to the limited exposure and L2 use conditions in FL classrooms. In
this context, learner characteristics other than those related to L2 exposure and use, such as
aptitude (Saito et al., 2019) or motivation (Saito et al., 2017) may explain a considerable
amount of inter-learner variability in L2 speech learning.
Research on individual differences in L2 speech learning over the past 20 years has
shifted its focus towards cognitive aptitude skills (e.g., memory, attention, and
inhibition) and psychological factors (e.g., motivation and anxiety). Unlike psychological
predictors, cognitive language aptitude is unique in that it is componential (different
components underlie different aspects of performance), relatively stable in adulthood, and does
not change substantially with experience (Doughty, 2019). However, its contribution to L2
speech is difficult to assess, as the involvement of different aptitude components in language
performance varies in strength, depending on the linguistic dimension investigated (Li, 2016).

Research on aptitude and individual differences in L2 speech faces a number of important
methodological challenges. First, L2 speech is a complex domain of L2 performance, en-
tailing the development of perceptual and articulatory skills, and fluent speech production
requires the skilful operation of the processes of message planning, formulation, and ar-
ticulation (de Bot & Bátyi, this volume). Second, speech is permeable to contextual, ex-
periential and individual learner factors, or combinations of factors (e.g., talent, personality,
and experience) that interact to shape unique language learning experiences and outcomes
(Moyer, 2014). Finally, whereas some individual characteristics are relatively stable over time
(cognitive functions, language learning aptitude, auditory processing skills, L1 background,
AOL), others (e.g., proficiency, or affective factors like motivation and anxiety) change with
learners’ experience, learning context, and time (see Figure 5.1). For example, speaking
anxiety is related to motivation and willingness to communicate (Baran-Łucarz, this volume)
and may affect both L2 speech development and performance (Teimouri et al., 2019), but the
magnitude of its impact is likely to change as L2 proficiency increases and learners become
more confident in speaking the L2. Similarly, L2 learners’ ability to inhibit their L1 when
speaking the L2 may minimize L1-to-L2 interference in phonology (Darcy et al., 2016), but
the amount of interference is likely to be modulated by the amount of L1 and L2 exposure
and use. In sum, determining which factors influence (and to what extent) specific aspects of
L2 speaking performance and development is challenging because studies can include only a
limited set of factors and their effects on L2 speech are difficult to isolate. In addition,
interactions between experiential, psychological-, cognitive-, and aptitude-related sources of
individual differences in L2 speech (the black arrows in Figure 5.1) are under-researched.

Figure 5.1 Sources of individual differences in L2 speech acquisition and performance
[Figure: four groups of factors connected to L2 speech acquisition and L2 speaking performance.
Cognitive and aptitude factors: memory (WM, PSTM), attention control, inhibitory control, musical aptitude, speed of lexical access, phonemic coding ability.
Sociopsychological factors: motivation, anxiety, verbal intelligence, personality (extraversion), learning strategies / attitudes, willingness to communicate.
Age- and experience-related factors: AOL, L1 background, LoR / instruction hours, L2 input (quality / quantity), L2 proficiency / vocabulary, L1 exposure / use.
Auditory processing: auditory acuity, frequency discrimination, spectral and temporal auditory motor integration, imitation ability.]

Currently, questions addressed by most individual differences research mainly fall within one
of the following three categories:
1. To what extent does a given factor (e.g., L2 vocabulary size) affect L2 performance (e.g.,
pause frequency in an oral narrative task)?
2. To what extent does a given factor (e.g., auditory selective attention) affect a given L2
learning outcome (e.g., gains in the production of an L2 vowel contrast after phonetic
training)?
3. Which of a set of factors (e.g., phonemic coding ability, motivation, and hours of in-
struction) contributes the most to a given L2 learning outcome or to L2 performance
(e.g., pronunciation accuracy, comprehensibility, or speaking fluency)?
Most research addressing these questions has examined the speech performance of L2
learners differing in age, experience, or aptitude profiles at a single point in time and cross-
sectionally rather than longitudinally (but see Derwing & Munro, 2013). The outcome of this
research indicates that L2 input quantity and quality are fundamental in L2 speech learning,
whereas aptitude factors have a more modest influence. However, few researchers to date
have examined the relative contribution of experiential and aptitude factors together in a
single study, and those who have, report a complex relationship between aptitude, experience
and L2 speech learning. For example, Saito et al. (2019) found classroom experience to be
associated with improvement in comprehensibility, while phonemic coding ability was as-
sociated with the development of fluency and prosody. Saito et al. (2020) found segmental
pronunciation accuracy to be influenced by phonemic coding ability but experience factors
(in-class and out-of-class L2 use) influenced word stress, intonation, and speaking fluency. A
similar picture emerges from recent research on cognitive skills and L2 speech learning (see
Part 4 below). Current findings therefore suggest that L2 speech learning is very complex,
with multiple factors contributing to the development of different dimensions of L2 speech
and to varying degrees at different stages in acquisition.
Experience-Related Sources of Individual Differences in L2 Speech Learning

Research on age- and experience-related factors in naturalistic settings (reviewed in Part 2)
indicates that age of learning, as well as quantity and quality of L2 input and use are important
predictors of L2 speech learning (Flege, 2009). These factors, however, are currently under-
researched in foreign language learning contexts, where L2 input and use are minimal and it is
usually not possible to characterize L2 experience in terms of, for example, amount of inter-
action with native speakers. In this context, in- and out-of-class exposure has been found to be
more essential to the development of L2 oral skills than starting age (Muñoz, 2014). In fact,
students’ engagement in speaking classroom activities and the use of out-of-class sources of L2
input have also been shown to affect L2 speech performance and development positively (Saito
& Hanzawa, 2016), which underscores the prevailing role of input factors in any learning
context and the pedagogical relevance of providing in- and out-of-class speaking practice
through communicative tasks.
Sociopsychological Sources of Individual Differences in L2 Speech Learning

L2 learners’ personality (e.g., extraversion), learning attitudes and strategies (e.g., motiva-
tion) and social affective factors such as foreign language anxiety (FLA) or willingness to
communicate (WTC) may have a substantial impact on L2 pronunciation and speaking skills
(Dörnyei, 2006; Kormos, 2015).
FLA has been widely researched (Teimouri et al., 2019), as it is an important attention-
draining factor influencing learning processes and L2 speech production (Kormos, 2015).
Low anxiety levels have been linked to positive attitudes and motivation for foreign language
learning, whereas high anxiety levels may develop into L2 speaking apprehension in some
learners, compromising their oral performance and development. For example, Pérez
Castillejo (2019) found that more anxious learners produced longer and more frequent
pauses and spoke for less time during an oral exam.
Motivation, generally linked to successful L2 acquisition (Dörnyei & Ushioda, 2013), is
often operationalized as learners’ ratings of how important they think good pronunciation is
for their professional or social life (Sardegna et al., 2018). Motivation is positively related to
L2 pronunciation accuracy and speaking skills (Nagle, 2018), but has also been linked to
other socioaffective constructs such as the “ideal L2 self” or WTC (Moyer, 2014) and the use
of pronunciation learning strategies (Baker Smemoe & Haslam, 2013), reflecting learner
attitudes towards pronunciation, such as the desire to sound native-like and identify oneself
with the target culture. In immersion contexts, such attitudes are likely to promote speaking
skills by leading learners to seek opportunities to engage in meaningful conversation with
native speakers. In the foreign language classroom such social involvement attitudes are less
likely to predict ultimate attainment due to the scarcity of opportunities for using the L2
interactively, but affective variables like motivation and anxiety may also have an impact
(Saito et al., 2017).
Cognitive Aptitude Sources of Individual Differences in L2 Speech Learning

Cognitive aptitude factors underlie language learning skills by supporting the efficiency of
speech processing in both L1 and L2 acquisition (Kormos, 2013) and include, but are not
limited to, the various components of working memory and executive control (Miyake &
Friedman, 2012). Executive functions are independent of one another, but support each
other in accomplishing task goals. For example, working memory resources help learners
regulate the amount of attention they allocate to input processing or output production
(Kormos, 2015; Simard, this volume).
Individual Differences in Working Memory

Working memory (WM) research in SLA is primarily based on Baddeley’s multi-component
model of working memory (Baddeley, 1986). It consists of a central executive system that
controls cognitive processes and coordinates three subsystems: the phonological loop, the
visuospatial sketchpad, and the episodic buffer. The phonological loop consists of a
phonological short-term memory (PSTM) store for verbal information that encodes phonological elements
(sounds) and their serial temporal order in the form of auditory memory traces that decay
rapidly, and an articulatory rehearsal component that prevents these auditory traces from
vanishing through subvocal rehearsal, a silent articulation mechanism. The PSTM store has
often been referred to as a language learning device, given its relevance in L1 and L2 acquisition.
WM has been positively associated with both L2 processing and proficiency outcomes,
but its role varies as a function of L2 learners’ age, the WM task employed, and the linguistic
domain examined (Juffs & Harrington, 2011). Kormos (2015) concluded that existing re-
search can offer only partial support for the role WM plays in determining the quality of L2
speech production, whereas PSTM has more consistently been associated with it, especially
in speaking fluency (e.g., O’Brien et al., 2007). In the domains of L2 speech perception and
phonological processing, the role of PSTM has not been clearly determined. One reason for
this is variability in the PSTM measures used, the target L2 phonology dimensions assessed,
and the learner populations investigated across studies. For example, Darcy et al. (2015)
found their L1 measures of PSTM (forward and backward digit span and non-word recall
tasks in L1 Korean) and a complex WM span measure to be related to individual L2-English
phonological scores (a composite phonological score based on segmental categorization,
lexical stress and phonotactics). Hu et al. (2013) assessed PSTM through a digit span task
and a non-word repetition task and found PSTM to be unrelated to L1-German advanced
learners’ English pronunciation accuracy in the reading of a passage in English. Darcy and
Mora (2016) found PSTM scores obtained through a serial nonword recognition task to be
related to L2-English perception accuracy (ABX categorical discrimination) in bilingual
learners, but not in a comparable monolingual population.
The involvement of WM and its components, especially PSTM, in language learning is
quite robust for L2 grammar and vocabulary acquisition, but more research is needed to
determine their involvement in L2 speech learning.
Individual Differences in Attention

Attentional resources are necessary to process the information-relevant cues in the incoming
speech signal during speech processing and are crucial in perceptual learning because lis-
teners need to place their focus of attention on the linguistic cues that may become relevant
during communicative interaction (e.g., a phrase-final pitch rise signalling an unfinished
speaking turn). The amount of conscious attention required depends on the various encoding
processes responsible for lexical activation, retrieval, and articulation (Horst, this volume) as
well as learners’ proficiency level and the degree of automatization in speech processing
achieved (Kormos, 2015). Attention-shifting skills and selective attention are important at-
tentional sub-skills in language processing because communicating involves bringing relevant
linguistic information into focus while keeping irrelevant information in the attentional
background (Segalowitz, 2010) and listeners need to attend to specific detail in the acoustic
input to interpret auditory events.
Attention control may also help learners develop more accurate representations for L2
sounds by helping them notice L2-specific phonetic features or cross-language differences
between L2 and L1 sounds, or by allowing them to focus on relevant L2-specific phonetic
dimensions. However, studies have obtained mixed results, with positive relationships be-
tween attention control and L2 phonological processing only surfacing for specific partici-
pant groups or phonological processing tasks (Darcy & Mora, 2016; Darcy et al., 2015). It is
therefore difficult to come to compelling conclusions about the impact of individual differ-
ences in attention control on L2 speech learning.
Individual Differences in Inhibitory Control

Inhibition involves moving irrelevant information to the background while information
under focus is being attended to (Miyake & Friedman, 2012). In language processing, it
allows speakers to control linguistic interference, for example by inhibiting one language
while speaking another. Inhibitory control may modulate the amount of cross-linguistic
influence between the languages of multilingual speakers during lexical activation and word
recognition processes as well as during speech production. It is, therefore, likely for L2
learners with strong inhibitory control to be more efficient at suppressing cross-language
influence from their L1 when learning and using their L2, which might lead to more fluent L2
speech production (Linck et al., 2009) and less interference from the L1 in L2 phonological
processing. For example, Darcy et al. (2016) found that L2 learners with stronger inhibitory
skill were more accurate in categorically perceiving difficult L2 vowel contrasts. However,
this relationship did not hold for the same contrasts in production. Findings suggest that
inhibitory control supports L2 processing, but further research is needed to determine the
extent to which it can predict ultimate attainment in L2 pronunciation and speaking.
Individual Differences in Auditory Processing

L2 learners’ ability to efficiently process speech input and its segmental acoustic features
(e.g., formant frequency or duration) may be related to individual differences in general
auditory processing skills. Domain-general auditory processing skills constitute an under-
researched source of individual differences in L2 speech learning, but so far frequency and
pitch discrimination acuity (Lengeris & Hazan, 2010) and precision in the processing of
sound quality and quantity and auditory–motor integration (Saito et al., 2020) have been
found to be related to L2 speech learning. The extent to which such skills support or are
supported by executive functions, phonological short-term memory and attention, remains
an empirical question for future research.
To sum up, research has shown that L2 experience (amount and quality of L2 input and
use) appears to contribute substantially to L2 speech learning in both immersion and
classroom foreign language contexts. Psychological factors like motivation and extraversion
seem to affect L2 speech learning positively, whereas anxiety appears to be detrimental to it.
The evidence for the role of language learning aptitude (e.g., phonemic coding and sound sequence
recognition) and executive functioning (memory, attention, and inhibition), however, is
inconclusive. Further research, especially research targeting potential interactions between
experience-related, psychological, and aptitude-related sources of individual differences, is
necessary to be able to gain a better understanding of their contribution to L2 speech
learning.

Researching individual differences in L2 pronunciation and speaking fluency entails deciding
on which speech dimensions to measure and what factors to investigate as potential sources
of individual differences.
L2 Speech Measures
L2 speech measures are based on perception and production data (analyzable in terms of
segmental, suprasegmental, and temporal properties) obtained through tasks that use a variety
of presentation formats and elicitation techniques (controlled or spontaneous) (Nagle et al., this
volume). For example, L2 learners’ perceptual sensitivity to the quality differences between the
vowels in the lexical contrast cat-cut can be measured through discrimination (same-different)
or identification tasks, whereas their discriminability in production can be measured through
acoustic analysis of the formant frequencies, perceptual judgements of pronunciation accuracy
(e.g., listeners’ judgements of accentedness on a Likert scale), or intelligibility tests involving
listeners’ transcriptions of auditorily presented materials (e.g., Saito & Plonsky, 2019).
However, segmental production accuracy data obtained through acoustic analyses and per-
ceptual judgments need not match perfectly, as a small acoustic difference may have a sub-
stantial impact on listeners’ perception, and vice-versa (Munro, 1993; Pérez-Ramón et al.,
2020). Measures of L2 learners’ speaking fluency (as well as measures of comprehensibility)
require more than word- or sentence-long stretches of speech and are typically obtained from
20–30 seconds of speech elicited through a picture-based oral narrative. Recorded speech
samples are then analysed in terms of their temporal, lexical, and grammatical properties with
the help of speech analysis software. Measures of speech rate (syllables per second including
pausing time), articulation rate (syllables per second excluding pause time, or its inverse,
mean syllable duration, computed by dividing speaking time excluding pause time by the total
number of syllables produced), pause frequency and duration within and at clause boundaries,
and number of dysfluencies per minute (repetitions and repairs) are among the most commonly
used measures reflecting the speed, breakdown, and repair characteristics of L2 learners’
utterance fluency.
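The temporal measures described above can be illustrated with a minimal Python sketch. It assumes that a sample has already been annotated by hand or with speech analysis software, so that the syllable count, total duration, silent pause durations, and number of repetitions and repairs are known. The `SpeechSample` structure and the `fluency_measures` function are hypothetical names introduced here for illustration only, and the sketch does not separate pauses within clauses from pauses at clause boundaries.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeechSample:
    """Annotated counts for one speech sample (hypothetical format)."""
    n_syllables: int                 # syllables produced in the sample
    total_time_s: float              # total duration, pauses included
    pause_durations_s: List[float]   # silent pauses above the chosen threshold
    n_disfluencies: int              # repetitions and repairs


def fluency_measures(sample: SpeechSample) -> Dict[str, float]:
    """Compute common speed, breakdown, and repair measures."""
    if sample.n_syllables <= 0 or sample.total_time_s <= 0:
        raise ValueError("sample must contain speech")
    pause_time = sum(sample.pause_durations_s)
    speaking_time = sample.total_time_s - pause_time
    if speaking_time <= 0:
        raise ValueError("pause time must be shorter than total time")
    n_pauses = len(sample.pause_durations_s)
    minutes = sample.total_time_s / 60
    return {
        # Speed fluency
        "speech_rate": sample.n_syllables / sample.total_time_s,
        "articulation_rate": sample.n_syllables / speaking_time,
        "mean_syllable_duration": speaking_time / sample.n_syllables,
        # Breakdown fluency
        "pauses_per_minute": n_pauses / minutes,
        "mean_pause_duration": pause_time / n_pauses if n_pauses else 0.0,
        # Repair fluency
        "disfluencies_per_minute": sample.n_disfluencies / minutes,
    }
```

For instance, a hypothetical 30-second narrative containing 60 syllables and three pauses totalling 6 seconds would be described by calling `fluency_measures(SpeechSample(60, 30.0, [2.0, 3.0, 1.0], 4))`. Because speech rate includes pausing time and articulation rate excludes it, the two values diverge as pausing increases.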
Individual Differences Measures

There is so much variability in the instruments used in research to measure experiential,
psychological, and cognitive factors that it is hard to find two studies that have used the same
exact instrument. Language background and language use questionnaires provide subjective
estimates of participants’ learning histories and of L2 exposure and use, whereas psycho-
logical factors are often measured through adaptations of well-established questionnaire
instruments (e.g., the Foreign Language Classroom Anxiety questionnaire by Horwitz et al.,
1986). Similarly, cognitive psychology offers SLA researchers a wide array of cognitive tests,
even for measuring a single cognitive skill. For example, inhibitory control has been measured
in SLA research through retrieval-induced forgetting (RIF) tasks (Darcy et al., 2016),
which measure participants’ capacity to suppress the activation of a lexical item to the point of
forgetting it by increasing the activation of competing lexical items; a Stroop task (Lev-Ari &
Peperkamp, 2014), in which participants need to inhibit colour names (e.g., “red”) when
asked to name the congruent or incongruent ink colour in which they are visually presented
(i.e., the word red presented in blue ink); or a Simon task (Linck et al., 2012), in which
participants are presented with red or blue boxes appearing on the right or left of the screen
and are instructed to press a right or left key associated with the box colour regardless of its
spatial position. The RIF and Stroop tasks are domain-specific because they are based on
lexical activation and involve linguistic interference, whereas the Simon task is domain-
general rather than language-oriented. Still, they are all meant to measure the same cognitive
skill. This is an important methodological aspect to consider when assessing cognitive ap-
titude. Domain-general tasks allow researchers to obtain measures of cognitive individual
differences that are language independent, both in terms of the materials used and the
participants tested, whereas domain-specific linguistic-oriented tasks provide a cognitive
measure in a testing context that resembles the language use context where the cognitive skill
is required (e.g., speech production). It would therefore seem advisable to use speech-based
(rather than domain-general) tasks to obtain cognitive control measures used to predict
individual gains in L2 speaking performance.
Another methodological issue concerns distinguishing L1-based from L2-based sources of
individual differences. The speech production mechanisms and the cognitive processes that
support them (e.g., working memory) are essentially the same in L1 and L2 (Kormos, 2013)
and are not easy to disentangle. For example, pause frequency within clauses reflects pro-
blems in formulation (lexical, syntactic, and phonological encoding) both in L1 and L2 so
that measures of pause frequency in the L2 may also reflect L1 pausing behaviour (de Jong
et al., 2015; Kahng, 2020). This does not preclude certain features of utterance fluency and
their underlying cognitive processes from reflecting L2-specific individual differences. For example,
clause-internal pauses are longer and more frequent in L2 than in L1 (de Jong, 2016) and
speed of lexical retrieval (Kahng, 2020) is slower in L2 than in L1. Thus, capturing the
individual variability in L2 utterance fluency uniquely attributable to cognitive processes
operating in L2 speech production requires L2-specific measures of utterance and cognitive
fluency. These can be obtained by using L1 data as a baseline, that is, by residualizing L2
utterance and cognitive measures against their corresponding L1 measures (e.g., Kahng,
2020; Segalowitz, 2016). This would help identify the cognitive predictors of individual
differences in L2 utterance fluency (e.g., speed of lexical access, articulatory skill, and
working memory) underlying L2-specific speech processing.
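One way to picture the residualization step is the following minimal Python sketch. It assumes a simple linear regression of each L2 measure on the corresponding L1 measure across participants, with the residuals serving as the L1-adjusted (L2-specific) scores. The function name `residualize` and the example values are hypothetical, and published studies may use other models or software.

```python
from statistics import mean
from typing import List, Sequence


def residualize(l2_scores: Sequence[float], l1_scores: Sequence[float]) -> List[float]:
    """Return the part of each L2 score not predicted by the matching L1 score.

    Fits l2 = intercept + slope * l1 by ordinary least squares across
    participants and returns the residuals in the original order.
    """
    if len(l1_scores) != len(l2_scores) or len(l1_scores) < 2:
        raise ValueError("need paired L1 and L2 scores for at least two participants")
    m1, m2 = mean(l1_scores), mean(l2_scores)
    sxx = sum((x - m1) ** 2 for x in l1_scores)
    if sxx == 0:
        raise ValueError("L1 scores must vary across participants")
    slope = sum((x - m1) * (y - m2) for x, y in zip(l1_scores, l2_scores)) / sxx
    intercept = m2 - slope * m1
    return [y - (intercept + slope * x) for x, y in zip(l1_scores, l2_scores)]


# Hypothetical clause-internal pauses per minute for five participants.
l1_pauses = [4.0, 6.0, 5.0, 8.0, 7.0]
l2_pauses = [9.0, 12.0, 14.0, 15.0, 13.0]
l2_specific_pauses = residualize(l2_pauses, l1_pauses)
```

A positive residual indicates more pausing in the L2 than a participant's L1 pausing behaviour would predict. The residuals are uncorrelated with the L1 measure by construction, which is what allows them to be read as L2-specific.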

Recommendations for conducting research on individual differences are motivated by findings
indicating that psychological and cognitive predictor variables are not consistently associated
with measures of L2 pronunciation and speaking, and may relate differently to particular
outcome measures depending on the measurement instruments used (e.g., domain-general or
linguistically oriented) and the L2 speech dimension under investigation (e.g., phonetic
accuracy or breakdown fluency). It is therefore important to consider carefully which target
predictor variables to measure and how, starting with those shown in previous research to
explain some variance in L2 speech learning. In addition, as different sources of individual
differences are likely to be related to one another, it would be advisable to include as many
predictor variables in a single study as possible. This would allow researchers to statistically
assess their joint and unique contribution while controlling for the confounding effects of
mediating factors. Finally, especially when investigating cognitive factors, obtaining L1-
adjusted (i.e., L2-specific) measures of both cognitive predictors and L2 speech measures
would allow researchers to identify relationships between predictor variables and L2 outcome
measures more precisely.
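The idea of separating joint from unique contributions can be sketched for the simplest case of two correlated predictors. The Python example below is an illustration under stated assumptions rather than a recommended analysis: it uses the standard two-predictor formula for the squared multiple correlation, and the function names and the variables a researcher would pass in (for example, phonemic coding scores and hours of instruction as predictors of comprehensibility ratings) are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary across participants")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def partition_variance(y: Sequence[float], x1: Sequence[float],
                       x2: Sequence[float]) -> Dict[str, float]:
    """Split the variance in y explained by x1 and x2 into unique and shared parts."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    if abs(r_12) >= 1:
        raise ValueError("predictors are perfectly collinear")
    # Squared multiple correlation for a model with both predictors.
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    unique_x1 = r_squared - r_y2 ** 2   # gain from adding x1 to a model with x2
    unique_x2 = r_squared - r_y1 ** 2   # gain from adding x2 to a model with x1
    return {
        "r_squared": r_squared,
        "unique_x1": unique_x1,
        "unique_x2": unique_x2,
        "shared": r_squared - unique_x1 - unique_x2,
    }
```

With more than two predictors the same logic applies, but the comparison between the full model and a model omitting one predictor is normally carried out with regression software rather than with this closed-form shortcut.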
Individual differences research has important implications for L2 pedagogy, as the in-
dividual aptitude and psychological profiles of learners are likely to affect how much they
benefit from different types of learning tasks (Kissling, 2014). For example, learners with
strong cognitive control in auditory selective attention and working memory and those with
good auditory processing skills may benefit more from a pronunciation training paradigm
exposing them to L2 sounds from a variety of talkers than those with poorer attention or
speech processing skills, who may benefit more from less variable exposure or from more
explicit pronunciation training methods. Similarly, interactive tasks to develop L2 learners’
speaking skills (e.g., role-plays, narrative tasks, and oral presentations) may be detrimental
to speaking development for unmotivated L2 learners or for those with high levels of L2
speaking anxiety.
The effectiveness of foreign-language instruction on speaking and pronunciation may thus
crucially depend on how well pedagogical tasks are adapted to the individual characteristics
and skills of learners and on teachers’ ability to pay attention to learners’ individual dif-
ferences when setting up classroom activities. One possible solution for some types of
learning tasks (e.g., phonetic training of L2 sounds through discrimination tasks) is to first
test learners’ auditory attention and speech processing skills and then assign them to dif-
ferent training procedures specifically tailored to match their individual skill profiles.
Another approach is to provide training in those cognitive control and auditory processing
skills (e.g., selective attention, duration, and frequency discrimination) that have been shown
to benefit L2 pronunciation skills (e.g., Jaeggi et al., 2011). In the domain of L2 oral skills, L2
speaking anxiety levels could be reduced to effectively develop L2 speaking fluency by
training learners’ speaking skills through virtual reality applications that simulate real-life
speaking situations (Xie et al., 2021). In addition, teachers’ knowledge of their students’
individual differences profiles as well as understanding how cognitive and psychological
factors affect L2 speech development will be helpful in creating learning tasks adapted to
learners’ characteristics and in organizing classroom learning activities (e.g., pairing students
with similar language learning aptitude profiles) in ways that promote the development of
speaking skills for all learners.
7 Future Directions
The key methodological issues outlined in Parts 3, 5, and 6 suggest that future research on
aptitude and individual differences in L2 speech learning would benefit substantially from
investing efforts in extending current methodological approaches. In particular, future re-
search should consider the following: investigating the sources of individual differences
dynamically and longitudinally (Nagle, 2018), increasing L2 speech assessment consistency
and homogeneity through the use of common research-informed measures and testing
methods (Saito & Plonsky, 2019), including or controlling for inter-related experiential,
psychological and cognitive variables, and using both domain-general and speech-based
tasks in the assessment of cognitive aptitude constructs. In addition, the benefits of pro-
nunciation instruction (Lee et al., 2015) and pedagogical interventions aiming at developing
L2 speaking fluency (Tavakoli et al., 2016) remain under-researched from an individual
differences perspective.
Further Reading
Andringa, S., & Dąbrowska, E. (Eds.) (2019). Individual differences in first and second language
ultimate attainment and their causes. Language Learning, 69: S1.
An edited collection of articles on individual differences in language acquisition.
Granena, G., Jackson, D. O., & Yilmaz, Y. (Eds.) (2016). Cognitive individual differences in second
language processing and acquisition. Amsterdam: John Benjamins.
Covers research on cognitive individual differences in second language acquisition and ultimate
attainment.
Hansen Edwards, J. G. (2017). Pronunciation and individual differences. In O. Kang, R. Thomson, &
J. Murphy (Eds.), The Routledge handbook of contemporary English pronunciation (pp. 385–398).
A review of research on individual differences in L2 pronunciation including discussion of aptitude,
affective variables, gender and identity.
University Press.
Reviews recent research on individual differences in second language speech production focusing on the
role of working memory, attention and affective variables like anxiety and willingness to communicate.
References
Astheimer, L. B., Berkes, M., & Bialystok, E. (2016). Differential allocation of attention during speech
perception in monolingual and bilingual listeners. Language, Cognition and Neuroscience, 31,
196–205.
Baddeley, A. (1986). Working memory. Oxford: Clarendon Press.
Baker Smemoe, W., & Haslam, N. (2013). The effect of language learning aptitude, strategy use and
learning context on L2 pronunciation learning. Applied Linguistics, 34, 435–456.
Best, C., & Tyler, M. (2007). Nonnative and second-language speech perception: Commonalities and
complementarities. In O.-S. Bohn & M. J. Munro (Eds.), Language experience in second language
speech learning: In honor of James Emil Flege (pp. 13–34). Amsterdam: John Benjamins.
Bohn, O.-S., & Munro, M. J. (2007). Language experience in second language speech learning.
Amsterdam: John Benjamins.
Carroll, J. B., & Sapon, S. (1959). Modern Language Aptitude Test: Form A. New York: Psychological
Corporation.
Darcy, I., & Mora, J. C. (2016). Executive control and phonological processing in language acquisition:
The role of early bilingual experience in learning an additional language. In G. Granena, D. O.
Jackson, & Y. Yilmaz (Eds.), Cognitive individual differences in L2 processing and acquisition
(pp. 247–275). John Benjamins.
Darcy, I., Mora, J. C., & Daidone, D. (2016). The role of inhibitory control in second language
phonological processing. Language Learning, 66(4), 741–773.
Darcy, I., Park, H., & Yang, C.-L. (2015). Individual differences in L2 acquisition of English pho-
nology: The relation between cognitive abilities and phonological processing. Learning and
Individual Differences, 40, 63–72.
de Jong, N. H. (2016). Predicting pauses in L1 and L2 speech: The effects of utterance boundaries and
word frequency. International Review of Applied Linguistics in Language Teaching, 54, 113–132.
de Jong, N. H., Groenhout, R., Schoonen, R., & Hulstijn, J. H. (2015). Second language fluency:
Speaking style or proficiency? Correcting measures of second language fluency for first language
behavior. Applied Psycholinguistics, 36(2), 223–243.
Joan C. Mora
A 7‐year study. Language Learning, 63, 163–185.
Derwing, T. M., & Munro, M. J. (2015). Pronunciation fundamentals. Evidence-based perspectives for L2
Dörnyei, Z. (2005). The psychology of the language learner: Individual differences in second language
acquisition. Mahwah, NJ: Lawrence Erlbaum.
Dörnyei, Z. (2006). Individual differences in second language acquisition. AILA Review, 19, 42–68.
Dörnyei, Z., & Ryan, S. (2015). The psychology of the language learner revisited. New York: Routledge.
Dörnyei, Z., & Ushioda, E. (2013). Teaching and researching motivation. Abingdon, Oxfordshire, UK:
Routledge.
Doughty, C. J. (2019). Cognitive language aptitude. Language Learning, 69, 101–126.
Flege, J. E. (1995). Second-language speech learning: Theory, findings, and problems. In W. Strange
(Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 229–273).
Timonium, MD: York Press.
Flege, J. E. (2009). Give input a chance! In T. Piske & M. Young-Scholten (Eds.), Input matters in SLA
(pp. 175–190). Bristol, UK: Multilingual Matters.
Flege, J. E., & Bohn, O.-S. (2021). The revised Speech Learning Model (SLM-r). In R. Wayland (Ed.),
Second language speech learning: Theoretical and empirical progress (pp. 3–83). Cambridge:
Cambridge University Press.
Flege, J. E., Munro, M. J., & MacKay, I. R. (1995). Factors affecting strength of perceived foreign
accent in a second language. The Journal of the Acoustical Society of America, 97(5), 3125–3134.
Granena, G. (2019). Cognitive aptitudes and L2 speaking proficiency: Links between LLAMA and Hi-
LAB. Studies in Second Language Acquisition, 41(2), 313–336.
Hu, X., Ackermann, H., Martin, J. A., Erb, M., Winkler, S., & Reiterer, S. M. (2013). Language
aptitude for pronunciation in advanced second language (L2) learners: behavioural predictors and
neural substrates. Brain and Language, 127, 366–376.
Jaeggi, S. M., Buschkuehl, M., Jonides, J., & Shah, P. (2011). Short- and long-term benefits of
cognitive training. Proceedings of the National Academy of Sciences, 108, 10081–10086.
Juffs, A., & Harrington, M. (2011). Aspects of working memory in L2 learning. Language Teaching, 44,
137–166.
Kahng, J. (2014). Exploring utterance and cognitive fluency of L1 and L2 English speakers: Temporal
measures and stimulated recall. Language Learning, 64(4), 809–854.
Kahng, J. (2020). Explaining second language utterance fluency: Contribution of cognitive fluency and
first language utterance fluency. Applied Psycholinguistics, 41, 457–480.
Kissling, E. M. (2014). What predicts the effectiveness of foreign-language pronunciation instruction?
Investigating the role of perception and other individual differences. Canadian Modern Language
Review, 70, 532–558.
Kormos, J. (2006). Speech production and second language acquisition. Mahwah, NJ: Lawrence
Erlbaum.
Kormos, J. (2013). New conceptualizations of language aptitude in second language attainment. In G.
Granena & M. H. Long (Eds.), Sensitive periods, language aptitude, and ultimate L2 attainment.
(pp. 131–152). Amsterdam: John Benjamins.
University Press.
Lee, J., Jang, J., & Plonsky, L. (2015). The effectiveness of second language pronunciation instruction:
A meta-analysis. Applied Linguistics, 36, 345–366.
Lengeris, A., & Hazan, V. (2010). The effect of native vowel processing ability and frequency dis-
crimination acuity on the phonetic training of English vowels for native speakers of Greek. Journal
of the Acoustical Society of America, 128, 3757–3768.
Lev-Ari, S., & Peperkamp, S. (2014). The influence of inhibitory skill on phonological representations
in production and perception. Journal of Phonetics, 47, 36–46.
Li, S. (2016). The construct validity of language aptitude: A meta-analysis. Studies in Second Language
Acquisition, 38, 801–842.
Linck, J. A., Kroll, J. F., & Sunderman, G. (2009). Losing access to the native language while immersed
in a second language: Evidence for the role of inhibition in second-language learning. Psychological
Science, 20, 1507–1515.
Linck, J. A., Schwieter, J. W., & Sunderman, G. (2012). Inhibitory control predicts language switching
performance in trilingual speech production. Bilingualism, 15(3), 651.
Meara, P. (2005). LLAMA language aptitude tests. Swansea: Lognostics.
Miyake, A., & Friedman, N. P. (2012). The nature and organization of individual differences in ex-
ecutive functions: Four general conclusions. Current Directions in Psychological Science, 21, 8–14.
Mora, J. C., & Valls-Ferrer, M. (2012). Oral fluency, accuracy and complexity in formal instruction and
study abroad learning contexts. TESOL Quarterly, 46, 610–641.
Moyer, A. (2014). Exceptional outcomes in L2 phonology: The critical factors of learner engagement
and self-regulation. Applied Linguistics, 35, 418–440.
Muñoz, C. (2014). Contrasting effects of starting age and input on the oral performance of foreign
language learners. Applied Linguistics, 35(4), 463–482.
Munro, M. J. (1993). Productions of English vowels by native speakers of Arabic: Acoustic mea-
surements and accentedness ratings. Language and Speech, 36, 39–66.
Munro, M. J., & Derwing, T. M. (1995a). Foreign accent, comprehensibility, and intelligibility in the
speech of second language learners. Language Learning, 45(1), 73–97.
Munro, M. J., & Derwing, T. M. (1995b). Processing time, accent, and comprehensibility in the per-
ception of native and foreign-accented speech. Language and Speech, 38(3), 289–306.
Nagle, C. (2018). Motivation, comprehensibility, and accentedness in L2 Spanish: Investigating moti-
vation as a time‐varying predictor of pronunciation development. The Modern Language Journal,
102, 199–217.
O’Brien, I., Segalowitz, N., Freed, B., & Collentine, J. (2007). Phonological memory predicts second
language oral fluency gains in adults. Studies in Second Language Acquisition, 29, 557–581.
Pérez Castillejo, S. (2019). The role of foreign language anxiety on L2 utterance fluency during a final
exam. Language Testing, 36, 327–345.
Pérez-Ramón, R., Cooke, M., & Lecumberri, M. L. G. (2020). Is segmental foreign accent perceived
categorically? Speech Communication, 117, 28–37.
Robinson, P. (2012). Individual differences, aptitude complexes, SLA processes, and aptitude test
development. In P. Robinson & M. Pawlak (Eds.), New perspectives on individual differences in
language learning and teaching (pp. 57–75). Berlin, Heidelberg: Springer.
Saito, K. (2019a). Individual differences in second language speech learning in classroom settings: Roles
of awareness in the longitudinal development of Japanese learners’ English /ɹ/ pronunciation. Second
Language Research, 35, 149–172.
Saito, K. (2019b). The role of aptitude in second language segmental learning: The case of Japanese
learners’ English /ɹ/ pronunciation attainment in classroom settings. Applied Psycholinguistics, 40(1),
183–204.
Saito, K., & Hanzawa, K. (2016). Developing second language oral ability in foreign language class-
rooms: The role of the length and focus of instruction and individual differences. Applied
Psycholinguistics, 37(4), 813–840.
Saito, K., & Plonsky, L. (2019). Effects of second language pronunciation teaching revisited: A pro-
posed measurement framework and meta‐analysis. Language Learning, 69(3), 652–708.
Saito, K., Dewaele, J. M., & Hanzawa, K. (2017). A longitudinal investigation of the relationship
between motivation and late second language speech learning in classroom settings. Language and
Speech, 60, 614–632.
Saito, K., Kachlicka, M., Sun, H., & Tierney, A. (2020). Domain-general auditory processing as an
anchor of post-pubertal L2 pronunciation learning: Behavioural and neurophysiological in-
vestigations of perceptual acuity, age, experience, development, and attainment. Journal of Memory
and Language, 115, 104168.
Saito, K., Sun, H., & Tierney, A. (2020). Domain-general auditory processing determines success in
second language pronunciation learning in adulthood: A longitudinal study. Applied Psycholinguistics,
41(5), 1083–1112.
Saito, K., Suzukida, Y., & Sun, H. (2019). Aptitude, experience, and second language pronunciation
proficiency development in classroom settings: A longitudinal study. Studies in Second Language
Sardegna, V. G., Lee, J., & Kusey, C. (2018). Self‐efficacy, attitudes, and choice of strategies for English
pronunciation learning. Language Learning, 68, 83–114.
Segalowitz, N. (2016). Second language fluency and its underlying cognitive and social determinants.
International Review of Applied Linguistics in Language Teaching, 54, 79–95.
Skehan, P. (2014). Individual differences in second language learning. Abingdon, Oxfordshire, UK:
Routledge.
Suzuki, S., & Kormos, J. (2020). Linguistic dimensions of comprehensibility and perceived fluency: An
investigation of complexity, accuracy, and fluency in second language argumentative speech. Studies
in Second Language Acquisition, 42(1), 143–167.
Tavakoli, P., Campbell, C., & McCormack, J. (2016). Development of speech fluency over a short
period of time: Effects of pedagogic intervention. TESOL Quarterly, 50, 447–471.
Teimouri, Y., Goetze, J., & Plonsky, L. (2019). Second language anxiety and achievement: A meta-
analysis. Studies in Second Language Acquisition, 41(2), 363–387.
Trofimovich, P., & Baker, W. (2006). Learning second language suprasegmentals: Effect of L2
experience on prosody and fluency characteristics of L2 speech. Studies in Second Language
Acquisition, 28(1), 1–30.
Xie, Y., Chen, Y., & Ryder, L. H. (2021). Effects of using mobile-based virtual reality on Chinese L2
students’ oral proficiency. Computer Assisted Language Learning, 34(3), 225–245.
6
LANGUAGE ANXIETY
Language anxiety (LA), one of the affective individual differences (IDs), has attracted the
attention of second language acquisition (SLA) researchers for at least four decades. In
SLA, the construct has been referred to as Foreign Language Classroom Anxiety, Foreign/
Second Language Anxiety or simply Language Anxiety. Here, the last of these terms is used.
Horwitz et al. (1986) define it as “a distinct complex of self-perceptions, beliefs, feelings,
and behaviours related to classroom learning arising from the uniqueness of the language
learning process” (p. 128). Gardner and MacIntyre (1993) add that LA may not only
accompany second language (L2) learning but also appear “when a situation requires the
use of a second language with which the individual is not fully proficient” (p. 5).
Numerous studies have examined its nature, causes, correlates and, most importantly, the
effect it has on L2/foreign language (FL) learning and use. (L2 is an umbrella term which
encompasses FL but here the terms are used interchangeably.) MacIntyre and Gardner
(1991) considered LA “one of the best predictors of success” in FL learning (p. 96). Meta-
analyses conducted recently (Botes et al., 2020; Teimouri et al., 2019) provide firm evi-
dence for the detrimental effect of LA on FL achievement. Several early (e.g., Horwitz
et al., 1986; Phillips, 1992) and more contemporary studies (Gkonou, 2017; Piechurska-
Kuciel, 2008; Tóth, 2012) have suggested that among the skills most frequently generating
anxiety in L2 users is speaking.
The conceptualization of LA has changed over the years. Since the construct concerns a
specific kind of anxiety, the general phenomenon of anxiety is first discussed from a psy-
chological perspective. Then, the construct of LA is introduced in its various phases of de-
velopment, along with its causes and correlates. What follows is an overview of studies
examining the impact of LA specifically on oral performance, which simultaneously reveals
the approaches and instruments typically deployed to investigate this matter. Then, a brief
report is presented on results of meta-analyses, a construct of pronunciation anxiety, and
examples of studies following the current trend to explore LA from a Dynamic Systems
Theory (DST) point of view. Finally, practical hints on how to lower LA in the FL classroom
are delineated, along with areas for future LA research.
DOI: 10.4324/9781003022497-8
Selected Early Models of Anxiety

According to Pekrun (1992), the experience of anxiety is determined by the extent to which
individuals appraise stimuli or events as threatening their egos and by their self-assessed
capacities to deal with them. Consequently, most views and models of anxiety position its
cognitive dimension as fundamental. In an early model proposed by Liebert and Morris
(1967), anxiety is presented as a bi-dimensional construct, encompassing worry and emo-
tionality. The former refers to thoughts about one’s performance and concerns related to the
consequences a failure may bring. An example is speaking in an L2, which incorporates the
attempt to understand the interlocutor, express one’s thoughts and ideas with an intelligible
accent, and at the same time to be as “genuine” as possible (Horwitz et al., 1986; Horwitz,
2017). The second component, emotionality, refers to a wide range of reactions, for example,
heart pounding or dyspepsia, which appear in response to stress (Liebert & Morris, 1967).
A few decades later, Vasa and Pine (2004) defined anxiety as a construct composed of
three dimensions: physiological, behavioural, and cognitive. The first dimension refers to
characteristic body postures, such as closed positions or self-touching, and bodily reactions,
for example, mounting perspiration, dry mouth, heart racing, blushing, dizziness, and motor
tension, including tension of the articulators. These are frequently reported by those who
experience anxiety when learning and/or using an FL (e.g., Gkonou, 2017; Şimşek &
Dörnyei, 2017). Among the behavioural symptoms of anxiety are impatience, irritability, and
actions typical for avoiding threatening situations. When FL learning is concerned, this may
include withdrawal from participation in anxiety-provoking tasks, particularly speaking
exercises, which, in the long run, may result in students’ skipping class or even dropping out
of an EFL course (Horwitz et al., 1986; Şimşek & Dörnyei, 2017). Similarly, anxious L2 users
may avoid communicative situations in naturalistic contexts (Baran-Łucarz, 2011), which deprives
them of exposure and opportunities to practise speaking. As for the cognitive dimension of
anxiety, anxious individuals are constantly concerned about unsatisfactory performance and
focused on identifying potentially threatening situations and stimuli, which debilitates ef-
fective attention steering and concentration (Vasa & Pine, 2004). This typically leads to
irritability, impatience, and undue worry caused by awareness of physical discomfort and
negative projections of their performance, resulting in a vicious circle. Additionally, anxiety
inhibits all three levels of information processing, that is, the input, central processing, and
output stages (see Piechurska-Kuciel, 2008). Since Vasa and Pine’s (2004) work, anxiety
models have advanced. However, in most contemporary approaches, the cognitive aspect is
still a central component that interacts strongly with emotions.
Phases of LA Development
Explaining how the conceptualization of anxiety in SLA has changed over time, MacIntyre
(2017) suggests three phases: the Confounded Phase, Specialised Phase, and Dynamic Phase. In
the first phase, attempts were made to define and measure LA, by borrowing concepts and
instruments directly from psychology, without taking into account the specificity of the FL
learning process and context. According to Scovel (1978), some concepts were also misperceived,
which led to introducing overgeneralizations and chaos into SLA theory and research. Among
them were facilitative anxiety and debilitative anxiety, which were originally suggested to be
independent constructs, measured with the use of different scales, rather than two ends of one
continuum. The misinterpretation of concepts and misuse of instruments led to ill-formed
conclusions, such as the facilitative nature of anxiety in L2 learning. As Horwitz (2017) pointed
out, Scovel’s warning against applying anxiety measures directly from psychology to FL learning
situations indicated that a language-specific conceptualization of LA and an instrument to
measure it were needed. This conviction was further reinforced by the emergence of the socio-
educational model and development of an instrument – the Attitude Motivation Test Battery –
whose elements focused on anxiety experienced specifically when learning an FL (Gardner et al.,
1976). Data gathered with its use lent consistent support to the detrimental effect of anxiety on
L2 achievement and performance (MacIntyre & Gardner, 1991).
Scovel’s cautions and Gardner’s work contributed to the development of the construct of
anxiety referring specifically to L2 learning and use – Foreign/Second Language Anxiety – and
an instrument to measure it, introduced by Horwitz et al. (1986) in a pioneering article. This
marked the beginning of the Specialised Phase (MacIntyre, 2017), whose name derives from
the fact that L2 anxiety was treated as a language-specific construct. It is a phase dominated by
cross-sectional studies, focusing on exploring the correlates and causes of LA and its effects on
L2 achievements. Important developments of this stage are referenced in the next part.
Finally, a new approach to researching LA – the Dynamic Approach – has emerged in the so-called
Dynamic Phase (MacIntyre, 2017). The new trend results from the popularity of Complex Dynamic
Systems Theory (Larsen-Freeman, 1997). Based on its
main principles, LA is considered to fluctuate over time, rising and falling not only within
years and months, but also within minutes. This is because it is connected to several contextual
variables and IDs of L2 learners. The assumption has led to the application of more
classroom-oriented research methodologies, including ethnographic research, design-based
research, mixed-method research, and action research, with longitudinal designs preferred over
cross-sectional ones.
Examples of such studies are presented in further subsections.
The Nature and Conceptualization of LA

LA is a type of social anxiety (Horwitz et al., 1986; MacIntyre & Gardner, 1989), which
accompanies those who fear “social interactions in which one can be observed by others” due
to projecting their performance as leading to “embarrassment and humiliation” (King &
Smith, 2017, p. 91). Attempts have been made to classify LA as a trait (a permanent like-
lihood to experience anxiety irrespective of situation), a state (an emotional reaction to a
situation at a particular moment) or situation-specific (a personal predisposition to experi-
ence apprehension in specific contexts) anxiety. In the Confounded Phase, LA was believed
to have a trait-like habitual nature, being experienced typically by those who have a disposition
to experience general anxiety. However, since the Specialised Phase, researchers have stressed
emphatically that LA is not simply general anxiety, but a situation-specific anxiety type
(Horwitz et al., 1986; MacIntyre & Gardner, 1989, 1991). Recently,
Horwitz (2017) characterized a person experiencing LA as “having the trait of feeling state
anxiety when participating in (or sometimes even thinking about) language learning and/or
use” (p. 33). Dewaele (2007, pp. 405–406) explains that LA is “probably situated half-way
between trait, situation-specific anxiety and state, more sensitive to environmental factors
than personality traits and yet more stable than states since it remains relatively stable across
languages.” A more recent study (Şimşek & Dörnyei, 2017) found a strong link between
general and language anxiety (r = .60; p < .001), suggesting that although LA is indeed a
unique anxiety, individuals with high levels of trait anxiety might be particularly prone to
experience apprehension evoked by FL learning and use.
When conceptualizing LA, Horwitz and her colleagues (1986) considered American
students’ comments regarding anxiety-generating experiences in their Spanish and French
classes. The participants explained that “speaking the language aloud, frequent testing
and fear of being negatively evaluated by their teachers and peers” were particularly
stressful (Horwitz, 2017, p. 15). Consequently, it was hypothesized that LA may be related
to such types of anxiety identified earlier by psychologists as communication apprehen-
sion, test anxiety, and fear of negative evaluation. Communication apprehension refers to
the general fear of communicating with others (mainly orally), that is, transferring a
message effectively to the interlocutor, making oneself understood (McCroskey, 1984).
Test anxiety denotes high levels of stress experienced before, during, or after test taking,
with the typical physiological and behavioural symptoms of anxiety and its cognitive
limitations (Sarason, 1981). Finally, fear of negative evaluation refers to threat-related
thoughts caused by assuming that one’s performance will be negatively assessed by others
(Watson & Friend, 1969). It is important to stress that Horwitz et al. (1986) in their
seminal paper did not suggest LA is directly composed of these three anxiety types in
different proportions; instead, it was claimed that LA is “only analogous” to them
(Horwitz, 2017, p. 33).
To measure levels of LA, a 33-item self-report questionnaire, the Foreign Language
Classroom Anxiety Scale (FLCAS), was designed (Horwitz et al., 1986), with items referring
to apprehension experienced specifically while learning and using an L2, “evidenced by ne-
gative performance expectancies and social comparisons, psycho-physiological symptoms,
and avoidance behaviours” (Horwitz et al., 1986, p. 559). It is this instrument, and adapted
shorter versions, that has been most frequently employed by SLA researchers exploring LA
(Botes et al., 2020; Teimouri et al., 2019). Studies have shown that LA is not simply a sum of
the three anxiety types mentioned above (e.g., Horwitz et al., 1986; Aida, 1994). More re-
cently, Park’s (2014) research led to the conclusion that communication apprehension either
in isolation or working in tandem with other factors (e.g., confidence) constitutes the “core
component of the FLCAS” (Horwitz, 2017, p. 36). As Horwitz claims, the data at our disposal
seem to support neither the tripartite model nor a unitary one, leaving the nature of LA
unresolved, which attests to the construct’s complexity and ambiguity.
Causes and Correlates of LA

While some studies examined the nature of LA, others focused on identifying causes of the
phenomenon, suggesting various classifications. Among the first attempts at determining the
sources of LA was that of Young (1991), who isolated “six potential sources of language
anxiety” (p. 427). These can be categorized as internal (from the learner) or external (from
outside factors) (Szyszka, 2017). The most frequently researched internal causes of LA
mentioned by Young (1991) are personal and interpersonal anxieties. They include students’
beliefs about themselves as FL learners, that is, self-images and self-evaluations concerning
their general academic skills and abilities to master and use an FL. Numerous studies have
shown that negative self-perceptions regarding general academic skills and self-worth (e.g.,
Onwuegbuzie et al., 1999; Stroud & Wee, 2006) or L2 competence and learning abilities (e.g.,
Baran-Łucarz, 2011; Gardner & MacIntyre, 1993; Gkonou, 2013; Stroud & Wee, 2006;
Szyszka, 2011) lead to high levels of LA. Taking a different perspective, Sparks and
Ganschow (1991), in their Linguistic Coding Differences Hypothesis, attribute LA not to
self-perceived incompetence but to factual deficiencies encountered specifically in L1 lan-
guage encoding. Some studies (e.g., Piechurska-Kuciel, 2008) provide support for this claim.
However, Horwitz (2000) and MacIntyre (1995) explain that LA and aptitude are
independent constructs, though LA “may influence or be influenced by performance of the
language learner” (Botes et al., 2020, p. 36).
Another category of internal causes of LA is student beliefs about the effectiveness of
FL learning and teaching. These derive from one’s prior learning experiences, and are shaped
by family background, personality, and culture. LA emerges when the expectations students
hold about themselves as L2 learners (e.g., their belief in being able to achieve an FL accent)
cannot be met or when their ideas concerning effective teaching clash with the approach
used by their FL teacher (Young, 1991).
Externally driven causes of LA include factors such as instructor beliefs about effective FL
teaching; instructor–learner interactions; classroom procedures; and language testing (Young,
1991). As already signalled, LA can occur when the teacher’s and learner’s perceptions of
effective teaching procedures differ. However, apprehension also arises when criticism dom-
inates over praise; an unsupportive manner of error correction is applied (Gkonou, 2013);
teacher talk is overused (Young, 1991); lockstep is more frequent than pair and group work;
and when the student performs in front of the class (e.g., Price, 1991; Young, 1990). Finally,
FL apprehension is raised by formal test-taking, particularly when the format of the test is
unknown or ambiguous (e.g., Young, 1990). It is important to remember that true causality
can only be determined on the basis of experimental designs (MacIntyre, 2017). These, how-
ever, are difficult to conduct and are thus rarely undertaken by researchers.
Correlational research identifying IDs linked to LA has contributed to our understanding
of LA (an extensive list of studies examining IDs as correlates of LA can be found in
Dewaele, 2017). Several studies in this vein have been conducted by Dewaele (e.g., 2002,
2007, 2013, 2017; Dewaele & Dewaele, 2017), finding significant relationships between LA
and personality dimensions (e.g., extraversion, neuroticism, and perfectionism), age, gender,
social class, tolerance of ambiguity, number of known languages, proficiency level, and
enjoyment experienced in the FL, with effect sizes ranging from small to medium. Finally, in
the heuristic pyramid model of L2 willingness to communicate (L2 WTC) (MacIntyre et al.,
1998), anxiety is situated among enduring factors, together with L2 self-perceived commu-
nicative competence. These are suggested to work in tandem, determining the level of L2 self-
confidence, and, in turn, L2 WTC.
LA and Oral Performance

As indicated earlier, anxiety influences all three stages of information processing: it has a
detrimental effect on speech perception, understanding, and production (MacIntyre &
Gardner, 1991). At the input stage, worrisome thoughts impede intake, by limiting the ca-
pacity to notice elements of speech (sounds, prosody, and words), and to select parts of the
input for further processing and encoding. Operations on the already restricted intake are
further inhibited by anxiety in the central processing stage, involving short-term, long-term,
and working memory. Inaccurately perceived sequences of sounds, held in short-term
memory, may trigger inappropriate mental categories stored in long-term memory. This may
lead to incorrect interpretations of the message, coordinated by the working memory.
Finally, anxiety affects both quality and fluency of performance, with threat-related thoughts
that debilitate the retrieval and control of chunks, lexical items, grammatical structures and
pronunciation. Moreover, as Szyszka (2017) stresses, the tension of articulators interferes
with appropriate production of segments, which, together with distorted prosody, may result
in more accented and, in some cases, less intelligible speech.
Numerous studies have verified the devastating effects of anxiety on L2 speech processes.
These generally use quantitative cross-sectional designs, exploring the correlation between
oral performance and LA. To examine the link, LA was most often measured with the
FLCAS or a modified version, while speaking capabilities were measured with tasks such as
role-plays, discussions or spontaneous speech. Quantitative outcomes were often supple-
mented with qualitative data gathered via surveys or semi-structured interviews. Sometimes
triangulation was used (e.g., Liu, 2006), that is, data were gathered with a written ques-
tionnaire filled out by students, verified by teacher’s reflections and comments, and further
supplemented with observations of anxious and non-anxious students in the FL classroom.
As anxiety influences all stages of information processing, it can have a particularly
detrimental effect on L2 pronunciation. Data corroborate this assumption. Horwitz et al.
(1986) reported that high anxiety students complained about having problems with
“discriminating the sounds (…) of a target language” (p. 126). Participants in Derwing and
Rossiter’s (2002) study observed that their pronunciation deteriorated when they felt
stressed. Young (1991) concluded that students were aware their pronunciation was impaired
when talking in front of the teacher. These self-observations support Scovel’s (1978) claim
that “high language anxiety experienced when speaking causes stiffness of muscles, which in
turn results in a learner’s poor pronunciation” (Szyszka, 2017, p. 83). Moreover, students’
awareness of poor pronunciation may make them try even harder, generating more anxiety
and leading to poorer articulation, and eventually to a vicious circle (see Gregersen &
Horwitz, 2002).
Further evidence of the connection between LA and pronunciation was demonstrated by
Szyszka (2011) and Baran-Łucarz (2011), who found moderate negative correlations between
learners’ self-perceived pronunciation competence and their levels of LA. Both studies reveal
medium effect sizes, explaining 21%–24% of variance in LA. Baran-Łucarz (2013) observed a
moderate negative relationship between scores on a pronunciation attainment test and results
of a skill-specific Phonetics Learning Anxiety Scale. The connection was verified further by a
t-test showing that the pronunciation of more anxious students was significantly worse than
that of less anxious individuals. Finally, in a more recent study, Szyszka (2017) observed that
high and low LA students differ significantly in the range of pronunciation learning strategies
and tactics they deploy.
The level of LA also affects other aspects of oral performance. Phillips (1992) recorded
young adult learners of French as an FL on two tasks – free speech and a role-play. Their
performance was transcribed, assessed and correlated with their level of LA measured with
the FLCAS. Measures were made of the average length of communication units (CUs)
(indicating syntactic complexity), and the percentage of words in CUs (indicating lexical
proficiency). The study showed a negative correlation of moderate strength between the
degree of LA and general level of oral performance, which can be considered a medium effect
size, accounting for 16% of variance in LA. The outcomes corroborate the hypothesis that
anxious individuals have limited working memory capacity and difficulties retrieving in-
formation from long-term memory. Further qualitative data – reflections of highly anxious
participants – provided support for this hypothesis; they “reported feeling frustrated, pa-
nicked, and apprehensive” having forgotten words they had learned but could not recall
while speaking (Szyszka, 2017, p. 101). The research was replicated in two later studies
(Stephenson Wilson, 2006; Hewitt & Stephenson, 2012) with similar results.
A negative relationship of moderate strength was also observed by Park and Lee (2005),
who designed their own measure, adopting the instrument of Aida (1994) and the FLCAS, in
which some items represented anxiety, while others tested L2 confidence and motivation. The
study revealed that the more anxious the students were, the less rich their lexis and grammar
and the less fluent their speech. Moreover, the participants with higher levels of LA used
fewer communication strategies and had more limited social skills.
Rich data appear in Tóth (2012). By applying a translated version of the FLCAS, she
discriminated among high and low LA Hungarian adults, who performed three oral tasks
with a native speaker: an introductory interview, a conversation about a controversial topic,
and an interpretation of a picture. Mann–Whitney U tests showed statistically significant
differences between the oral performance of low and high LA participants in task perfor-
mance, communication effectiveness, grammar accuracy and range, lexical correctness and
range, fluency, and appropriate pronunciation/intonation use.
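The group comparison described above relies on the Mann–Whitney U statistic. The following is a minimal sketch of how U is obtained, assuming two independent groups of ratings. The function name `mann_whitney_u` and the ratings are hypothetical and are not taken from Tóth (2012).

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) by direct pairwise comparison; a tie counts one half."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two U values always sum to the number of cross-group pairs.
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical fluency ratings (illustration only, not data from the study).
low_la = [7, 8, 6, 9, 8]
high_la = [5, 6, 4, 7, 5]
u_low, u_high = mann_whitney_u(low_la, high_la)
print(u_low, u_high)
```

The smaller of the two U values is the one compared against a critical value. The sketch stops at the statistic itself; in practice a p-value would come from statistical tables or from a library routine such as `scipy.stats.mannwhitneyu`.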
Finally, it is worth considering a study conducted by Piechurska-Kuciel (2008), who
found a strong negative relationship between participants’ L2 self-perceptions and their le-
vels of LA, measured with a Polish translation of the FLCAS. The effect size was the highest
(large) in the case of self-assessed speaking skills. Similarly, Kitano (2001) and Subaşi (2010),
using a Japanese version of the FLCAS and three self-rating scales, found that the higher the
LA, the lower the students’ self-perceived level of English in comparison to that of their
classmates. The correlation was particularly high, revealing a large effect size, in the case of
self-rated pronunciation. The vast body of research reported in this subsection leaves no
doubt that LA affects L2 oral performance negatively.
LA and L2 Achievement
As noted earlier, among important recent contributions to literature on LA are two meta-
analyses – Botes et al. (2020) and Teimouri et al. (2019), which investigate the strength of
evidence for the connection between LA and FL achievement. The results of both confirm the
negative relationship between the two variables, with the correlations achieved in numerous
studies revealing a mean of r = −.36 (k = 105; N = 19,933) and r = −.39 (k = 59;
N = 12,585), respectively, and LA accounting for 13%–15% of variance in learners’ L2
achievement. Comparing outcomes of meta-analyses of other correlates of success in FL
learning, Teimouri et al. (2019) found aptitude and motivation to have more impact on per-
formance than LA, which was, in turn, followed by working memory. Together the four IDs
explained 58% of the variance in FL achievement. It is interesting that although participants of
numerous studies (Gkonou, 2017; Horwitz et al., 1986; Phillips, 1992; Piechurska-Kuciel, 2008;
Tóth, 2012) confessed that speaking is the most anxiety-generating L2 skill, in neither meta-
analysis did oral production reveal the highest effect size. More specifically, Teimouri et al.
(2019, p. 15) found that “listening and writing anxiety showed much larger effects than reading
or speaking anxiety.” Similarly, Botes et al. (2020) reported moderately large correlations be-
tween FLCAS scores and L2 writing and listening achievements and moderate correlations in
the case of reading. This time speaking achievement had the lowest mean effect size (r = −.26; k
= 16; N = 1,745). The researchers, however, draw attention to the high heterogeneity of cor-
relations found in the studies focusing on speaking, which may indicate that the relationship is
“exacerbated or impeded by other factors such as general public speaking anxiety” (p. 46). It
seems that the high dispersion of effect sizes in the case of oral production may be explained
also by various criteria chosen for assessing this FL skill.
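The shared-variance percentages in this subsection can be checked by squaring the reported mean correlations. The worked arithmetic below assumes that the percentages are plain coefficients of determination (r squared), which the text implies but does not state. The figure for speaking achievement is derived here and is not reported in the source.

```latex
\begin{align*}
r = -.36 &\;\Rightarrow\; r^2 = .1296 \approx 13\% \\
r = -.39 &\;\Rightarrow\; r^2 = .1521 \approx 15\% \\
r = -.26 &\;\Rightarrow\; r^2 = .0676 \approx 7\% \quad \text{(speaking achievement)}
\end{align*}
```

On this reading, the lowest mean correlation (speaking) corresponds to roughly half the shared variance of the overall estimates.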
The Concept of Pronunciation Anxiety

Intensive research on apprehension experienced when learning and using an L2 resulted in
identifying several language-skill-specific types of anxiety, that is, speaking, listening (com-
prehension), writing, reading, and grammar anxiety. Interestingly enough, although speaking
is often acknowledged by students to be one of the most anxiety-provoking skills, and the
source of LA frequently comes from the belief that one has “a terrible accent” (Price, 1991,
p. 105), it was only recently that the construct of Pronunciation Anxiety (PA) was proposed
(Baran-Łucarz, 2014a). In conceptualizing this anxiety type, I relied on my observations
from teaching pronunciation, comments of my students regarding the sources of their ap-
prehension related to L2 pronunciation and its learning (Baran-Łucarz, 2011; 2013), and
earlier studies showing a connection between L2 pronunciation and identity (e.g., Guiora,
1972; Walker, 2011). I defined PA as a “multidimensional construct referring to the feeling of
apprehension and worry experienced by non-native speakers in oral-communicative situa-
tions […] in the classroom and/or natural contexts, deriving from their negative/low self-
perceptions, beliefs and fears related specifically to pronunciation,” whose occurrence is
evidenced by typical cognitive, physiological, and behavioural symptoms of anxiety (Baran-
Łucarz, 2017, p. 109). I further advocated that the concept has four antecedents that interact
dynamically. These are (1) fear of being negatively viewed by interlocutors, classmates, or
teachers due to pronunciation; (2) pronunciation self-efficacy and self-assessment based on
comparisons made to classmates or interlocutors; (3) pronunciation self-image – the con-
ception of one’s aural and visual appearance while speaking the target language (TL) and
one’s readiness to accept the image; and (4) beliefs related to TL pronunciation concerning
the difficulty of the TL phonological system for speakers of a particular L1, the importance
of pronunciation for communication, and attitudes towards the general sound/particular
aspects of TL pronunciation.
Using an instrument to measure PA in an English as an FL classroom setting in Poland
(Baran-Łucarz, 2014a), I found PA to be strongly correlated (r = −.60; p < .0005) with L2
WTC. The results of this analysis were further verified by a t-test, showing a statistically
significant difference between the general level of WTC of high and low PA students
(t = −7.828; p < .0005) (Baran-Łucarz, 2014b), with the former being less eager to speak than
the latter. By Cohen’s (1988) benchmarks, the effect size was large for the correlation and
very large for the t-test (Hedges’ g = 1.51).
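For readers unfamiliar with Hedges’ g, the sketch below shows one common way of computing it, either from group descriptives or from an independent-samples t statistic. The source reports neither the group sizes nor the descriptive statistics, so all numbers are hypothetical, and the function names are illustrative rather than part of any particular package.

```python
import math


def small_sample_correction(n_a, n_b):
    """Approximate correction factor J = 1 - 3 / (4 * df - 1), df = n_a + n_b - 2."""
    df = n_a + n_b - 2
    return 1 - 3 / (4 * df - 1)


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference with the small-sample correction applied."""
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    cohens_d = (mean_a - mean_b) / math.sqrt(pooled_var)
    return cohens_d * small_sample_correction(n_a, n_b)


def hedges_g_from_t(t, n_a, n_b):
    """Recover g from an independent-samples t statistic and the group sizes."""
    cohens_d = t * math.sqrt(1 / n_a + 1 / n_b)
    return cohens_d * small_sample_correction(n_a, n_b)


# Hypothetical WTC scores for low-PA and high-PA groups (not the study's data).
g = hedges_g(mean_a=10.0, sd_a=2.0, n_a=20, mean_b=8.0, sd_b=2.0, n_b=20)
print(round(g, 3))
```

The correction factor shrinks Cohen’s d slightly, and the shrinkage matters mainly when groups are small. The sign of g simply reflects which group is entered first.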
Further studies conducted with a newer version of the Measure of Pronunciation Anxiety, in
both classroom and naturalistic settings (Baran-Łucarz, 2017), revealed a negative moderate
relationship between PA and motivation, represented by ideal pronunciation L2 self. This
study also lent support to the construct validity of the PA concept, demonstrating that low and
high PA students differ in their attitudes towards aspects of the TL sound system. For ex-
ample, in pictures indicating associations evoked by interdentals and post-alveolars, low PA
students drew positive images, such as dandelions flying in the wind. In contrast, a learner
displaying the highest PA level sketched images of a gravestone with an inscription “RiP -
Me,” a tongue with explanations “trapped between the teeth” and “dead tongue,” a crying face
with an inscription “tears,” and a sad face with a label “dead eyes” (Baran-Łucarz, 2017,
p. 121). The feelings displayed via pictures were further supported by expressions related to the
aforementioned sounds, with the low PA participants providing adjectives such as “delicate,”
“subtle,” “refined,” and “warm” (Baran-Łucarz, 2017, p. 123), and the high PA students
writing “heavy,” “strange,” “stupid,” “very unnatural,” “difficult,” “nobody speaks like that,”
“crippled,” and “childish” (Baran-Łucarz, 2017, p. 121). Clearly, PA appears to be a highly
complex construct, whose nature and role in FL learning are worth further exploration.
The Dynamic Approach

Here, I examine a few contemporary studies focusing on the dynamically evolving inter-
connections between the learner and context(s), for instance, the research of Gregersen et al.
(2014). Following an idea from an earlier qualitative study (Gregersen & Horwitz, 2002), a few
high and low LA teacher trainees were video-recorded while delivering a presentation. They
wore a heart rate monitor recording their peaks and falls in anxiety. Then, they viewed their
video-recorded performance and rated their levels of anxiety while speaking, with the use of
MacIntyre’s (2012) free access idiodynamic programme. Finally, in interviews, the participants
explained the sources of their anxiety falls and peaks. Thus, it was possible not only to observe
that anxiety fluctuated during oral performance, but also to identify its causes. Furthermore,
the study showed that spikes of anxiety can be traced among low LA students. It turned out
that the anxiety of such learners might result from the convergence of several factors, which
stimulate physiological, cognitive, emotional and behavioural systems of anxiety.
The dynamic perspective in researching LA was also explored by Gregersen et al. (2017).
Here, the anxiety self-ratings of the same low and high LA participants from Gregersen et al.
(2014) were compared to anxiety detections of a peer and an expert on non-verbal behaviour
and emotional intelligence, who used the same idiodynamic procedure as the speakers.
Additionally, three conditions for detecting LA in the participants’ performance were of-
fered, that is, visual only, audio only, and combined audio and visual channels. The good
news for anxious learners, who fear their anxiety will be easily decoded by interlocutors,
classmates or teachers, is that not all the spikes of anxiety were identified by the observers.
The most salient symptoms involved “their hands, faces, eyes, bodies and voices” (Gregersen
et al., 2017, p. 129), which learners can learn to control. The research is also valuable for its
pedagogical implications. Since tracing LA is not as simple as suggested earlier, it appears
necessary to train FL teachers to recognize anxiety in their learners, as well as anxiety-
generating situations, which could, in turn, help in developing remedies.
Another project rooted in the dynamic perspective is that of Gkonou (2017). This study
examines the causes of LA by referring to Bronfenbrenner’s (1993) nested ecosystems model.
First, the researcher used the FLCAS to identify highly anxious students, who wrote
weekly diary entries and were then interviewed. The data revealed that the LA experienced
by learners in actual FL classrooms (microsystem) was interrelated with prior learning
experiences (mesosystem), shaped by local attitudes towards learning English represented by
teachers (exosystem), which were in turn determined by the Greek FL education system
(macrosystem). The researcher concluded that it is important for teachers to be aware that
their students are not tabula rasas and that their negative emotions may be rooted in wider
contexts, shaped by several external factors to which the students were exposed earlier.

Negative effects of LA on FL learning and performance suggest the need to equip FL tea-
chers with competence on the LA phenomenon and practical skills for dealing with it ef-
fectively in their classrooms. Due to space limitations, this subsection discusses only the most
basic pedagogical principles (for additional practical advice, refer to Gregersen et al., 2014;
Gregersen et al., 2017; Horwitz, 2017; Oxford, 2017).
Oxford (2017) proposes that interventions borrowed from traditional psychology can be
applied in FL classrooms, such as exposure therapy, modelling, social skills training, or
relaxation techniques. She also strongly recommends fighting negative emotions with posi-
tive ones, such as “curiosity,” “joy,” “pride,” or “satisfaction” (p. 182). She encourages
teachers to increase flow, intrinsic motivation and agency in L2 learners, and to develop
various aspects of their emotional intelligence. Empirical data, however, are needed to verify
Oxford’s recommendations. On the other hand, Rubio-Alcala (2017) stresses the importance
of training learners to use strategies to keep their LA low, such as searching for social
support or individual strategies relieving stress and helping control emotional reactions.
Şimşek and Dörnyei (2017) encourage students to verbalize their negative emotions in
constructive narratives and teachers to introduce explicit discussions in FL classrooms on negative
emotions and ways of dealing with them.
Although more tangible data are needed on the effectiveness of introducing FL teaching
approaches, procedures, and techniques to keep anxiety low, it should be helpful to rely on the
information above regarding the nature, correlates of LA, and particularly the externally driven
classroom-based causes of LA. First and foremost, however, it is vital to follow the most basic
recommendations proposed by all LA specialists, that is, acknowledge the existence of LA (e.g.,
Gkonou, 2017), and make the learning environment as friendly, supportive, and encouraging as
possible (e.g., Horwitz, 2017). As Gregersen et al. (2017) put it, “it is better to assume the
presence of anxiety and build in classroom ‘safety nets’ such as supportive interactive environments
and effective error correction, than to miss negative affect when it is present and risk
its negative effects” (p. 131). Moreover, besides making lessons as pleasant as possible, it is
important to foster positive self-perceptions of FL learners and change their frequent unrealistic
FL learning expectations, such as achieving nativelike pronunciation.
Remembering the interplay among internal and external factors which determine anxiety,
teachers should be aware that not all remedies work equally well for all anxious students in
all contexts. However, showing students understanding, acceptance, concern, and readiness
to help overcome their fears is always a good starting point in eliminating negative feelings,
which debilitate performance and discourage learning and risk-taking in L2 usage.
Future Directions
Despite the fact that LA has attracted the attention of SLA researchers for decades, there are
still several theoretical and practical questions to be answered. What requires deeper ex-
ploration is the very nature of LA, with its antecedents and correlates (Horwitz, 2017; Şimşek
& Dörnyei, 2017). Although cross-sectional studies are needed, they should be supplemented
with qualitative research designs. Qualitative data can be gathered through narratives (e.g.,
Şimşek & Dörnyei, 2017), interviews, and classroom observations, supported with the use of
idiodynamic software (MacIntyre, 2012). The latter can be particularly helpful in shedding
light on LA as a dynamic phenomenon experienced by learners of different cognitive, affective,
and personality profiles and in determining which strategies and external remedies are more
effective for which learners. More experimental designs are needed to help verify the “causal
connections between language anxiety and performance” (MacIntyre, 2017, p. 23). Moreover,
our understanding of language-skill-specific anxieties is still rather tentative. It would be worth
examining which learner profiles are more prone to which skill-specific anxieties.
Also underexplored are the role of LA at different stages of learning, its effect across age
groups, including young learners and older adults, and differences in intensity, causes and
effects on FL learning and use among different cultural groups. To better know how to control
LA, further investigations are needed of the relationships and dynamics of LA with regard to
identity, motivation, WTC, mindfulness, enjoyment, engagement, autonomy, boredom, and
silence. Crucially, classroom-based research should be extended to naturalistic settings. Studies
on fluctuations of LA among learners of different profiles and cultures using the L2 in authentic
communicative situations are clearly missing. Finally, future research could focus on teachers’
anxiety, its causes and relationship with burnout and teachers’ well-being, and the potential
connection between teacher and learner anxiety in the FL classroom (Gkonou et al., 2017).
Further Reading
Gkonou, C., Daubney, M., & Dewaele, J.-M. (Eds.). (2017). New insights into language anxiety: Theory,
research and educational implications. Bristol: Multilingual Matters.
This book contains chapters authored by well-known experts on LA, who revisit the concept, report on
their studies, and offer pedagogical interventions.
Gregersen, T., & MacIntyre, P. (2014). Capitalizing on language learners’ individuality: From premise to
practice. Bristol: Multilingual Matters.
The first chapter introduces readers to the theoretical underpinnings of LA and several practical
classroom activities to lessen its negative effects on L2 learning.
Szyszka, M. (2017). Pronunciation learning strategies and language anxiety. Cham, Switzerland:
Springer International Publishing.
This book explores the relationship between LA and pronunciation learning strategies. It offers a
comprehensive overview of empirical studies of LA, with a special focus on its effect on speaking and
pronunciation.
References
Aida, Y. (1994). Examination of Horwitz, Horwitz, and Cope’s construct of foreign language anxiety:
The case of students of Japanese. Modern Language Journal, 78(2), 155–168.
Baran-Łucarz, M. (2011). The relationship between language anxiety and the actual and perceived
levels of FL pronunciation. Studies in Second Language Learning and Teaching, 1(4), 491–514.
Baran-Łucarz, M. (2013). Phonetics learning anxiety – Results of a preliminary study. Research in
Language, 11(1), 57–79. doi:10.2478/v10015-012-0005-9
Baran-Łucarz, M. (2014a). The link between pronunciation anxiety and willingness to communicate in
the foreign-language classroom: The Polish EFL context. Canadian Modern Language Review, 70(4),
445–473. doi:10.3138/cmlr.2666
Baran-Łucarz, M. (2014b, June). The link between pronunciation anxiety and willingness to commu-
nicate in and outside the FL classroom. Paper presented at Psychology and Language Learning
Conference, Graz.
Baran-Łucarz, M. (2016). Conceptualizing and measuring the construct of pronunciation anxiety.
Results of a pilot study. In M. Pawlak (Ed.), Classroom-oriented research (pp. 39–56). Berlin,
Heidelberg: Springer. doi:10.1007/978-3-319-30373-4_3
Baran-Łucarz, M. (2017). FL pronunciation anxiety and motivation: Results of a preliminary
mixed-method study. In E. Szymańska-Czaplak, M. Szyszka & E. Piechurska-Kuciel (Eds.), At the
crossroads: Challenges in FL learning (pp. 107–133). Heidelberg & New York: Springer International
Publishing. doi:10.1007/978-3-319-55155-5_7
Botes, E., Dewaele, J.-M., & Greiff, S. (2020). The foreign language classroom anxiety scale and
academic achievement: An overview of the prevailing literature and a meta-analysis. Journal for the
Psychology of Language Learning, 2, 25–56.
Bronfenbrenner, U. (1993). Ecological models of human development. In M. Gauvain & M. Cole (Eds.),
Readings on the development of children (pp. 37–43). New York: Freeman.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York: Routledge.
Derwing, T. M., & Rossiter, M. J. (2002). ESL learners’ perceptions of their pronunciation needs and
strategies. System, 30, 155–166.
Dewaele, J.-M. (2002). Psychological and sociodemographic correlates of communicative anxiety in L2
and L3 production. The International Journal of Bilingualism, 6, 23–39.
Dewaele, J.-M. (2007). The effect of multilingualism, sociobiographical, and situational factors on
communication anxiety and foreign language anxiety of mature language learners. The International
Journal of Bilingualism, 11, 391–409.
Dewaele, J.-M. (2013). The link between foreign language classroom anxiety and psychoticism, ex-
traversion, and neuroticism among adult bi- and multilinguals. The Modern Language Journal,
97(3), 670–684. doi:10.1111/j.1540-4781.2013.12036.x
Dewaele, J.-M. (2017). Are perfectionists more anxious foreign language learners and users? In C.
Gkonou, M. Daubney & J.-M. Dewaele (Eds.), New insights into language anxiety: Theory, research
and educational implications (pp. 70–90). Bristol: Multilingual Matters.
Dewaele, J.-M., & Dewaele, L. (2017). The dynamic interactions in foreign language classroom anxiety
and foreign language enjoyment of pupils aged 12 to 18. A pseudo-longitudinal investigation.
Journal of the European Second Language Association, 1(1), 12–22.
Gardner, R. C., Smythe, P. C., Clément, R., & Gliksman, L. (1976). Second language learning: A social
psychological perspective. The Canadian Modern Language Review, 32(3), 198–213.
Gardner, R. C., & MacIntyre, P. D. (1993). A student’s contribution to second-language learning.
Part II: Affective variables. Language Teaching, 26, 1–11.
Gkonou, C. (2013). A diary study on the causes of English language classroom anxiety. International
Journal of English Studies, 13(1), 51–68.
Gkonou, C. (2017). Towards an ecological understanding of language anxiety. In C. Gkonou, M.
Daubney & J.-M. Dewaele (Eds.), New insights into language anxiety: Theory, research and educa-
tional implications (pp. 135–155). Bristol: Multilingual Matters.
Gkonou, C., Dewaele, J.-M., & Daubney, M. (Eds.). (2017). New insights into language anxiety: Theory,
research and educational implications (pp. 217–223). Bristol: Multilingual Matters.
Gregersen, T., & Horwitz, E. (2002). Language learning and perfectionism: Anxious and non-anxious
language learners’ reactions to their own oral performance. Modern Language Journal, 86(4),
562–570. doi:10.1111/1540-4781.00161
Gregersen, T., MacIntyre, P., & Meza, M. (2014). The motion of emotion: Idiodynamic case studies of
learners’ foreign language anxiety. Modern Language Journal, 98(2), 574–588.
doi:10.1111/j.1540-4781.2014.12084.x
Gregersen, T., MacIntyre, P., & Olson, T. (2017). Do you see what I feel? An idiodynamic assessment of
expert and peer’s reading of nonverbal language anxiety cues. In C. Gkonou, M. Daubney & J.-M.
Dewaele (Eds.), New insights into language anxiety: Theory, research and educational implications
(pp. 110–134). Bristol: Multilingual Matters.
Guiora, A. (1972). Construct validity and transpositional research: Toward an empirical study of
psychoanalytic concepts. Comprehensive Psychiatry, 13(2), 139–150.
Hewitt, E., & Stephenson, J. (2012). Foreign language anxiety and oral exam performance: A
replication of Phillips’s MLJ study. The Modern Language Journal, 96(2), 170–189.
doi:10.1111/j.1540-4781.2011.01174.x
Horwitz, E. K. (2000). It ain’t over ’til it’s over: On foreign language anxiety, first language deficits, and
the confounding of variables. The Modern Language Journal, 84(2), 256–259.
doi:10.1111/0026-7902.00067
Horwitz, E. K. (2017). On the misreading of Horwitz, Horwitz and Cope (1986) and the need to balance
anxiety research and the experiences of anxious language learners. In C. Gkonou, M. Daubney &
J.-M. Dewaele (Eds.), New insights into language anxiety: Theory, research and educational im-
plications (pp. 31–47). Bristol: Multilingual Matters.
Horwitz, E. K., Horwitz, M., & Cope, J. A. (1986). Foreign language classroom anxiety. Modern
Language Journal, 70(2), 125–132.
King, J., & Smith, L. (2017). Social anxiety and silence in Japan’s tertiary foreign language classroom.
In C. Gkonou, M. Daubney & J.-M. Dewaele (Eds.), New insights into language anxiety: Theory,
research and educational implications (pp. 91–109). Bristol: Multilingual Matters.
Kitano, K. (2001). Anxiety in the college Japanese language classroom. The Modern Language Journal,
85(4), 549–566.
Larsen-Freeman, D. (1997). Chaos/complexity science and second language acquisition. Applied
Liebert, R. M., & Morris, L. W. (1967). Cognitive and emotional components of test anxiety: A
distinction and some initial data. Psychological Reports, 20, 975–978. doi:10.2466/pr0.1967.20.3.975
Liu, M. L. (2006). Anxiety in Chinese EFL students at different proficiency levels. System, 34, 301–316.
MacIntyre, P. D. (1995). How does anxiety affect second language learning? A reply to Sparks and
Ganschow. The Modern Language Journal, 79(1), 90–99. doi:10.1111/j.1540-4781.1995.tb05418.x
MacIntyre, P. D. (2012). The idiodynamic method: A closer look at the dynamics of communication
traits. Communication Research Reports, 29(4), 361–367.
MacIntyre, P. D. (2017). An overview of language anxiety research and trends in its development. In C.
Gkonou, M. Daubney & J.-M. Dewaele (Eds.), New insights into language anxiety: Theory, research
and educational implications (pp. 11–30). Bristol: Multilingual Matters.
MacIntyre, P. D., & Gardner, R. C. (1989). Anxiety and second language learning: Towards a theo-
retical clarification. Language Learning, 39(2), 251–275.
MacIntyre, P. D., & Gardner, R. C. (1991). Methods and results in the study of anxiety in language
learning: A review of the literature. Language Learning, 41, 85–117.
MacIntyre, P. D., Clément, R., Dörnyei, Z., & Noels, K. A. (1998). Conceptualizing willingness to
communicate in a L2: A situational model of L2 confidence and affiliation. Modern Language
Journal, 82(4), 545–562.
McCroskey, J. C. (1984). Communication competence. The elusive construct. In R. N. Bostrom (Ed.),
Competence in communication: A multidisciplinary approach (pp. 259–268). Beverly Hills, CA: SAGE
Publications.
Onwuegbuzie, A. J., Bailey, P., & Daley, C. E. (1999). Relationship between anxiety and achievement at
three stages of learning a foreign language. Perceptual and Motor Skills, 88, 1085–1093.
Oxford, R. (2017). Anxious language learners can change their minds: Ideas and strategies from tra-
ditional psychology and positive psychology. In C. Gkonou, M. Daubney & J.-M. Dewaele (Eds.),
New insights into language anxiety: Theory, research and educational implications (pp. 177–197).
Bristol: Multilingual Matters.
Park, G. P. (2014). Factor analysis of the foreign language classroom anxiety scale in Korean learners
of English as a foreign language. Psychological Reports, 115, 261–275.
Park, H., & Lee, A. R. (2005). L2 learners’ anxiety, self-confidence and oral performance. In
Proceedings of the 10th conference of the Pan-Pacific association of applied linguistics (pp. 107–208).
Edinburgh University, August 2005. Retrieved from
http://www.paaljapan.org/resources/proceedings/PAAL10/pdfs/hyesook.pdf
Pekrun, R. (1992). Expectancy-value theory of anxiety: Overview and implications. In D. Forgays &
T. Sosnowski (Eds.), Anxiety: Recent developments in cognitive, psychological and health research
(pp. 23–39). Washington, DC: Hemisphere.
Phillips, E. (1992). The effects of language anxiety on students’ oral test performance and attitudes. The
Modern Language Journal, 76(1), 14–26.
Piechurska-Kuciel, E. (2008). Language anxiety in secondary grammar school students. Opole:
Wydawnictwo Uniwersytetu Opolskiego.
Price, M. L. (1991). The subjective experience of foreign language anxiety: Interviews with highly
anxious students. In E. K. Horwitz & D. J. Young (Eds.), Language anxiety: From theory and
research to classroom implications (pp. 101–108). Upper Saddle River, NJ: Prentice Hall.
Rubio-Alcala, F. D. (2017). The links between self-esteem and language anxiety and implications
for the classroom. In C. Gkonou, M. Daubney & J.-M. Dewaele (Eds.), New insights into language
anxiety: Theory, research and educational implications (pp. 198–216). Bristol: Multilingual Matters.
Sarason, I. C. (1981). Test anxiety, stress, and social support. Journal of Personality, 49, 101–114.
Scovel, T. (1978). The effect of affect on foreign language learning: A review of the anxiety research.
Language Learning, 28(1), 129–142.
Şimşek, E., & Dörnyei, Z. (2017). Anxiety and L2 self-images: The ‘anxious self’. In C. Gkonou, M.
Daubney & J.-M. Dewaele (Eds.), New insights into language anxiety: Theory, research and educa-
tional implications (pp. 51–69). Bristol: Multilingual Matters.
Sparks, R. L., & Ganschow, L. (1991). Foreign language learning differences: Affective or native
language aptitude differences? Modern Language Journal, 75(1), 3–16.
Stephenson Wilson, J. T. (2006). Anxiety in learning English as a foreign language: Its associations with
student variables, with overall proficiency, and with performance on an oral test. Doctoral dis-
sertation, University of Granada. Retrieved from http://hera.ugr.es/tesisugr/16235290.pdf
Stroud, C., & Wee, L. (2006). Anxiety and identity in the language classroom. Regional Language
Centre Journal, 37(3), 299–307.
Subaşi, S. (2010). What are the main sources of Turkish EFL students’ anxiety in oral practice? Turkish Online Journal
of Qualitative Inquiry, 1(2), 29–49. Retrieved from https://core.ac.uk/download/pdf/27171558.pdf
Szyszka, M. (2011). Foreign language anxiety and self-perceived English pronunciation
competence. Studies in Second Language Learning and Teaching, 1(2), 283–300.
https://doi.org/10.14746/ssllt.2011.1.2.7
Teimouri, Y., Goetze, J., & Plonsky, L. (2019). Second language anxiety and achievement: A meta-
analysis. Studies in Second Language Acquisition, 41(2), 489.
Tóth, Z. (2012). Foreign language anxiety and oral performance: Differences between high- vs. low-
anxious EFL students. Language, 10(5), 1166–1178.
Vasa, R. A., & Pine, D. S. (2004). Neurobiology in anxiety disorders in children and adolescents. In
T. R. Morris & J. S. March (Eds.), Anxiety disorders in children and adolescents (pp. 3–26). New
York: Guilford Press.
Walker, R. (2011). Teaching the pronunciation of English as a lingua franca. Oxford: Oxford University
Press.
Watson, D., & Friend, R. (1969). Measurement of social-evaluative anxiety. Journal of Consulting and
Clinical Psychology, 33, 448–457.
Young, D. J. (1990). An investigation of students’ perspectives on anxiety and speaking. Foreign
Language Annals, 23, 539–553.
Young, D. J. (1991). Creating a low-anxiety classroom environment: What does language anxiety
research suggest? The Modern Language Journal, 75, 426–439.
PART II
Research Issues
7
SPEAKING RESEARCH
METHODOLOGIES
Charles Nagle, Tracey M. Derwing, and Murray J. Munro
Speaking is an intricate activity which involves cognitive skills (e.g., memory, lexical re-
trieval) (see De Bot, this volume), articulation skills, interaction skills, and culturally de-
termined pragmatic knowledge, all in real time. It is even more complicated in a second
language (L2) because speakers may experience interference from their first language (L1),
and learners, at least, have gaps in knowledge that present them with additional commu-
nication challenges. Given this complexity, researchers from several disciplines, including
linguistics, psychology, and applied linguistics, have examined different aspects of speaking,
each employing methodology from their own fields of study. In all instances, however, the
key issues to consider when posing a research question investigating the nature of speaking
are the type of data collection to employ and the analysis of the resulting data.
The overall design of a study can be characterized in terms of several dichotomies. The first is
quantitative versus qualitative. Second language acquisition research was predominantly
quantitative in the early stages; researchers looked for patterns in acquisition which might later
inform teaching (see Dulay & Burt, 1974). Quantitative approaches have developed considerably
and now often involve highly sophisticated statistical techniques. More recently, researchers have
adopted qualitative methods, which document linguistic experiences in a non-numerical way (e.g.,
sociocultural approaches; see Surtees & Duff, this volume). Another design dichotomy is
observation versus intervention (see discussion of observation tasks in the Historical
Perspectives part). Observation allows researchers to gauge the degree to which an L2
speaker has mastered a given speech variable. In longitudinal studies, systematic observation
can record naturalistic development. Intervention tasks typically involve measuring learners’
facility with an aspect of speech, providing instruction over a set period of time, and then re-
measuring their performance. To determine whether changes are a result of instruction, an
uninstructed control group (learners who share similar performance on the speech variable in
question at the outset of the study) is included, and the performance of the two groups is
compared after the intervention. If the experimental group’s speech is significantly improved
over that of the control group, the gain can reasonably be attributed to the instruction.
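The pre-test, instruction, post-test comparison described above can be illustrated with a short sketch. It is offered only as an illustration under stated assumptions: the scores are invented, the function name welch_t is hypothetical, and Welch's t statistic computed on gain scores is just one of several analyses a researcher might choose, not the method of any study cited here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples with unequal variances."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Invented scores on one speech variable (e.g., a 9-point comprehensibility scale).
instructed_pre = [3, 4, 3, 5, 4]
instructed_post = [5, 6, 5, 7, 5]
control_pre = [3, 4, 4, 5, 3]
control_post = [3, 5, 4, 5, 4]

# Gain score = post-test minus pre-test for each learner.
gains_instructed = [post - pre for pre, post in zip(instructed_pre, instructed_post)]
gains_control = [post - pre for pre, post in zip(control_pre, control_post)]

t_statistic = welch_t(gains_instructed, gains_control)
print(f"Mean gain (instructed): {mean(gains_instructed):.2f}")
print(f"Mean gain (control):    {mean(gains_control):.2f}")
print(f"Welch's t on gains:     {t_statistic:.2f}")
```

A t statistic alone does not establish significance; in practice it would be evaluated against the appropriate distribution and reported with an effect size, and samples this small would rarely justify strong conclusions.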
DOI: 10.4324/9781003022497-10
A further methodological dichotomy is cross-sectional versus longitudinal studies. Cross-
sectional research offers a snapshot of speaking performance at a specific point in time. For
example, Baker and Trofimovich (2005) examined age effects on the acquisition of English
vowels by Korean speakers by sampling vowels from early and late bilingual speakers and
comparing the two. In contrast, a longitudinal study follows the linguistic development of
speakers over an extended period of time. It therefore allows close inspection of differences in
individual learning trajectories, rather than just comparisons of group means. Of course,
even longitudinal studies that focus solely on means do not capture individual differences.
Some longitudinal studies involve groups of learners who perform tasks over time (e.g.,
Derwing & Munro, 2013), and others take the form of introspective, single subject studies,
such as Leopold’s (1949) diary of the development of English and German in his daughter,
Hildegard.
Spoken corpora provide another source of data available to researchers (see Huensch &
Staples, this volume). Despite their limitations, existing corpora can be utilized to examine
grammatical, lexical and phonological features of speech, sometimes along with pragmatics.
Once research questions have been determined, the researcher must decide on the most
appropriate data to address those questions. Sometimes an existing corpus will suffice, but
often researchers design or adapt tasks to elicit spoken language samples. At this point, it is
important to look for congruence among the questions, the tasks, and the resulting data. For
instance, if fluency (in the sense of the flow of language, extent and placement of pausing,
and hesitation forms) is the dimension of interest, a read-aloud task is not an appropriate
choice, because it does not reflect the work that a speaker must do in natural speech, such as
retrieving suitable vocabulary items, making grammatical decisions, and determining fitting
prosody. This work is already done for the speaker who is presented with a passage to read;
thus, these speaking elements, which all affect fluency, are eliminated, making such data of
limited interest.
Once the data are collected, they must be coded in some way. In early studies, for instance,
the occurrences of particular features such as grammatical morphemes, interactional stra-
tegies, and politeness cues were categorized and counted. In pronunciation studies, listeners
are often employed to give scalar ratings of speech samples for speech dimensions such as
comprehensibility, fluency, and accentedness. More recently, listeners have been asked to
rate speakers dynamically, reacting to elements of a speech sample as they occur, rather than
providing a single scalar rating (Nagle et al., 2019). In a novel approach to measuring L2
anxiety, Gregersen et al. (2014) conducted a study in which learners wore heart monitors and
rated their anxiety in real time, with follow-up retrospective interviews.
The earliest research on L2 speaking was often based on findings from first language (L1)
acquisition studies. In a classic, longitudinal, observational enquiry of three children, Brown
(1973) proposed that children shared similar patterns in the L1 development of English
grammatical morphemes. Dulay and Burt (1974) created the Bilingual Syntax Measure
(BSM), a conversational assessment tool, to elicit grammatical morphemes from L1 Spanish
speakers learning English to compare L2 learners with Brown’s L1 data. Using a cross-
sectional design with three groups of learners at different proficiency levels, they elicited
language from children by showing them cartoons and asking questions, the correct answers
to which required the use of particular morphemes. As researchers adopted the BSM for use
in other populations (e.g., adult learners) the methodology of contrasting L1 longitudinal,
naturalistic development with cross-sectional designs was questioned. Rosansky (1976), for
instance, raised such concerns and argued for longitudinal L2 studies of spontaneous speech.
Rather than compare two languages to predict errors that students may make, as in
contrastive analysis (Lado, 1957), the advocates of error analysis (e.g., Corder, 1967) posited
that an examination of learners’ actual errors would be more informative. Throughout the
1970s and 1980s, error analyses were conducted, often with listeners judging the severity of
the errors (both spoken and written). However, in 1974, Schachter’s article “An Error in Error
Analysis” illustrated that non-occurrence of an error does not necessarily mean that a learner
has mastered a particular form; rather it can indicate avoidance of a form that a learner finds
difficult. Although Schachter’s study was based on written passages, it had implications for
speaking as well. Error analysis provides no insight into aspects of language that do not
appear in an L2 speaker’s speech.
Another methodology to investigate L2 speaking development, a case study, was carried
out by Schmidt and Frota (1986). Schmidt kept a diary, outlining observations of his own
learning of Portuguese in addition to recording his own conversations. His linguist co-
author, a native speaker of Portuguese, later analyzed noun and verb phrases in the re-
cordings, noting changes over time. Schmidt took 50 hours of language classes, studied on
his own (especially grammar), and participated in an active social life while in Brazil, giving
him ample opportunities to interact with others. As for the value of formal instruction, the
authors concluded that Schmidt “learned and used what he was taught if he subsequently
heard it and if he noticed it” (p. 279, italics in the original). This is a prime example of the
power of a diary study, in that it resulted in an important theoretical concept, the noticing
hypothesis, that remains influential today.
Conversation analysis (CA), which is rooted in L1 research (e.g., Sacks et al., 1974), has also been
used to examine L2 speaking. CA not only examines the linguistic forms in a
conversation, but also, more importantly, it takes into account interactional behaviours.
Although much of L2 research focuses on speakers, CA considers both interlocutors, and
documents such phenomena as how turn-taking is managed and how repairs are made when
a communication breakdown occurs. Typically, a conversation is recorded and then pains-
takingly transcribed and examined for social patterns. For a comprehensive review, see
Kasper and Wagner (2014).
Another approach to studying L2 speaking acquisition emerged in the 1970s, this time
borrowing from studies of Caregiver Speech (CS) in L1. It was posited that the adjustments
parents make for their young children have a facilitating effect on their offspring’s lin-
guistic development. Ferguson (1975) proposed that foreigner talk (FT), the adjustments
native speakers make for lower proficiency L2 speakers, played a similar role to CS. Like
CA, FT studies examined both sides of the equation in a conversation, but the focus of
analysis was primarily linguistic. Long (1983) argued that native speakers make a range of
adjustments in negotiating meaning with an L2 learner and that these interactional ad-
justments (such as clarification requests, confirmation checks, and paraphrase) are em-
ployed to “avoid conversational trouble and to repair the discourse when trouble occurs”
(p. 131). For instance, in the Find the Difference task that follows, two interlocutors each
had similar pictures, but were required to find six differences through talking only. The
native speaker (NS) took the lead, describing his picture, including the word “chimney,”
which the non-native speaker (NNS) did not know. Using paraphrase, by describing the
function and physical properties of a chimney, the NS was able to assist the learner to
assign an English noun to an object.
NS: I have a house and a tree.
NNS: Mmhmm.
NS: And uh, the house has a door.
NNS: Yeah.
NS: And a window,
NNS: Mmhmm.
NS: And a chimney.
NNS: Chimney?
NS: For the fireplace? On the roof? You know the roof of the house?
NNS: Yeah.
NS: And then there’s a chimney?
NNS: Yeah.
NS: Made of brick?
NNS: Oh, that chimney!
NS: Chimney. Do you have a chimney?
NNS: Yeah, I don’t have.
NS: You don’t have a chimney.
NNS: Yeah, I don’t have.
(unpublished data, T. M. Derwing)
Fillmore first brought attention to four distinct types of fluency in L1 in 1979 (reprinted in
2000), the first of which is “the ability to talk at length with few pauses” (p. 51). This is the
type of fluency most often studied in L2 speaking research; it is typically measured by ex-
amining elements of dysfluency, such as pause placement and length, mean length of run
(number of words between pauses), hesitations, repetitions, and false starts. These physical
measures correlate with human listeners’ scalar judgements of fluency, although there is not a
one-to-one relationship (Derwing et al., 2004).
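The temporal measures listed above can be derived from a time-aligned transcript. The following is a minimal sketch, assuming that each word has already been given start and end times (in seconds) and that a silence of 0.25 seconds or more counts as a pause. Both the threshold and the function name fluency_measures are assumptions made for illustration; studies differ in the threshold they adopt.

```python
def fluency_measures(words, pause_threshold=0.25):
    """Compute simple temporal fluency measures.

    words: list of (token, start_s, end_s) tuples in temporal order.
    pause_threshold: minimum silence (s) treated as a pause (an assumed value).
    """
    runs = []      # number of words in each pause-free run
    pauses = []    # durations of silences at or above the threshold
    current_run = 0
    previous_end = None
    for _token, start, end in words:
        if previous_end is not None and start - previous_end >= pause_threshold:
            pauses.append(start - previous_end)
            runs.append(current_run)
            current_run = 0
        current_run += 1
        previous_end = end
    if current_run:
        runs.append(current_run)
    total_time = words[-1][2] - words[0][1]
    return {
        "mean_length_of_run": sum(runs) / len(runs),
        "pause_count": len(pauses),
        "mean_pause_duration": sum(pauses) / len(pauses) if pauses else 0.0,
        "speech_rate_wpm": len(words) / total_time * 60,
    }


# Invented alignment for a seven-word utterance with two silent pauses.
sample = [
    ("I", 0.0, 0.2), ("have", 0.2, 0.5), ("a", 0.5, 0.6), ("house", 0.6, 1.1),
    ("and", 1.6, 1.8), ("a", 1.8, 1.9), ("tree", 2.5, 3.0),
]
measures = fluency_measures(sample)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

The sketch covers silent pauses only; filled pauses, repetitions, and false starts would need to be tagged in the transcript before they could be counted.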
L2 speaking has also been studied in terms of learner affective variables. Test instruments
have been used to measure language anxiety (Horwitz et al., 1986), and speaking strategies
have been explored using interviews, observation, and three types of verbal report: self-
revelation (think-alouds), self-observation, which entails either immediate introspection or
retrospection (e.g., watching a video of oneself speaking), and self-report (Cohen, 2014).
Cohen also offers a comprehensive discussion of the benefits and disadvantages of structured
versus semi-structured interviews. MacIntyre et al. (1998) explored the concept of
Willingness to Communicate, which has been investigated using several different approaches.
Sociocultural studies (Surtees & Duff, this volume) tend to use a wide variety of quali-
tative methods to examine L2 and typically take into account the context in which inter-
actions occur to examine power relations and other connections to the speakers’ identities.
Video and audio recordings, interviews, field notes, mappings, and other data are collated to
bring together a nuanced understanding of a participant’s communication development.
Collection of Speech Data

A fundamental step in all speaking research is to obtain a suitable sample of spoken material –
virtually always recorded – from participants. As observed by Mackey and Gass (2016), there
is no “correct” approach to this step. Rather, the choice depends on several considerations that
balance advantages and drawbacks. Doughty and Long (2000) provide an overview with
sources for about 20 different elicitation techniques. Typical procedures (see Table 7.1) fall on
a continuum from highly controlled elicitations to extemporaneous speech (i.e., unrehearsed
speech elicited by a researcher), with a tradeoff between experimenter control and ecological
validity. A key consideration here is the “Observer Paradox,” according to which the most
realistic productions are likely when the speaker is unaware of being monitored for language
behaviour and therefore makes no output adjustments due to being observed. However, truly
“spontaneous” speech (speech that occurs in a natural discourse without prompting by a
researcher) is essentially non-existent in L2 research because of the ethical requirement of
informed consent from participants. Perhaps the nearest approximation is illustrated by de
Leeuw (2019), who avoided ethics concerns by analyzing public domain news recordings in a
longitudinal study of tennis player Stefanie Graf’s L2 English speech. Of course, Graf knew
she was being recorded, though not for linguistic reasons.

Table 7.1 Elicitation types according to researcher’s degree of control
Maximal: reading aloud; repetition; picture naming
Intermediate: monologic narratives (danger of death; picture or film description); interactive tasks (one-way, two-way, jigsaw, role-plays); interviews; classroom recordings
Minimal: public domain interviews in which language is not a focus; self-recordings throughout the day (e.g., smartphone)

Applied linguists often use monologic tasks which, though not spontaneous, are designed
to minimize the impact of the observer. One example is the “danger-of-death” narrative,
developed by sociolinguists in the 1970s (see Labov, 1972), in which the speaker orally re-
counts a life-threatening experience (Oyama, 1976). Deep engagement with an emotional
topic is thought to distract the speaker’s attention from linguistic form. Closely related,
though not as emotionally charged, are other monologic activities including picture de-
scriptions, personal narratives, and oral summaries of videos. Given limited control by the
researcher, these are thought to elicit relatively natural speech, but a significant drawback is
that the speaker may not produce the particular vocabulary, grammatical structures or
phonological patterns relevant to the goals of the investigation. Consequently, they are most
beneficial in studies in which specific language structures are not at issue, as in fluency re-
search or global speech analysis (e.g., intelligibility or comprehensibility).
To examine speakers’ use of specific language structures, elicitation procedures must be
more tightly constrained. At the most controlled end of the continuum are read-aloud and
simple oral repetition tasks, in which the linguistic content is fully predetermined. Although
such tasks are still used, they have serious drawbacks. Because both entail relatively un-
common types of speech acts in normal human communication, their outcomes may not be
generalizable to typical speaking performance. Aside from this aspect of ecological validity,
reading aloud may require the speaker to use unfamiliar vocabulary and may yield spelling
pronunciations even of known words. Repetition may yield speech that is heavily shaped by
the characteristics of the model and may therefore fail to represent the speakers’ capabilities.
To some degree, these issues are addressed with modified tasks. In vocabulary and pro-
nunciation research, for example, target lexis items might be elicited from pictures without
orthographic representations. As well, some types of morpho-syntactic knowledge can be
tapped through descriptions of pictures or videos in which responses involve grammatical
forms in obligatory contexts. An alternative to simple repetition is delayed repetition, in
which the speaker’s auditory memory is disrupted by a pause of several seconds (Trofimovich
& Baker, 2006) or an intervening sound or a task (such as counting aloud to 10) before the
repetition is produced. The delay is believed to give more natural performance because it
requires greater processing and coding of the speech material than does immediate repetition.
Because so much human communication is interactional rather than monologically based,
the evaluation of interactional speech material is fundamental in the study of speaking.
Though their methods differ, both quantitative and qualitative researchers have used in-
teractional material effectively. An important source of quantitative data has been classroom
observations focusing on how learning takes place, rather than on specific language content.
Some studies, for instance, have explored teacher–student interactions to establish how
teachers provide corrective feedback and how it is received (Lyster & Ranta, 1997; see Goo,
this volume). Another line of work examines language behaviour during student–student
interactions in task-based learning (e.g., Foster & Skehan, 1996). Still other research ex-
amines the types of adjustments native speakers make in response to L2 speech (Long, 1983).
In qualitative studies, interactions serve as a rich source of information. Surtees and Duff
(this volume), for instance, point to the benefits of micro-level analysis from naturalistic
video and audio recordings obtained in classroom and work environments, and in informal
social settings, such as the dinner table. Some researchers have gained insights into learners’
interactive experiences using the introspective technique known as stimulated recall (Gass &
Mackey, 2016), in which research participants consider a video or audio recording of their
own interactions and provide commentary on their experiences at the time of the original
activity.
An important development in recent years is the growing availability of online spoken
corpora (see also Huensch & Staples, this volume). The TalkBank (2020) project, for instance,
consists of multiple repositories covering a wide range of categories including conversational
materials, clinical recordings, child language data and speech from bilinguals and
second language learners. These can be accessed and assessed using a variety of software
tools included in the project. The Dutch corpus, JASMIN-CGN (Cucchiarini et al., 2008), is
a useful model for corpora; it contains orthographically and phonemically transcribed recordings,
with part-of-speech tagging, of speech produced by children, seniors, and L2 Dutch
learners, and has been used for multiple research purposes.
Approaches to Analysis
The range of approaches for analyzing speech data is wide, encompassing transcription,
expert data coding, acoustic measurements, and naive listener ratings. Furthermore, using a
combination of approaches yields deeper insights into speech material than does a single
approach.
The first step in studies of less-controlled elicitation types such as narratives and inter-
actions is transcription, typically in standard orthography. The details in transcription are
determined by the dependent variables at issue and the type of coding planned. Although
researchers may choose not to employ formal transcription conventions for their own pur-
poses, these are an essential part of speech corpora and other data-sharing situations.
Coding of transcribed material requires expertise, and must be checked for reliability,
often through independent coding of the materials by one or more coders in addition to the
main one. Coding may entail identifying, classifying, and tagging parts of speech, gram-
matical errors, particular speech acts, and grammatical structures such as phrases, clauses,
and t-units.
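One common way to check the reliability of categorical coding is to compare two coders' decisions using Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is illustrative only: the codes are invented, the function name is hypothetical, and kappa is offered as one possible index rather than the one prescribed by the authors.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items.

    Undefined (division by zero) when chance agreement equals 1,
    i.e. when both coders used a single identical category throughout.
    """
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented codes: did each of ten clauses contain a grammatical error?
main_coder = ["error", "ok", "ok", "error", "ok", "ok", "error", "ok", "ok", "ok"]
second_coder = ["error", "ok", "ok", "ok", "ok", "ok", "error", "ok", "error", "ok"]

kappa = cohens_kappa(main_coder, second_coder)
print(f"Cohen's kappa: {kappa:.2f}")
```

Here the coders agree on eight of ten clauses, but because both use "ok" most of the time, a good deal of that agreement would be expected by chance, and kappa is correspondingly lower than the raw agreement rate.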
Acoustic analyses are usually performed using software that displays speech waveforms
and other representations. In fluency research, for example, durational data, measured in
milliseconds, may be obtained on filled and unfilled pauses, runs, repetitions, and false starts.
While software allows automation of some such measurements (De Jong & Wempe, 2009),
results must generally be checked for accuracy and reliability. More detailed acoustic ana-
lyses of the type used in pronunciation research are performed with analysis software such as
Praat (Boersma & Weenink, 2020). Among other measurements, these include vocal pitch,
vowel formant frequencies, and temporal properties of consonants such as voice onset time
(VOT). On the one hand, acoustic data often shed light on listeners’ perceptions. For in-
stance, speaking rate measurements can predict fluency ratings. On the other, caution must
be exercised in interpreting such data because there is rarely a straightforward relationship
between the acoustic events in speech and listeners’ perceptions of those events. It is possible,
for instance, to find measurable differences between utterances that are not perceivable by
listeners, and therefore have no relevance to communication. Small differences in VOT in
stop consonants, for instance, might be reliably measured yet imperceptible.
Naive listener ratings have played a role in the analysis of L2 speech for several decades.
Listeners may listen to audio files and then assign scalar ratings on such dimensions as social
appropriateness (pragmatics research), fluency, comprehensibility, and accentedness. In sociolinguistic
studies, such scales may cover perceived personal characteristics of the speakers, such as their
friendliness, teaching ability, competence, or assertiveness. Protocols for rating studies require
careful administration, including appropriate scale sizes and labelling, speech samples of suitable
duration and content, controlled listening conditions, and checks on inter- and intra-rater
reliability.
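As an illustration of an inter-rater reliability check for scalar ratings, the sketch below computes Cronbach's alpha across listeners, treating each listener as an "item." This is a minimal example under stated assumptions: the ratings are invented, the function name is hypothetical, and alpha is only one of several indices (intraclass correlations are another) that researchers report.

```python
from statistics import variance


def cronbach_alpha(ratings_by_rater):
    """Cronbach's alpha with raters treated as items.

    ratings_by_rater: one list of ratings per listener, with all lists
    ordered by the same sequence of speech samples.
    """
    k = len(ratings_by_rater)
    # Sum of all listeners' ratings for each speech sample.
    totals = [sum(scores) for scores in zip(*ratings_by_rater)]
    summed_rater_variance = sum(variance(r) for r in ratings_by_rater)
    return k / (k - 1) * (1 - summed_rater_variance / variance(totals))


# Invented 9-point comprehensibility ratings: three listeners, five samples.
listener_1 = [2, 4, 5, 7, 8]
listener_2 = [3, 4, 6, 6, 9]
listener_3 = [2, 5, 5, 8, 8]

alpha = cronbach_alpha([listener_1, listener_2, listener_3])
print(f"Cronbach's alpha across listeners: {alpha:.3f}")
```

Intra-rater reliability can be examined in a similar spirit by having each listener rate a subset of samples twice and correlating the two sets of ratings.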

Many of the critical issues in speaking research are shared by other sub-fields in applied
linguistics. In particular, the question of validity, whether it pertains to constructs, mea-
surement, generalizability, or other issues, raises many current questions. Other concerns
relate to terminological conventions, data gathering, individual speaker variability, and
measurement techniques.
Some researchers struggle with constructs that remain relatively poorly understood, have
been considered in too narrow a fashion, or have been defined in multiple, inconsistent ways.
At times, the need arises to redefine constructs created for one research context to apply them
in another. To illustrate these problems, we can consider points raised in this volume and
elsewhere. For instance, Schmid (this volume) comments on the problem in L1 attrition
research of defining what actually counts as an “error” in speech output. Another example is
given by Mok (this volume), who observes the absence of any fully acceptable model of
intonation for cross-linguistic analysis. In her discussion of vocabulary research, Horst (this
volume) stresses the importance of placing the focus not simply on individual words but on
multi-word chunks as well. (See also Peters, this volume.) Finally, Lyster and Tedick (this
volume) discuss the importance of the concept of “oracy,” both as a means of capturing a
particular range of skills and strategies, and as a foundation for literacy development.
With respect to data gathering, Huensch and Staples (this volume) address the problem of
balancing the experimenter’s control over elicited content with speech naturalness. They
propose combining experimental with corpus techniques to offset the drawbacks of each. In
the area of pragmatics research, Bardovi-Harlig (2018) has remarked on a fundamental issue
in the use of discourse completion tasks (DCTs): the need for matching modalities. She
points to the faulty assumption in some research that written DCT results correspond to
speaking behaviour. In a related vein, Thomson and Derwing (2015) emphasize that, despite
its continued use, reading aloud cannot generally be considered a valid means of eliciting true
“speaking.”
Several issues associated with individual variability in L2 attainment are covered by Mora
(this volume). Of course, individual differences appear in every area of SLA, and despite a
tendency to dismiss them as merely “noise in the data,” doing so may obscure important
insights. Given the multifarious influences on acquisition, an important problem is the degree
to which it is possible to isolate predictors of such variation and to quantify their con-
tributions. Often it may be necessary to consider these in interaction with one another, rather
than assume an independent effect of each predictor on an observed outcome, and to justify
the rationale for considering certain predictors in the first place.
Issues in measurement arise in many areas of speaking research. Kahng (this volume) lists
a variety of temporal measurements used in fluency studies and discusses research designed
to relate these to listeners’ perceptions. However, some of the pertinent research must be
viewed with scepticism because of the serious problem of multicollinearity, which arises when
multiple predictor variables are closely related to one another. With respect to compre-
hensibility measurement, Trofimovich et al. (this volume) discuss new approaches to the
dynamic assessment of the construct, along with a move to measurement during interactional
rather than exclusively monologic speech. Some of the challenges raised by the application of
new technologies in automated assessment of speaking are discussed by Iwashita (this volume).
While noting their administrative advantages, she comments on the limitations of the
unidirectional evaluation of speaking on which they typically focus.
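The multicollinearity problem mentioned above can be made concrete with a small sketch. With exactly two predictors, the variance inflation factor (VIF) for each equals 1 / (1 - r^2), where r is the correlation between them. The data and function names below are invented for illustration, and the frequently quoted cut-off of 10 is a rule of thumb, not a standard endorsed in this chapter.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x, y):
    """VIF when a model contains exactly two predictors."""
    r = pearson_r(x, y)
    return 1 / (1 - r ** 2)


# Invented temporal measures for five speakers.
speech_rate = [2.1, 2.8, 3.0, 3.6, 4.2]          # syllables per second
mean_length_of_run = [3.0, 4.1, 4.4, 5.6, 6.3]   # words between pauses

r = pearson_r(speech_rate, mean_length_of_run)
vif = vif_two_predictors(speech_rate, mean_length_of_run)
print(f"r = {r:.3f}, VIF = {vif:.1f}")
```

When two temporal measures overlap this heavily, a regression of fluency ratings on both cannot reliably separate their individual contributions, which is why such results call for caution.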

Technology is revolutionizing how we conduct L2 speaking research. At the most basic level,
the ubiquitous smartphones serve as excellent recording devices, which
have been used to capture natural encounters between native speakers and L2 learners
(Surtees, 2013). At the other extreme, the advent and expansion of online data collection
platforms offer the opportunity to take a big data approach to L2 speech. Amazon
Mechanical Turk has already been used to recruit participants for linguistic research, in-
cluding L2 speech studies (Nagle & Rehman, 2021). Online data collection can connect re-
searchers with participant groups to whom they might not otherwise have had access, and
online tools may even help researchers recruit more interesting, if not more representative,
samples. Current work has focused on recruiting listeners to rate speech samples, but there is
no reason that speaking data cannot be collected online. Doing so will allow researchers to
recruit large-scale cross-sectional samples that can enrich current research on topics such as
how L1 background affects L2 speaking (expanding from current research that includes
groups from a few different L1 backgrounds to future work including dozens of different L1
backgrounds) and the importance of learning context (comparing speakers in rural and
urban settings, in monolingual and multilingual communities, and so on). In fact, the ability
to adopt a multisite design is one of the primary strengths of an online approach. Thus,
whereas current researchers working in two cities could collaborate to examine samples in
their respective contexts, through online data collection researchers could examine samples
from many contexts. Online tools might be useful for longitudinal data collection as well, in
which case researchers could examine L2 speech learning over years of L2 study or immersion,
a design that often proves difficult to carry out in face-to-face research.
Appropriate quality control measures must be taken when collecting data online. For L2
speaking research, this may mean including attention catch trials where participants are given
specific instructions so that inattentive responders (i.e., individuals who do not follow the
instructions) can be identified and screened from the data. Another necessary step is to ensure
that participants record themselves in a quiet environment with a microphone of sufficient
quality. Here, as part of the online task, participants could be directed to an online microphone
test, which could even be included as a screening condition. For example, Amazon Mechanical
Turk allows researchers to create and award study-specific qualifications. In this way, speakers
could complete the microphone test, receive the corresponding qualification, and advance to
the experimental portion of the research. In short, effectively using online tools demands
creativity and flexibility. Although researchers will lose some oversight over study procedures,
they stand to gain data sets that will allow them to ask and answer the types of complex
research questions that have the potential to reshape the state of the art.
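The screening steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Submission` record, its field names, and the `min_catch_rate` threshold are hypothetical and do not correspond to any particular platform's API.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """One participant's online session (hypothetical record layout)."""
    participant_id: str
    catch_trials_passed: int  # attention catch trials answered as instructed
    catch_trials_total: int
    passed_mic_test: bool     # outcome of an assumed online microphone check


def screen(submissions, min_catch_rate=1.0):
    """Split submissions into (retained, excluded) lists.

    A submission is retained only if the microphone check was passed and
    the proportion of catch trials passed reaches min_catch_rate.
    """
    retained, excluded = [], []
    for s in submissions:
        if s.catch_trials_total > 0:
            rate = s.catch_trials_passed / s.catch_trials_total
        else:
            rate = 0.0  # no catch trials recorded: treat as unverified
        if s.passed_mic_test and rate >= min_catch_rate:
            retained.append(s)
        else:
            excluded.append(s)
    return retained, excluded


submissions = [
    Submission("P01", 3, 3, True),
    Submission("P02", 2, 3, True),   # missed one catch trial
    Submission("P03", 3, 3, False),  # failed the microphone check
]
retained, excluded = screen(submissions)
print([s.participant_id for s in retained])
print([s.participant_id for s in excluded])
```

Keeping the excluded submissions in a separate list, rather than discarding them, makes it straightforward to report how many participants were screened out and why.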
Researchers need tools that enable them to process and code speaking data quickly and
accurately. The reality is that L2 speaking research, from processing recordings, to transcription,
to detailed linguistic analysis, can be time consuming. Fortunately, automated
speech measurement tools have progressed to the point where they can be successfully used
with L2 speech. Moreover, these tools cover different facets of L2 speaking, including automated
phonetic measurement (Goldrick et al., 2021), fluency measurement (De Jong &
Wempe, 2009), and lexical concordancing (Cobb, 2020). Researchers also need appropriate
data analysis tools. L2 speaking research, regardless of its scope, often generates complex,
hierarchical data sets that include speakers and observations (e.g., individual sounds, words,
sentences, monologues, and sometimes interlocutors in conversation). It has been common
practice to average over different facets of the data to compute single-measure averages (e.g.,
speaker averages) that are necessary for ANOVA and other statistical tests. Likewise, when
two facets of the data were of interest (e.g., speakers and items, speakers and listeners),
separate analyses have been conducted. These analyses make it difficult to consider how
various dimensions of the data interact, that is, how variables nested within different facets
of the data influence L2 speaking outcome measures. Researchers are increasingly embracing
statistical techniques such as mixed-effects modelling that allow them to estimate a wide range
of effects within a single analysis. These techniques also enable researchers to evaluate the
extent to which any given effect varies for the speakers, items, and listeners in the sample. By
using these techniques, researchers can gain a far more accurate and nuanced portrait of
speaking-related phenomena.
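As a rough illustration of what such a model can look like, consider a rating y given by listener l to item i produced by speaker s, with a single predictor x (e.g., task type). The notation below is our own assumption for the purpose of illustration and is not taken from any of the studies cited; it shows one possible specification with crossed random effects.

```latex
y_{sil} = (\beta_0 + S_{0s} + I_{0i} + L_{0l})
        + (\beta_1 + S_{1s} + L_{1l})\, x_{sil}
        + \varepsilon_{sil}
```

Here \beta_0 and \beta_1 are the fixed intercept and slope; S_{0s}, I_{0i}, and L_{0l} are random intercepts for speakers, items, and listeners; S_{1s} and L_{1l} are random slopes that let the effect of x vary across speakers and listeners; and \varepsilon_{sil} is the residual. Averaging over items or listeners before analysis would collapse these terms, which is why separate analyses cannot show how the facets interact.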

One of the most important recommendations we can make for speaking research is to match
the method to the research question and to interpret findings in light of the specific method
used. Studies asking developmental questions should use longitudinal methods, and studies
examining one specific aspect of L2 speech learning should avoid overgeneralizing results to
broader domains. L2 speaking is a complex phenomenon that must be studied from
multiple perspectives using diverse methods. There is growing recognition of the value of open practices
in SLA research. Language Learning has implemented Registered Reports (Marsden et al.,
2018), Studies in Second Language Acquisition has introduced a Methods Forum (Gass &
Plonsky, 2020), and increasingly SLA journals encourage researchers to make their tasks,
protocols, and data available through online repositories such as OSF and IRIS (Marsden
et al., 2016). Each of these initiatives aims at greater transparency and rigour in data collection
and analysis. Registered reports flip the review process by relocating peer review to
the proposal stage, before data collection. Authors submit a proposal, including hypotheses
and predictions, complete methodological details, and an analysis plan and receive feedback
from experts. By offering front-end feedback on the potential impact of the proposed idea
and design, registered reports help researchers to make an informed decision about whether
they should move forward with their proposal and, if so, what methodological issues they
should address to assure study quality. They also ensure that researchers clearly distinguish
between planned and exploratory analyses. The Methods Forum and repositories share the
common goal of equipping researchers with empirically validated and reliable tasks and tools
(and in the case of the forum, novel methods with which researchers may not be familiar).
They also lay the groundwork for successful replication studies by giving researchers access
to the tasks and protocols used in previous work.
These and other initiatives are transforming the way L2 speaking is researched. By using
publicly available vetted tasks, we ensure methodological parity and, by extension, maximum
comparability among conceptually related studies. This, in turn, promotes meaningful
synthesis, leading to an accumulation of knowledge on a given topic. Because of the growing
emphasis on pre-validation and open practices, we can also crowdsource the best materials,
protocols, and analyses by submitting and receiving feedback on pre-registered proposals
and by collaborating with researchers from a range of disciplinary backgrounds. It is our
experience that innovative ideas and methods often arise out of such collaborations. The
availability of online spoken learner corpora offers an opportunity to examine different
aspects of L2 speaking development in the same learner sample from different perspectives.
Once a large body of work is accumulated on these samples, interesting comparisons of
grammatical, lexical, pronunciation, and pragmatic development will be possible. Of course,
as we avail ourselves of open tools, tasks, and analyses, so too must we ensure that our
methods are as transparent and replicable as possible by publishing our own tasks, data sets,
and analyses whenever feasible.
We also encourage collaboration between researchers and teachers. Teachers often have
an intuitive sense of areas of special difficulty for their students and the factors that play a
role in predicting achievement. Researcher–teacher collaborations can help to bridge the gap
between research and practice by designing classroom studies that are both empirically sound
and ecologically valid. To do so, researchers may have to let go of some of the traditional
control mechanisms that they have in the laboratory. For instance, a focus on whole
classrooms often prevents random assignment of students into control and experimental
groups. Despite the problems of interpretation that may result, these circumstances reflect
the reality of the learning context in which many teachers work.
7 Future Directions
The interdisciplinary scope of L2 speaking research demands multidimensional theoretical
and methodological approaches. As the state of the art evolves, bringing with it new research
questions, so too must methods evolve to address them. We offer three methodological recommendations
to enhance the breadth and depth of L2 speaking research. First, more work
is needed on languages other than English. To date, the majority of studies have focused on
L2 English, and the few studies that have investigated other L2s have mostly recruited L1
English speakers. Thus, as a research community, we have ignored a substantial proportion
of language learners worldwide who are not L1 English speakers and who choose to learn an
L2 other than English. As we expand the languages we investigate, we should also strive for
greater diversity in the social backgrounds of research participants and aim especially for
non-academic samples (e.g., Andringa & Godfroid, 2020).
We also need more longitudinal studies, particularly research examining speakers over
more than three data points and over longer periods. We can gain a more nuanced understanding
of the long-range dynamics of L2 speaking through multiwave longitudinal research.
For example, Derwing and Munro (2013) were able to provide unique insights into
issues such as non-linear change over time and individual developmental trajectories because
they examined their speakers over a 7-year period. Another example is Huensch and Tracy-Ventura’s
work on foreign language learners’ fluency development before, during, and after
study abroad (2017), including a follow-up 4 years after learners’ study abroad experience
(Huensch et al., 2019). This research addresses complex questions related to the role of individual
differences in rate of change and L2 maintenance over many years. There is also a
need for dense, multiwave research conducted over shorter timeframes, as well as dynamic
methods focussing on system-level change and emphasizing the interconnectedness of elements
(e.g., Hiver & Al-Hoorie, 2019).
Finally, we should strive for a multifaceted view of L2 speaking, which includes linking
different types of L2 speaking measures across time and tasks using robust quantitative
and qualitative methods. This also calls for working at the interface of different subdomains
of SLA using diverse methods, as in Ruivivar and Collins’ (2018) work on the interplay
between non-native accents and spoken grammar. To a certain extent, we are already
making progress towards a multidisciplinary, multimethod approach to L2 speaking. As L2
speaking researchers, even though we may approach questions from different perspectives,
we are interested in the same fundamental issues: What are the components of L2 speaking?
How do those components relate to listeners’ impressions of L2 speech, and how does L2
speaking relate to global communicative competence? How do L2 learners become competent
L2 speakers? Ultimately, all our theoretical perspectives and methodological skillsets
will be needed to understand L2 speaking in its complexity.
Further Reading
Burns, A. (2017). Research and the teaching of speaking in the second language classroom. In E. Hinkel
(Ed.), Handbook of research in second language teaching and learning (pp. 242–256). Routledge.
Burns addresses cognitive and affective aspects of L2 speaking. She summarizes key differences between
spoken and written discourse and discusses the contributions discourse analysis can make to L2
speaking pedagogy.
Day, R. R. (Ed.) (1986). Talking to learn: Conversation in second language acquisition. Rowley MA:
Newbury House.
A collection of classic papers foundational to our current understanding of L2 speaking.
Mackey, A., & Gass, S. (2015). Second language research: Methodology and design (2nd edn). London:
Routledge.
An in-depth analysis of the research process, addressing topics such as study design, data collection and
analysis, and reporting. The authors review a wide range of data collection instruments with illustrative
examples of the data the instruments generate.
Plonsky, L. (Ed.) (2015). Advancing Quantitative Methods in Second Language Research. London:
Routledge.
An edited volume containing both a critical appraisal of basic quantitative principles such as descriptive
statistics, p values, and effect sizes, as well as targeted summaries of advanced statistical techniques.
Chapters include step-by-step instructions on how to carry out analyses, drawing upon examples from
published research.
References
Andringa, S., & Godfroid, A. (2020). Sampling bias and the problem of generalizability in applied
linguistics. Annual Review of Applied Linguistics, 40, 134–142.
Baker, W., & Trofimovich, P. (2005). Interaction of native- and second-language vowel system(s) in
early and late bilinguals. Language and Speech, 48(1), 1–27.
Bardovi-Harlig, K. (2018). Matching modality in L2 pragmatics research design. System, 75, 13–22.
Boersma, P. & Weenink, D. (2020). Praat: doing phonetics by computer [Computer program]. Version
6.1.21, retrieved 20 August, 2020 from http://www.praat.org/
Brown, R. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.
Cobb, T. (2020). Compleat lexical tutor, v. 8.3. Accessed 28 September, 2020 at https://www.lextutor.ca
Cohen, A. D. (2014). Strategies in learning and using a second language (2nd edn). London & New York:
Routledge.
Corder, S. P. (1967). The significance of learners’ errors. International Review of Applied Linguistics, 5,
160–170.
Cucchiarini, C., Dreisen, J., Van Hamme, H., & Sanders, E. (2008). Recording speech of children, non-natives
and elderly people for HLT applications: The JASMIN-CGN corpus. Proceedings of the 6th
international conference on language resources and evaluation, LREC 2008 (pp. 1445–1450).
De Jong, N. H., & Wempe, T. (2009). Praat script to detect syllable nuclei and measure speech rate
automatically. Behavior Research Methods, 41(2), 385–390.
de Leeuw, E. (2019). Native speech plasticity in the German-English late bilingual Stefanie Graf: A
longitudinal study over four decades. Journal of Phonetics, 73, 24–39.
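Derwing, T. M., & Munro, M. J. (2013). The development of L2 oral language skills in two L1 groups:
A 7-year study. Language Learning, 63(2), 163–185.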
Derwing, T. M., Rossiter, M. J., Munro, M. J., & Thomson, R. I. (2004). L2 fluency: Judgments on
different tasks. Language Learning, 54, 655–679.
Doughty, C. J., & Long, M. H. (2000). Eliciting second language speech data. In L. Menn, &
N. Bernstein Ratner (Eds.), Methods for studying language production (pp. 149–177). Mahwah,
NJ: Lawrence Erlbaum.
Dulay, H. C., & Burt, M. K. (1973). Should we teach children syntax? Language Learning, 23(2),
245–258.
Dulay, H. C., & Burt, M. K. (1974). Natural sequences in child second language acquisition. Language
Learning, 24(1), 37–53.
Ferguson, C. A. (1975). Towards a characterization of English foreigner talk. Anthropological
Fillmore, C. J. (2000). On fluency. In H. Riggenbach (Ed.), Perspectives on fluency (pp. 43–60). Ann
Arbor: University of Michigan Press.
Foster, P., & Skehan, P. (1996). The influence of planning and task type on second language performance.
Studies in Second Language Acquisition, 18(3), 299–323.
Gass, S. M., & Mackey, A. (2016). Stimulated recall methodology in applied linguistics and L2 research.
Milton Park: Taylor & Francis.
Gass, S., & Plonsky, L. (2020). Introducing the SSLA methods forum. Studies in Second Language
Goldrick, M., Shrem, Y., Kilbourn-Ceron, O., Baus, C., & Keshet, J. (2021). Using automated acoustic
analysis to explore the link between planning and articulation in second language speech production.
Language, Cognition, and Neuroscience, 36(7), 824–839.
Gregersen, T., MacIntyre, P. D., & Meza, M. D. (2014). The motion of emotion: Idiodynamic case
studies of learners’ foreign language anxiety. The Modern Language Journal, 98, 574–588.
Hiver, P., & Al-Hoorie, A. (2019). Research methods for complexity theory in applied linguistics. Blue
Ridge Summit, PA: Multilingual Matters.
Horwitz, E. K., Horwitz, M., & Cope, J. A. (1986). Foreign language classroom anxiety. Modern
Huensch, A., & Tracy-Ventura, N. (2017). L2 utterance fluency development before, during, and after
residence abroad: A multidimensional investigation. The Modern Language Journal, 101(2),
275–293.
Huensch, A., Tracy-Ventura, N., Bridges, J., & Cuesta-Medina, J. (2019). Variables affecting the
maintenance of L2 proficiency and fluency four years post-study abroad. Study Abroad Research in
Second Language Acquisition and International Education, 4, 96–125.
Kasper, G. & Wagner, J. (2014). Conversation analysis in applied linguistics. Annual Review of Applied
Labov, W. (1972). Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press.
Lado, R. (1957). Linguistics across cultures: Applied linguistics for language teachers. Ann Arbor, MI:
University of Michigan.
Leopold, W. (1949). Speech development of a bilingual child (Volume 4). Evanston, IL: Northwestern
University Press.
Linck, J. A., & Cunnings, I. (2015). The utility and application of mixed-effects models in second
language research. Language Learning, 65(S1), 185–207.
Long, M. H. (1983). Native speaker/non-native speaker conversation and the negotiation of comprehensible
input. Applied Linguistics, 4, 126–142.
Lyster, R. & Ranta, L. (1997). Corrective feedback and learner uptake: Negotiation of form in communicative
classrooms. Studies in Second Language Acquisition, 20, 37–66.
Mackey, A. (2012). Input, interaction, and corrective feedback in L2 learning. Oxford, England: Oxford
University Press.
Mackey, A., & Gass, S. (2016). Second language research: Methodology and design (2nd edn).
MacIntyre, P. D., Dörnyei, Z., Clément, R., & Noels, K. A. (1998). Conceptualizing willingness to
communicate in a L2: A situational model of L2 confidence and affiliation. The Modern Language
Journal, 82(4), 545–562.
Marsden, E., Mackey A., & Plonsky, L. (2016). The IRIS Repository: Advancing research practice and
methodology. In A. Mackey & E. Marsden (Eds.), Advancing methodology and practice: The IRIS
repository of instruments for research into second languages (pp. 1–21). New York: Routledge.
Marsden, E., Morgan-Short, K., Trofimovich, P., & Ellis, N. C. (2018). Introducing Registered Reports
at Language Learning: Promoting transparency, replication, and a synthetic ethic in the language
sciences. Language Learning, 68(2), 309–320. 10.1111/lang.12284
Nagle, C. L. & Rehman, I. (2021). Doing L2 speech research online: Why and how to collect online
ratings data. Studies in Second Language Acquisition, 43(4), 916–939.
Nagle, C., Trofimovich, P., & Bergeron, A. (2019). Toward a dynamic view of second language
comprehensibility. Studies in Second Language Acquisition, 41(4), 647–672.
O’Brien, M. G. (2016). Methodological choices in rating speech samples. Studies in Second Language
Acquisition, 38(3), 587–605.
Oyama, S. (1976). A sensitive period for the acquisition of a nonnative phonological system. Journal of
Psycholinguistic Research, 5(3), 261–283.
Rosansky, E. J. (1976). Methods and morphemes in second language acquisition research. Language
Learning, 26(2), 409–425.
Ruivivar, J., & Collins, L. (2018). The effects of foreign accent on perceptions of nonstandard
grammar: A pilot study. TESOL Quarterly, 52(1), 187–198.
Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking
for conversation. Language, 50(4), 696–735.
Schachter, J. (1974). An error in error analysis. Language Learning, 24, 205–214.
Schmidt, R. W., & Frota, S. N. (1986). Developing basic conversational ability in a second language: A
case study of an adult learner of Portuguese. In R. R. Day (Ed.), Talking to learn: Conversation in
second language acquisition (pp. 237–326). Rowley, MA: Newbury House.
Surtees, V. (2013). Mobile tracking of L2 interactions: Identifying speech act contexts for inclusion in
pragmatic assessment tools. Paper presented at the Canadian Association of Applied Linguistics
Conference, Victoria, Canada.
Talkbank (2000). Electronic resource retrieved from https://talkbank.org, September 10, 2020.
Trofimovich, P., & Baker, W. (2006). Learning second language suprasegmentals: Effect of L2 experience
on prosody and fluency characteristics of L2 speech. Studies in Second Language
Thomson, R. I. & Derwing, T. M. (2015). The effectiveness of L2 pronunciation instruction: A narrative
review. Applied Linguistics, 36, 326–344.
8
SPOKEN CORPORA
In second language acquisition research on L2 speaking,1 there is growing interest in the use
of spoken corpora to understand language development. Corpora (the plural of corpus) are
generally defined as large collections of speech (or writing) that are balanced and representative
of a particular discourse domain (Biber et al., 1998; McEnery & Hardie, 2012).
Most corpus researchers consider corpora to be collections of naturally occurring oral and/or
written texts. However, learner corpus researchers often also include in their definitions
collections of texts containing elicited material, such as classroom or assessment tasks. We
propose a definition of corpora incorporating a cline of spoken data running from more controlled
to more naturally occurring, with items like reading of words or sentences at one end of the
spectrum; followed by picture description tasks or narrative recount tasks; followed by
speaking performance assessments (e.g., oral interviews or monologues); followed by open-ended
classroom tasks (e.g., introducing oneself or describing a trip); followed by conversation
and other spoken domains outside of the classroom. As with the surge of written
corpora starting in the 1980s, when computing capabilities improved, one of the reasons why
spoken corpora are growing in popularity is the increase of digital tools that can make
building, analyzing, and sharing spoken corpora easier.
Early “corpora” (which were not referred to as such) consisted of individual words, focusing
primarily on the study of spoken utterances. These datasets were rightly criticized as unrepresentative
of speech as a whole and were particularly attacked by Chomsky (1962) for their focus on
empiricism versus rationalism. Interestingly, however, even after the
Chomskian revolution of the 1950s, phoneticians continued to work with naturally observed
data, as did second language acquisition researchers (see McEnery & Wilson, 2001). With the
advent of machine-readable capabilities, modern corpora were built in the 1950s and 1960s,
particularly in English. Most of these comprised writing by L1 speakers, and were not intended
for the study of second language acquisition. In fact, of the earliest modern (computerized)
corpora, only one made a significant contribution to understanding spoken
English, the London-Lund corpus. Started in 1975 and completed in the early 1980s, for
DOI: 10.4324/9781003022497-11
years it was the only corpus of spontaneous spoken English with prosodic annotation. In the
1990s, an explosion of learner and other corpora usable for SLA research took place. Two
major databases for accessing these corpora are the Université Catholique de Louvain’s
CECL2 list of learner corpora around the world and TalkBank’s SLABank.3 Major spoken
corpora used for study of SLA include the ESF4 family of corpora, LANGSNAP5 (as well as
FLLOC6 and SPLLOC7), LINDSEI,8 and TLC.9 Many other corpora exist (see Appendix
8A for spoken corpora and Appendix 8B for selected existing corpora) but few languages are
represented (mostly English, French, or Spanish). In addition, there are limitations on the
types of analyses that can be conducted. Many corpora (including LINDSEI) do not provide
researchers with sound files but are limited to transcripts. For those that do include sound
files (e.g., FLLOC and SPLLOC), no phonological annotation is provided, and thus analysis
of features such as prosody would be very time consuming.
The Potential of Learner Corpora for Spoken SLA enquiry

Many scholars have argued for the potential benefits of bringing together the fields of corpus
linguistics and SLA. Although not specifically focused on speaking, the general arguments for
the benefits of more collaboration between these fields apply here. For example, scholars have argued
that to best understand SLA, multiple types of data are essential: corpus data, experimental data,
and information about individual differences (MacWhinney, 2017; Meunier & Littre, 2013).
Using both experimental and corpus data has the potential to avoid the disadvantages of each
(Gilquin & Gries, 2009). For instance, the use of corpora can be advantageous because, if large
enough, corpora can provide SLA scholars with rich data sets from many learners to document
the paths of second language learning (Granger, 2009; McEnery et al., 2019). As Myles (2015)
reminds us, however, for corpora to be useful in this way, they must contain enough examples of
the target feature in question to be analyzed, and must include the full array of contexts where
that feature would normally occur to avoid misinterpretation of the findings (p. 314).
As development over time is a critical variable in SLA research (Ortega & Iberri-Shea,
2005), longitudinal corpora are particularly beneficial for SLA enquiry. These
corpora track the same language learners across multiple data collection points.
Nevertheless, longitudinal corpora require collaborative effort, as data collection
is particularly work and time intensive, and researchers must keep in mind questions of
planning, research design, and participant attrition (Tracy-Ventura & Huensch, 2018). It is
important to point out that it is not necessary for corpora to be large, longitudinal, general in
scope, or naturally occurring to benefit the study of spoken SLA. Many smaller, more
specialized corpora exist that maintain balance and representativeness (see Main Research
Methods part) for their domain of enquiry. A final advantage of using corpora for SLA
enquiry is that they can often be easily shared (McEnery et al., 2019; Myles, 2015), which
increases the impact of the data because it makes it possible for more researchers to use
the data, for new questions to be asked and answered with the data, and for replication
studies to occur. Researchers have to start with the mindset of sharing from the beginning,
however, to ensure ethical use of corpus data.
Research Questions Using Corpora to Investigate Spoken SLA

Despite the fact that written corpora outnumber spoken corpora, there are still many research
questions being asked and answered using spoken corpora. Topics include the
acquisition of grammatical and lexical features (e.g., Crossley et al., 2015), pragmatic features
(e.g., Fernández, 2013), utterance fluency (e.g., Huensch & Tracy-Ventura, 2017),
phonological features (e.g., Götz, 2013), complexity/accuracy/fluency (CAF) analyses (e.g.,
Vercellotti, 2017), and more. A 2019 special issue in the International Journal of Learner
Corpus Research (IJLCR) highlights some of the possibilities of using oral corpora to explore
spoken SLA using the TLC9, a large (4.2 million words) corpus collected between 2012 and 2018
that includes monologic and interactive speech from the Graded Examinations in Spoken
English assessment developed by Trinity College London.
One area of spoken SLA research fairly well-represented by the use of corpora is oral
fluency development (Huensch, 2020). In the IJLCR special issue, Götz (2019) used a subset
of the TLC to investigate utterance fluency, specifically the relationship between filled pause
frequency and variables such as proficiency level, country of origin, and age of acquisition.
Using regression modelling to predict filled pause frequency, Götz demonstrated that the
factor with the strongest explanatory power was country of origin, which is a loose proxy for
L1 background. With evidence that filled pause usage is particularly linked to L1 influence,
Götz calls into question the practice of high-stakes assessment schemes such as the Common
European Framework of Reference explicitly mentioning this feature in rubrics designed to test all
learners on the same scale. Many other studies have used spoken corpora to explore oral
fluency, such as the PAROLE10 corpus which includes speech from learners of English and
French as well as NS control groups and has been used to compare utterance fluency
characteristics among NSs and learners at different proficiency levels (Hilton, 2014). The
WiSP11 corpus, including English and Turkish L1 learners of L2 Dutch, has also been used
to explore multiple research questions regarding L2 fluency, including investigations of
L1–L2 fluency relations (e.g., De Jong et al., 2015).
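To make a measure such as filled pause frequency concrete, the following minimal Python sketch computes filled pauses per 100 words from an orthographic transcript. It is an illustration only: the filler inventory, the tokenization, and the choice of denominator (words excluding fillers) are our assumptions and are not the operationalization used in any of the studies cited above.

```python
import re

# Assumed inventory of filled pauses; real transcription schemes differ.
FILLED_PAUSES = {"uh", "um", "er", "erm", "eh"}


def filled_pause_rate(transcript, per=100):
    """Return filled pauses per `per` words, excluding the fillers themselves."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    n_filled = sum(1 for token in tokens if token in FILLED_PAUSES)
    n_words = len(tokens) - n_filled
    if n_words == 0:
        return 0.0  # avoid division by zero for empty or filler-only input
    return per * n_filled / n_words


sample = "um I went to uh the market and er bought some apples"
rate = filled_pause_rate(sample)
print(round(rate, 2))
```

Normalizing by word count allows samples of different lengths to be compared, which matters when speakers at different proficiency levels produce very different amounts of speech.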
Another area of research using corpora to investigate SLA pertains to the development
of constructions, or form-meaning pairings ranging from morphemes to words to idiomatic
expressions to syntactic frames (Ellis et al., 2016). Verb constructions are the focus of
Gilquin (2019) and Römer and Garner (2019) in the IJLCR special issue. Römer and
Garner examined the development of verb argument constructions (e.g., V about n, V for n)
across proficiency levels (low intermediate to high advanced). One benefit of using corpora
for such an analysis is the ability to compare results to a large reference corpus, in this case
the British National Corpus. Römer and Garner discovered that learners at advanced
proficiency levels evidenced similar distributions to the British National Corpus in both the
number and distribution of verbs in the constructions and were also able to demonstrate
how lower-level learners differed in terms of the types of verbs used in the constructions.
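The kind of comparison described above can be sketched as follows. The sketch assumes that both corpora are already lemmatized and part-of-speech tagged as (lemma, tag) pairs with a "VERB" tag; the function names and the toy data are hypothetical. It also simplifies the construction to "verb immediately followed by the preposition" and does not check for the following noun, so it should be read as an outline of the counting step rather than as the procedure used by Römer and Garner.

```python
from collections import Counter


def verbs_in_construction(tagged_tokens, preposition):
    """Count verb lemmas immediately followed by the given preposition."""
    counts = Counter()
    for (lemma, pos), (next_lemma, _next_pos) in zip(tagged_tokens, tagged_tokens[1:]):
        if pos == "VERB" and next_lemma == preposition:
            counts[lemma] += 1
    return counts


def relative_frequencies(counts):
    """Convert raw counts to proportions so corpora of different sizes compare."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {verb: n / total for verb, n in counts.items()}


# Toy learner data (invented for illustration).
learner = [
    ("talk", "VERB"), ("about", "ADP"), ("music", "NOUN"),
    ("i", "PRON"), ("think", "VERB"), ("about", "ADP"), ("it", "PRON"),
    ("we", "PRON"), ("talk", "VERB"), ("about", "ADP"), ("food", "NOUN"),
]
counts = verbs_in_construction(learner, "about")
shares = relative_frequencies(counts)
print(counts.most_common())
print(shares)
```

Running the same two functions over a reference corpus yields a second distribution, and the two can then be compared in terms of both the number of verb types and their relative shares.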
A host of other studies have focused on lexico-grammatical patterns of learner speech
across proficiency levels (e.g., Biber et al., 2016; Staples et al., 2017). A fairly consistent
finding across research contexts is that task type strongly influences the use of features associated
with informational elaboration (e.g., use of nouns and noun modifiers, longer
words, passive voice, and relative clauses), which are more often associated with writing than speech.
While mode clearly plays a major role in determining learners’ use of these features, tasks
requiring more informational content (e.g., an oral interview focused on students’ professional
experience or an integrated speaking task) lead to greater production of these features.
In addition, speakers at higher proficiency levels use more of these features within informationally
driven tasks.
The final two studies in the IJLCR special issue used the TLC to investigate pragmatic
development in the use of backchannels (Castello & Gesuato, 2019) and stance adverbs
(Pérez-Paredes & Díez-Bedmar, 2019). Pérez-Paredes and Díez-Bedmar explored the impact of
task (monologic vs. dialogic) and proficiency level on the use of adverbs such as really,
actually, and obviously to display stance. Using both quantitative and qualitative analyses,
the researchers provided evidence that task type differentially impacted adverb usage: actually
was more task-independent than really.
Pragmatics of spoken language development has been the subject of several studies of L2
spoken discourse outside of assessment contexts (Fernández & Yuldashev, 2011; Friginal
et al., 2017; Gilquin, 2008; Polat, 2011). Friginal et al. (2017) explore how hedges (e.g., think,
sort of ) and boosters (e.g., so) along with first person pronouns and modal verbs are used by
learners in EAP classroom discourse. Their results show that learners used think overwhelmingly
as a hedging device, and did not use modals for this purpose as much as their
teacher interlocutors. Modal verbs were also used more frequently by L2 learners in collaborative
tasks when compared to non-collaborative tasks. Possibility, ability, and permission
modals (e.g., can, could ) were particularly frequent, reflecting learners’ negotiation of
meaning during collaborative tasks (e.g., can you explain…).
Current Gaps in the Literature

While the development and use of spoken corpora for SLA research is on the rise with
several research questions being explored, there are notable gaps in the literature. The first
relates to a dearth of two types of learner corpora: phonological and longitudinal.
Phonological corpora include both audio (or video) data and time-aligned annotations of
some phonological feature (Gut & Voormann, 2014). Some explanations for the limited
research using L2 phonological corpora are that (1) relatively few phonological corpora
exist, (2) some accessible spoken corpora lack sound files for researchers to make their own
time-aligned annotations, and (3) creating time-aligned annotations of phonological features
requires specialized phonological knowledge and much time. Some examples of L1 phonological
corpora include IViE,12 PFC,13 and the child phonology component of the TalkBank,
PhonBank.14 While some L2 phonological corpora exist (e.g., L2-Arctic15 and LeaP,16 discussed
later), they are certainly in the minority of spoken corpora. Similarly, (dense) longitudinal
corpora are rare despite being argued to be particularly critical for SLA research
(Granger, 2009; MacWhinney, 2017; Ortega & Iberri-Shea, 2005). As with phonological
corpora, the limited number of longitudinal corpora is most likely because they are particularly
time-consuming and expensive to compile and annotate.
Another gap pertains to the limited number of investigations that combine corpora with other types of
research methods. Specifically, there have been multiple calls for more work combining
corpus research methods with experimental methods, and for those using corpora for SLA
research to make greater connections to SLA theory (McEnery et al., 2019; Myles, 2015).
McEnery et al. (2019) argued that “the key fault line between SLA research and [learner corpus
research] LCR” is that “SLA research has been largely theory-driven…test[ing] theory through
psycholinguistic and other (quasi)experimental methods” while “by contrast, learner corpus
researchers have been more exploratory and pre-theoretical in their approach to learner language”
(p. 83). Meunier and Littre (2013) provide an example of combining corpus and experimental
methods, albeit with written production. They investigated the development of tense and aspect in
French-speaking learners of English using evidence from a longitudinal learner corpus of
argumentative essays. Features identified as problematic for the learners based on continued errors
in use from the corpus (e.g., the present progressive) were then used to create stimuli for
multiple experimental tasks whose purpose was to tease apart which specific functions continued
to be problematic.

The four corpora described here were selected to demonstrate breadth and variation with
regard to the L1s/L2s represented, accessibility to the data/materials, tasks used for data
collection, and research questions asked.
LANGSNAP
As described earlier, corpora have been used to investigate many research questions in spoken
second language acquisition. The Languages and Social Networks Abroad Project (LANG-
SNAP, Mitchell et al., 2017) is a good example of the benefits of publicly shared longitudinal
corpora and how a corpus can be designed and used to answer a wide range of research
questions. The LANGSNAP corpus contains data from UK university students who were L2
learners of French or Spanish and required to spend their third year of a four-year degree
programme living in a French- or Spanish-speaking country. From 2011 to 2013, 56 participants
completed a picture-based narration and a semi-structured interview at each of six data
collection points: once before, three times during, and twice after their
9-month sojourn abroad. Participants also completed an argumentative writing task. The
audio files and transcriptions (in CHAT format, discussed later) are available for download on
TalkBank. The oral data have been used to explore spoken language development of modality
(McManus & Mitchell, 2015), CAF (McManus et al., 2020), L1–L2 fluency relationships
(Huensch & Tracy-Ventura, 2017), and identity (Mitchell et al., 2020). Because it is a publicly
available corpus, it has also been used by other research groups. For example, Gudmestad
et al. (2019) explored the development of grammatical gender marking in L2 Spanish from a
variationist SLA perspective and, using a multifactorial analysis, demonstrated how multiple
linguistic (e.g., noun gender, noun frequency) and extralinguistic (e.g., task) factors contribute
to different components of stability and variability in the gender marking of advanced L2
speakers. Data are still being added to this “productive” corpus. In 2016 and 2019, 33 and 31,
respectively, of the original 56 speakers participated in two additional rounds of data collec-
tion, bringing the total project to 8 years and allowing new research questions examining
factors that impact foreign language attrition/development/maintenance (Huensch et al., 2019).
LINDSEI
The LINDSEI corpus (Gilquin et al., 2010) has been used for an impressive number of research
studies (see https://uclouvain.be/en/research-institutes/ilc/cecl/lindsei-bibliography.html). LINDSEI
was designed as a spoken counterpart to written argumentative essays provided in the International
Corpus of Learner English (ICLE) corpus (Granger, 1998). The LINDSEI corpus consists of in-
terviews with university-level English as a Foreign Language learners following a set structure in
three parts. Each interview begins with a warm-up comprising a monologic speaking task on a
given topic followed by an informal dialogic interview about speakers’ lives at university. To finish,
speakers completed a picture description task. The corpus (transcripts only) is available for pur-
chase and currently includes interviews with 554 participants. Two main strengths of the LINDSEI
corpus are the variety of L1s represented (11 different backgrounds) and its parallel L1 English
corpus, the Louvain Corpus of Native English Conversation (LOCNEC, De Cock, 2004). This
design allows for both cross-linguistic and L1–L2 comparisons. Several studies have focused on
discourse markers and other “small words” in LINDSEI (Buysse, 2012; Gilquin, 2008). For
instance, Buysse (2012) explored spoken usage of the discourse marker so in the LINDSEI Dutch
L1 subcorpus (n = 40 interviews) between learners majoring in English Linguistics versus those
majoring in Commercial Sciences and also compared the learners to the L1 English LOCNEC
corpus. Results indicated that both groups of learners and the L1 speakers used so in a variety
of functions, but that learners (from both majors) tended to overuse so in comparison to
the L1 reference corpus. Similarly, Götz (2013) is a book-length treatment exploring native and
non-native speaker utterance fluency using the German L1 subcorpus of LINDSEI. One finding
from her analysis of the patterns of use of discourse markers was that learners often underused
them and used a limited variety in comparison to native speakers. Rosen (2016) used the French
L1 subcorpus of LINDSEI to explore the constructs of error and innovation by comparing the
LINDSEI corpus data to a variety of English influenced by Norman French, Jersey English.
Rosen’s analysis brings together SLA research and research on indigenized varieties of English and
asserts that “the difference between the notions of (not yet conventionalized) innovations on the
one hand and errors on the other seems to be terminological and attitudinal – a matter of per-
spective and norm-orientation rather than a linguistic difference” (p. 304). Data collection for the
LINDSEI corpus involves multiple researchers across several international institutions following a
protocol to ensure that data collected are suitable for comparison. Additional subcorpora are
continually being added to the LINDSEI corpus.
LeaP
The LeaP corpus (Gut, 2012) is one of the few L2 phonological corpora. The corpus consists of
spoken data from L2 learners of German and English collected between 2001 and 2003. The
project examined the acquisition of prosodic features (e.g., intonation, stress) and the potential
impact of factors such as proficiency, formal instruction, and individual differences variables such
as motivation and musicality. Over 12 hours of speech was collected from learners and native
speakers completing tasks comprising both read and spontaneous speech. The reading tasks in-
cluded a list of nonsense words and a narrative passage. The spontaneous speech tasks included a
re-telling of the narrative passage and an informal interview. Time-aligned annotations were
completed in Praat (Boersma & Weenink, 2020) and included segmentation of words, syllables,
phonemes, tone, and pitch. The corpus (including sound files, TextGrids, XML files, and manual) is
freely available for download. One potential limitation is that the tools developed for its analysis
are not publicly available and likely require basic knowledge of programming in the Perl language
(Edalatishams, 2017). Investigations of both the development of phonological features and oral
language fluency have been published. For example, Gut (2017) used a subset of learners from the
LeaP corpus to conduct a mixed-methods analysis of the effects of learning context on phono-
logical development in different tasks over time. Contexts included study abroad, study abroad
with participation in a phonology course, and at-home learners who participated in a phonology
course. Phonological variables included vowel reduction, intonation, and fluency (articulation rate
and mean length of run). The quantitative results showed no clear advantage for one of the
contexts over another (although there were trends indicating benefits for the groups who received
explicit teaching). Additionally, the qualitative analysis revealed a large amount of individual
variability across learners in all contexts, and indicated that learners who made gains in a
phonological feature typically did so across multiple tasks in the corpus.
CCOT
17
The Corpus of Collaborative Oral Tasks (CCOT; Crawford, 2021) was created at Northern
Arizona University between 2009 and 2012. The tasks in the corpus were given to students as
part of their achievement tests during their study in an Intensive English Programme, from one
to three times. There are 24 tasks, with at least ten learner performances of each task for a total
of 775 files. There are 600 speakers from three proficiency levels. The most common tasks are
problem solving (e.g., where learners decide which patient to treat or create an advertisement
together). Both the audio files and the transcriptions are available by contacting the creator,
William Crawford. An edited volume (Crawford, 2021) includes research on lexico-grammar,
pronunciation, and other types of speech analysis. For example, Staples (2021) investigates
lexico-grammatical features (e.g., nouns, conditional clauses, that complement clauses), in-
teractional features (turn length, backchannels, questions), fluency (speech rate, length of
pauses), and pitch range across task types (informational and argumentative). Not surpris-
ingly, nouns and other informational features were used more in the informational task while
conditional clauses were more common in the argumentative task. These findings align with
numerous studies supporting the use of these lexico-grammatical features for these particular
purposes. However, perhaps more interesting are the findings for interactional features, flu-
ency, and pitch range. Backchannels were used more frequently in informational tasks, per-
haps reflecting the listener’s uptake of information provided by the speaker. Speech rate was
faster and number of pauses was lower for the argumentative tasks, likely reflecting the less
dense use of informational content in the argumentative tasks. Pitch range was also higher in
the argumentative task, perhaps due to the need to stress syllables at higher pitch to make
points more salient and arguments appear stronger. These findings have important implica-
tions for the understanding of interactional variables, fluency, and pronunciation across tasks.
Corpus Building and Research Design

Methods for corpus design rely heavily on the definition of corpus used by the researcher.
This part assumes the definition used by corpus linguists: spoken corpora consist of speech
samples from a naturally occurring discourse domain. From this perspective, corpus de-
velopers generally work to ensure two characteristics of corpora: balance and representa-
tiveness. Balance refers to providing appropriate numbers of texts associated with
subdomains within the research domain one is investigating. For example, researchers in
SLA often work to balance the data across task type, or across speaker L1 groups, among
other variables. Depending on the research questions to be answered by the corpus, the type
of balance required will change. In addition, corpus developers work to ensure that the
sample included in their corpus is representative of the domain they are trying to represent.
So, a corpus that consists of words read aloud cannot represent conversational discourse.
However, a corpus of spoken assignments can represent what learners are doing in a
classroom context. Thus, it is important to consider the research questions when evaluating
the representativeness of the corpus for a given project. In addition to evaluating extra-
linguistic characteristics of the corpus (e.g., L1 background of the speakers or task types),
representativeness can also be evaluated linguistically. Depending on the type of language
data a researcher is investigating, a corpus may be more or less representative of that lan-
guage feature. For example, if a researcher is investigating syllable stress, as long as multi-
syllabic words are represented in the corpus, the corpus can be used for that linguistic fea-
ture; the corpus size may be small as long as there are multi-syllabic words at a high enough
rate. However, if a researcher is investigating particular idioms, it will be harder to find a
representative corpus, as some idioms occur quite infrequently and thus are not well re-
presented in all spoken corpora. This is a reason to have a large corpus. In general, features
like pausing (both filled and unfilled), stress patterns, vowel or consonant sounds, or
grammatical features that are common to speech can be well represented in many types of
corpora. However, to investigate intonation and rhythm, more naturally occurring speech is
needed. To investigate vocabulary, larger corpora are needed.
Corpus methods typically take three different approaches to research design (coined Type
A, Type B, and Type C by Biber & Jones, 2009). In type A studies, researchers investigate a
linguistic feature to determine how that feature varies based on the linguistic environment.
For example, one might examine copular verbs in Spanish and Portuguese learner corpora to
see how they vary depending on type of complement. Logistic regression can then identify
whether the patterns vary across L1 background or learner level (Picoral, 2020).
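The data layout behind a Type A study can be sketched in Python as follows. Each observation is one occurrence of the feature, coded for its linguistic environment and for a speaker variable. The records, field names, and values are invented for illustration and are not taken from Picoral (2020). The cross-tabulation shown is only a descriptive first step that might precede a logistic regression; it is not the regression itself.

```python
from collections import Counter, defaultdict

# Hypothetical observations: one row per copula occurrence.
# All field names and values are illustrative, not real corpus data.
observations = [
    {"verb": "ser", "complement": "noun", "level": "beginner"},
    {"verb": "ser", "complement": "adjective", "level": "beginner"},
    {"verb": "estar", "complement": "adjective", "level": "beginner"},
    {"verb": "estar", "complement": "locative", "level": "advanced"},
    {"verb": "ser", "complement": "noun", "level": "advanced"},
    {"verb": "estar", "complement": "adjective", "level": "advanced"},
]


def proportions_by_environment(rows, outcome, environment):
    """Return {environment value: {outcome value: proportion}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[environment]][row[outcome]] += 1
    return {
        env: {value: n / sum(c.values()) for value, n in c.items()}
        for env, c in counts.items()
    }


# Proportion of each verb within each complement type.
table = proportions_by_environment(observations, "verb", "complement")
```

Because the unit of observation is the occurrence rather than the text, the same rows could later be passed to a regression routine with complement type and learner level as predictors.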
Type B studies take as their unit of observation an individual text. Linguistic features are
examined within each text, but the output is the frequency of occurrence of that feature in each
text. Thus, the focus is not on the behaviour of a linguistic feature in a linguistic environment, but
rather on how frequently that feature is used across L1 backgrounds, learner levels, and/or text types.
Statistical methods used in these studies include the ANOVA family, to investigate
differences across subgroups (e.g., by proficiency level or L1 background), and the
correlation/regression family, to determine relationships between a continuous operationalization
of proficiency (e.g., scores on a proficiency test) and linguistic features.
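A Type B design can be sketched in Python as follows: one normed frequency is computed per text, and the per-text rates are then related to a continuous proficiency score. The texts, the hedge word list, and the scores are invented for illustration, and the tokenizer is deliberately simple. A real study would use far more texts and would normally report the correlation with a significance test from a statistics package.

```python
import math
import re


def normed_frequency(text, feature_words, per=1000):
    """Frequency of the target words per `per` running words in one text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in feature_words)
    return hits / len(tokens) * per


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical texts (one per speaker) paired with invented proficiency scores.
texts = [
    ("i think it is good", 40),
    ("i think maybe we can go and i think it works", 55),
    ("we could perhaps choose this one because it seems better", 70),
]
hedges = {"think", "maybe", "perhaps"}

# One observation per text: this is what makes the design Type B.
rates = [normed_frequency(text, hedges) for text, _ in texts]
scores = [score for _, score in texts]
r = pearson_r(scores, rates)
```

Norming to a common base (here 1,000 words) is what allows texts of different lengths to be compared.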
Type C studies are similar to type B, but they investigate the frequencies of an entire
subcorpus rather than getting the frequency for each individual text within that subcorpus.
In this case, the use of inferential statistics is more limited, and it is commonplace for re-
searchers to report frequency data. Reporting range along with normed frequencies is ad-
visable, to help researchers determine whether the phenomena are spread throughout
speakers in a subcorpus or are used by only one or two speakers.
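The value of reporting range alongside a pooled frequency can be illustrated with a short Python sketch. The subcorpus, speaker IDs, and tokens below are invented. Range is taken here to mean the number of speakers who use the target at least once, which is one common operationalization and is an assumption of this example.

```python
def subcorpus_profile(tokens_by_speaker, target, per=1000):
    """Pooled normed frequency of `target` plus its range across speakers.

    `tokens_by_speaker` maps a speaker ID to a list of tokens.
    """
    total_tokens = sum(len(toks) for toks in tokens_by_speaker.values())
    total_hits = sum(toks.count(target) for toks in tokens_by_speaker.values())
    speakers_using = sum(1 for toks in tokens_by_speaker.values() if target in toks)
    normed = total_hits / total_tokens * per if total_tokens else 0.0
    return {
        "normed_frequency": normed,
        "range": speakers_using,
        "speakers": len(tokens_by_speaker),
    }


# Invented subcorpus: speaker S3 alone produces every token of "actually".
subcorpus = {
    "S1": "well i really like it".split(),
    "S2": "it is really good".split(),
    "S3": "actually i actually think actually yes".split(),
}
actually = subcorpus_profile(subcorpus, "actually")
really = subcorpus_profile(subcorpus, "really")
```

In this toy subcorpus the pooled rate for actually is higher than for really, yet its range is 1 of 3 speakers, which is exactly the situation a frequency figure alone would hide.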
Most corpora are sampled from one period of time and thus are typically cross-sectional.
Corpus compilers ideally balance the corpus across score or proficiency levels, and also typi-
cally try to balance across L1 backgrounds. Such corpora provide valuable information about
linguistic and other features that characterize performance at different levels. However, more
recently, there has been a call for more quantitative longitudinal studies. Longitudinal corpora
provide an ideal dataset for examining spoken development. One of the choices researchers
must make is whether they prioritize the similarity of task across time periods (e.g., the same
task is administered to learners at two or more points in time) or whether they want to
prioritize the type of tasks suitable for learners at different developmental stages. The former
has the obvious advantage of being more controlled, while the latter has the advantage of more
ecological validity. Researchers are exploring these two options in corpus data, and it is clear
that new methods are needed to address different types of longitudinal datasets.
For corpora consisting of read words or sentences, balance and representativeness are not
important considerations. As discussed earlier, such corpora have the advantage that they
can be designed to have the control of a psycholinguistic experimental setting with shared
prompts and lab-quality recordings but have the disadvantage of not representing a spoken
discourse domain. Methods for these types of corpora are similar to those for psycho-
linguistic data, discussed in Nagle et al., this volume.
Digital Tools
A variety of digital tools exist to transcribe, annotate (including tagging and segmentation),
and analyze spoken corpora. This part describes these processes and some of the most
commonly used tools to complete them.
Transcription is the process of representing oral language in some form of written script,
such as orthographic transcription (e.g., following typical spelling conventions) or phonemic
or phonetic transcription (e.g., using IPA symbols and diacritics). While digital tools can
assist in (semi-) automating other processes with spoken corpora, manual transcription is
often a necessary and time-consuming first step. For instance, Brezina et al. (2019) reported
that it took 5 years and nearly 3,500 hours to transcribe the TLC (see footnote 9). Spoken
corpora can be transcribed in text editors (e.g., Notepad++ on Windows, TextEdit on macOS) or
software programs specialized for linguistic analysis such as the freely available CLAN or
ELAN. CLAN (Computerized Language Analysis, MacWhinney, 2000) is a software pro-
gram developed for the TalkBank system. CLAN is designed to work in conjunction with the
CHAT (Codes for Human Analysis of Transcripts) transcription and coding format, a set of
standardized conventions for creating computerized transcripts of speech. ELAN (EUDICO
Linguistic Annotator, Wittenburg et al., 2006) is another software program that allows for
transcription and analysis of audio and video. A useful feature of ELAN is its organization
around tiers, which can be hierarchically structured.
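Because CHAT transcripts are plain text, they can also be read with a few lines of general-purpose code. The Python sketch below relies only on the basic line conventions of the format: header lines begin with @, speaker (main) tiers with *, dependent tiers with %, and continuation lines with a tab. The sample transcript is invented, and the parser ignores much of the format (e.g., time bullets and special codes), so it is a rough illustration rather than a replacement for CLAN.

```python
def parse_chat(lines):
    """Group each speaker line with the dependent tiers that follow it."""
    utterances = []
    last = None  # (container, key) of the tier most recently written
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("@"):
            last = None  # headers are skipped in this sketch
            continue
        if line[0] in " \t":
            # Continuation line: append to whichever tier came last.
            if last is not None:
                container, key = last
                container[key] += " " + line.strip()
            continue
        marker, _, content = line.partition(":")
        content = content.strip()
        if marker.startswith("*"):
            utterances.append({"speaker": marker[1:], "text": content, "tiers": {}})
            last = (utterances[-1], "text")
        elif marker.startswith("%") and utterances:
            tiers = utterances[-1]["tiers"]
            tiers[marker[1:]] = content
            last = (tiers, marker[1:])
    return utterances


# Invented transcript fragment for illustration only.
sample = [
    "@Begin",
    "@Participants:\tPAR Participant, INV Investigator",
    "*INV:\tqué hiciste ayer ?",
    "*PAR:\tfui a la playa .",
    "%com:\tspeaker laughs",
    "@End",
]
utterances = parse_chat(sample)
```

Keeping dependent tiers attached to their utterance mirrors the way annotation layers such as part-of-speech information sit beneath the transcribed line.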
Annotation is the process of providing additional linguistic information to the tran-
scription. One of the most common forms of annotation across both written and spoken
corpora is known as tagging. This is the process of marking up words in the corpus with part-
of-speech information based on the word and its context. For example, in Figure 8.1, lines 17
and 18 represent the part-of-speech (POS) tagged words from the orthographically tran-
scribed Spanish utterance in line 16. As shown in Figure 8.1, POS tagging in this case
provides information about word class, tense, gender, number, etc. Typically, the tagging
process is automatic, although some follow-up disambiguation might be necessary depending
on the accuracy of the tagger. At a minimum, it is important to include accuracy checking as
one of the steps when using automatic annotators.
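One simple form of accuracy checking is to hand-correct the tags for a sample of tokens and compare them with the automatic output. The Python sketch below assumes two aligned lists of tags. The tag labels and the sample are invented and do not correspond to the output of any particular tagger.

```python
from collections import Counter


def tagging_accuracy(auto_tags, gold_tags):
    """Return the proportion of matching tags and a count of each
    (automatic tag, hand-checked tag) mismatch."""
    if len(auto_tags) != len(gold_tags):
        raise ValueError("tag sequences must be aligned token by token")
    if not gold_tags:
        raise ValueError("cannot score an empty sample")
    errors = Counter((a, g) for a, g in zip(auto_tags, gold_tags) if a != g)
    correct = len(gold_tags) - sum(errors.values())
    return correct / len(gold_tags), errors


# Invented six-token sample: the tagger mislabels one noun as a verb.
auto = ["PRON", "VERB", "DET", "NOUN", "VERB", "ADV"]
gold = ["PRON", "VERB", "DET", "NOUN", "NOUN", "ADV"]
accuracy, confusions = tagging_accuracy(auto, gold)
```

The confusion counts show which tag pairs account for the errors, which can guide targeted manual disambiguation.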
Once a corpus has been transcribed and potentially POS annotated, concordancing tools can be used for analyses
related to the frequency and distribution of words in a corpus. These often involve extracting not
only key words or phrases, but also the words occurring before and after them [known as Key
Word In Context (KWIC) analyses]. These tools are available as stand-alone (e.g., AntConc) or
web-based (e.g., SketchEngine) applications and have been used not only for linguistic research,
but also as pedagogical tools. For example, SketchEngine provides access to 500+ corpora in over
90 languages, but researchers can also upload their own corpora for analysis.
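The logic of a KWIC display can be sketched in a few lines of Python. This is a bare-bones illustration and not how AntConc or SketchEngine are implemented. The sample sentence, the window size, and the simple tokenizer are assumptions made for the example.

```python
import re


def kwic(text, keyword, window=4):
    """Return (left context, keyword, right context) for each hit."""
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Invented sample containing three uses of "so".
sample = "So I went home, and so we decided to stay. It was so good."
hits = kwic(sample, "so", window=3)

# Align the keyword in a central column, as concordancers typically do.
for left, key, right in hits:
    print(f"{left:>20}  [{key}]  {right}")
```

Reading the aligned lines makes it easier to sort uses of a word such as so by function (e.g., discourse marker versus intensifier), a classification that still requires manual judgment.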
Many other forms of annotation are possible (for an overview, see Leech, 2005).
Regarding annotation specific to spoken corpora, for example, prosodic annotation could be
used to indicate information about intonation, stress, and pausing. Additionally, symbols
may be added to transcripts for features such as filled pauses (e.g., uh, um) and repeated or
reformulated words or phrases. Segmentation is a common form of annotation in spoken
corpora and can be used at multiple levels. For instance, segmentation might be used to
separate speech from silence, to indicate discourse units such as turns, to separate phonemes
or syllables within a word, etc. In addition to ELAN, Praat is a commonly used digital tool
for segmentation and annotation of speech. Annotations in Praat are created in TextGrid
files, which can have multiple tiers as shown in Figure 8.2. Praat has a built-in feature to
automatically segment silence from speech called Annotate To TextGrid (silences…). Its
accuracy varies depending on the sound quality of the file, so manual post-checking is recommended.
Given Praat’s wide usage, many other digital corpus tools (e.g., CLAN, ELAN,
Phon) can read and/or write Praat TextGrid files. Praat has also been used for annotation of
phonological corpora, which are particularly time- and effort-intensive to annotate. For
example, the LeaP corpus involved approximately 1,000 events annotated per minute (Gut,
2012), and the reliability of the annotations varied as a function of what was being annotated. One of the most
important benefits of annotation, however, is that once completed, it allows for automatic
analysis of the corpus.
Figure 8.1 Example POS-tagged utterance in the CLAN software program
Figure 8.2 Praat TextGrid with annotation including a word tier (1), syllable tier (2), and a consonant/vowel
tier (3). Reprinted from Ghanem et al. (2020) with permission
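As an illustration of the automatic analysis that time-aligned annotation makes possible, the Python sketch below computes a few common fluency measures from a tier of intervals. It assumes the intervals have already been exported as (start, end, label) tuples labeled "silent" or "sounding", and that a syllable count is available from a syllable tier. The 0.25-second pause threshold, the sample tier, and the syllable count are illustrative assumptions, not values from any study discussed here.

```python
def fluency_measures(intervals, syllables, min_pause=0.25):
    """Compute simple fluency measures from (start, end, label) intervals.

    Silent intervals shorter than `min_pause` seconds are treated as part
    of the surrounding run rather than as pauses.
    """
    speaking_time = pause_time = 0.0
    pauses = runs = 0
    run_has_speech = False
    for start, end, label in intervals:
        duration = end - start
        if label == "silent" and duration >= min_pause:
            pause_time += duration
            pauses += 1
            run_has_speech = False  # a pause closes the current run
        else:
            speaking_time += duration
            if label == "sounding" and not run_has_speech:
                runs += 1
                run_has_speech = True
    total = speaking_time + pause_time
    return {
        "speech_rate": syllables / total if total else 0.0,
        "articulation_rate": syllables / speaking_time if speaking_time else 0.0,
        "mean_length_of_run": syllables / runs if runs else 0.0,
        "pauses": pauses,
    }


# Invented tier: one short silence (0.1 s) and one pause (1.0 s).
tier = [
    (0.0, 2.0, "sounding"),
    (2.0, 2.1, "silent"),
    (2.1, 3.1, "sounding"),
    (3.1, 4.1, "silent"),
    (4.1, 5.6, "sounding"),
]
measures = fluency_measures(tier, syllables=18)
```

Here speech rate divides syllables by total time, articulation rate divides by time excluding pauses, and mean length of run divides by the number of stretches of speech between pauses. Because results depend on the chosen pause threshold, it should be reported alongside the measures.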
Ghanem et al. (2020) provide an overview and evaluation of five commonly used digital
tools for spoken discourse, and recommendations for combining them efficiently across
different stages of data preparation and analysis for pronunciation corpora. They provide
documentation and evaluation of the five digital tools on their website.18

Increasing the trend of data-sharing is an important and efficient way to move forward. A
simple step is to encourage researchers collecting oral data to include permissions in IRB/
Human Ethics consent forms for sharing data. TalkBank provides a template and another
example can be found on the Ghanem et al. (2020) website (see footnote 18). Once approval
has been received, multiple options exist for sites to share sound files and other data, such as
researchers’ institutional websites (if available), SLABank (if CHAT transcripts are in-
cluded), Instruments for Research into Second Languages (IRIS, Marsden et al., 2016), or
the Open Science Framework (OSF, osf.io). For those data including annotation, it is useful
to share protocols describing annotation procedures and decisions. Documenting and pro-
viding access to these facilitates other researchers’ use of the data. Finally, in choosing a data
format, researchers should consider using programs designed with interoperability in mind
(e.g., CLAN, ELAN, Praat) and/or plain text. Beyond data-sharing efforts, encouraging
project collaboration across multiple institutions and researchers is another possibility for
building corpus resources that has been successful in the past (e.g., the LINDSEI project).
Two major database providers, CECL and TalkBank, provide suggestions and/or example
guidelines for creating new corpora and embarking on multi-institutional collaborations.
Another recommendation is to increase training opportunities and materials for researchers
who currently work with or would like to work with spoken corpora. Many
programs are available for transcription, annotation, and analysis of spoken corpora – so
many that those new to the field might have difficulty deciding where to start. More established
programs (e.g., Praat, AntConc) usually have detailed documentation and user guides
on the web. Providing additional training opportunities during pre-conference workshops or
conference presentations at venues such as the American Association for Corpus Linguistics
(AACL), the Pronunciation in Second Language Learning and Teaching (PSLLT) conference,
or the Second Language Research Forum (SLRF), or as part of a summer workshop
session, is a great way to increase skills and encourage wider use of spoken corpora for SLA.
Future Directions
Ultimately, while many potential benefits of using spoken corpora for SLA research exist,
the field is in its early stages. We have indicated areas of research that represent important
next steps. Thus far, we have highlighted the need for more longitudinal and phonological
corpora and research as well as examinations of spoken SLA combining experimental and
corpus-based methods. For projects such as these, collaborative efforts that bring together
researchers from multiple institutions and methodological expertise are likely to be most
successful. Such efforts require careful planning as well as consistency in data collection and
preparation.
One future direction that deserves mention is the need for spoken corpora representing
less commonly taught languages (LCTLs) as well as a greater variety of L1–L2 pairings in general. Not
surprisingly, most corpora represent languages such as English, French, or Spanish. One
project working to collect data from two LCTLs, Russian and Portuguese, is the
Multilingual Corpus of Assignments – Writing and Speech (MACAWS).19
Notes
1 To clarify, by “second language acquisition of speaking,” we mean L2 acquisition (in or outside
instructional contexts) in the spoken mode, including investigation of phonology, syntax, and
pragmatics.
2 Université Catholique de Louvain’s Centre for English Corpus Linguistics (CECL); https://
uclouvain.be/en/research-institutes/ilc/cecl/corpora.html
3 SLABank; MacWhinney (2020), https://slabank.talkbank.org/
4 European Science Foundation Second Language (ESF); (Perdue, 1993), https://slabank.talkbank.org/
access/Multiple/ESF/
5 Languages and Social Networks Abroad Project (LANGSNAP); (Mitchell et al., 2017), http://
langsnap.soton.ac.uk/, https://scholarcommons.usf.edu/langsnap/
6 French Learner Language Oral Corpora (FLLOC); http://www.flloc.soton.ac.uk/
7 Spanish Learner Language Oral Corpora (SPLLOC); http://www.splloc.soton.ac.uk
8 Louvain International Database of Spoken English Interlanguage (LINDSEI); (Gilquin et al.,
2010), https://uclouvain.be/en/research-institutes/ilc/cecl/lindsei.html
9 Trinity-Lancaster Corpus (TLC); (Brezina et al., 2019), http://cass.lancs.ac.uk/trinity-lancaster-
corpus/
10 Parallèle Oral en Langue Etrangère ‘Parallel Oral Foreign Language’ (Parole); (Hilton, 2009),
https://slabank.talkbank.org/access/English/PAROLE.html
11 What is Speaking Proficiency (WiSP); (De Jong et al., 2015).
12 Intonational Variation in English (IViE); (Grabe et al., 2001), http://www.phon.ox.ac.uk/files/
apps/IViE/
13 Phonologie du Français Contemporain (PFC); (Durand et al., 2002), https://www.projet-pfc.net/

14 PhonBank; Rose and MacWhinney (2014), https://phonbank.talkbank.org/
15 L2-ARCTIC; Zhao et al. (2018), https://psi.engr.tamu.edu/l2-arctic-corpus/
16 Learning Prosody in a Foreign Language (LeaP); (Gut, 2012); https://sourceforge.net/projects/
leapcorpus/
17 Corpus of Collaborative Oral Tasks (CCOT); Crawford (2021)
18 Digital Tools Used with Pronunciation Corpora; Ghanem et al. (2020), https://sites.google.com/
view/psllt2019/home
19 Multilingual Corpus of Assignments – Writing and Speech (MACAWS), http://macaws.corporaproject.org
Further Reading
Biber, D., & Reppen, R. (Eds.). (2015). The Cambridge handbook of English corpus linguistics.
Cambridge: Cambridge University Press.
This handbook covers major research areas within corpus linguistics, with an emphasis on English but
with relevance to research of other languages as well. The book contains helpful chapters on common
research areas, such as keyword and collocational analysis, as well as introductions to both spoken
corpus and learner corpus research.
Granger, S., Gilquin, G., & Meunier, F. (Eds.) (2015). The Cambridge handbook of learner corpus
research. Cambridge: Cambridge University Press.
This handbook provides a comprehensive guide to the rapidly developing field of learner corpus research.
The volume contains 27 chapters divided among parts devoted to corpus design and methodology,
learner language analysis, and intersections between learner corpus research and SLA, language
teaching, and natural language processing.
Tracy-Ventura, N., & Paquot, M. (2020). The Routledge handbook of SLA and corpora. New York:
Routledge.
This handbook begins with introductory chapters on corpus linguistics, LCR, SLA, and the intersections
of SLA and LCR. The remainder of the handbook comprises three parts: (a) aspects of
corpus design, annotation, and analysis, (b) the role of corpora in SLA theory and practice, and (c)
SLA constructs (e.g., input, interaction, accuracy) and corpora. The handbook ends with a chapter on
future directions of the use of corpora in SLA.
References
Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics: Investigating language structure and use.
Biber, D., Gray, B., & Staples, S. (2016). Predicting patterns of grammatical complexity across textual
task types and proficiency levels. Applied Linguistics, 37, 639–668.
Biber, D., & Jones, J. K. (2009). Quantitative methods in corpus linguistics. In Corpus linguistics: An
international handbook (pp. 1286–1304). Berlin: De Gruyter Mouton.
Boersma, P., & Weenink, D. (2020). Praat: Doing phonetics by computer (Version 6.1.09) [Computer
program]. Retrieved from http://www.praat.org/
Brezina, V., Gablasova, D., & McEnery, T. (2019). Corpus-based approaches to spoken L2 production:
Evidence from the Trinity Lancaster Corpus. International Journal of Learner Corpus Research, 5, 119–125.
Buysse, L. (2012). So as a multifunctional discourse marker in native and learner speech. Journal of
Pragmatics, 44, 1764–1782.
Castello, E., & Gesuato, S. (2019). Holding up one’s end of the conversation in spoken English: Lexical
backchannels in L2 examination discourse. International Journal of Learner Corpus Research, 5,
231–252.
Chomsky, N. (1962). Paper given at the University of Texas 1958. In 3rd Texas conference on problems
of linguistic analysis in English. Austin, TX: University of Texas.
Crawford, W. (2021). Multiple perspectives on learner interaction: The corpus of collaborative oral tasks.
New York: DeGruyter.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2015). Assessing lexical proficiency using analytic
ratings: A case for collocation accuracy. Applied Linguistics, 36, 570–590.
De Cock, S. (2004). Preferred sequences of words in NS and NNS speech. Belgian Journal of English
Language and Literatures, 2, 225–246.
De Jong, N. H., Groenhout, R., Schoonen, R., & Hulstijn, J. H. (2015). Second language fluency:
speaking style or proficiency? Correcting measures of second language fluency for first language
behavior. Applied Psycholinguistics, 36, 223–243.
Durand, J., Laks, B., & Lyche, C. (2002). La phonologie du français contemporain: usages, variétés et
structure. In C. Pusch & W. Raible (Eds.), Romanistische Korpuslinguistik- Korpora und gesprochene
Sprache/Romance corpus linguistics – Corpora and spoken language (pp. 93–106). Tübingen: Gunter
Narr Verlag.
Edalatishams, I. (2017). LeaP corpus (review). In M. O’Brien & J. Levis (Eds.), Proceedings of the 8th
pronunciation in second language learning and teaching conference, ISSN 2380-9566, Calgary, AB,
August 2016 (pp. 236–240). Ames, IA: Iowa State University.
Ellis, N. C., Römer, U. & O’Donnell, M. B. (2016). Usage-based approaches to language acquisition and
processing: cognitive and corpus investigations of construction grammar. Language Learning
Monograph Series. Hoboken, NJ: Wiley-Blackwell.
Fernández, J. (2013). A corpus-based study of vague language use by learners of Spanish in a study
abroad context. In C. Kinginger (Ed.), Social and cultural aspects of language learning in study
abroad (pp. 299–332). Philadelphia: John Benjamins.
Fernández, J., & Yuldashev, A. (2011). Variation in the use of general extenders and stuff in instant
messaging interactions. Journal of Pragmatics, 43, 2610–2626.
Friginal, E., Lee, J. J., Polat, B., & Roberson, A. (2017). Exploring spoken English learner language
using corpora: Learner talk. New York: Springer.
Gablasova, D., Brezina, V., & McEnery, T. (2019). The Trinity Lancaster corpus: Development, de-
scription and application. International Journal of Learner Corpus Research, 5, 126–158.
Gilquin, G. (2008). Hesitation markers among EFL learners: Pragmatic deficiency or difference? In J.
Romero-Trillo (Ed.), Pragmatics and corpus linguistics: A mutualistic entente (pp. 119–149). Berlin:
Mouton de Gruyter.
Gilquin, G. (2019). Light verb constructions in spoken L2 English: An exploratory cross-sectional
study. International Journal of Learner Corpus Research, 5, 181–206.
Ghanem, R., Edalatishams, I., Huensch, A., Puga, K., & Staples, S. (2020). The effectiveness of
computer programs in the transcription and analysis of spoken discourse: towards a protocol for
pronunciation corpora. In O. Kang, S. Staples, K. Yaw, & K. Hirschi (Eds.), Proceedings of the 11th
pronunciation in second language learning and teaching conference (pp. 97–114). Ames, IA: Iowa State
University.
Gilquin, G., De Cock, S., & Granger, S. (2010). The Louvain international database of spoken English
interlanguage. Handbook and CD-ROM. Louvain. Belgium: Presses universitaires de Louvain.
Gilquin, G., & Gries, S. (2009). Corpora and experimental methods: A state-of-the-art review. Corpus
Linguistics and Linguistic Theory, 5, 1–26.
Götz, S. (2019). Filled pauses across proficiency levels, L1s and learning context variables: A multi-
variate exploration of the Trinity Lancaster Corpus Sample. International Journal of Learner Corpus
Research, 5, 159–180.
Götz, S. (2013). Fluency in native and nonnative English speech. Philadelphia, PA: John Benjamins.
Grabe, E., Post, B. & Nolan, F. (2001). The IViE corpus. Department of Linguistics, University of
Cambridge. http://www.phon.ox.ac.uk/old_IViE.
Granger, S. (2009). The contribution of learner corpora to second language acquisition and foreign
language teaching: A critical evaluation. Corpora and Language Teaching, 33, 13–32.
Granger, S. (1998). The computerized learner corpus: A versatile new source of data for SLA research.
In S. Granger (Ed.), Learner English on computer (pp. 3–18). London: Longman.
Gudmestad, A., Edmonds, A., & Metzger, T. (2019). Using variationism and learner corpus research to
investigate grammatical gender marking in additional language Spanish. Language Learning, 69, 911–942.
Gut, U. (2017). Phonological development in different learning contexts. International Journal of
Learner Corpus Research, 3, 196–222.
Gut, U. (2012). The LeaP corpus: A multilingual corpus of spoken learner German and learner
English. In Th. Schmidt & K. Wörner (Eds.), Multilingual corpora and multilingual corpus
analysis (pp. 3–23). Amsterdam: John Benjamins.
Gut, U., & Voormann, H. (2014). Corpus design. In J. Durand, U. Gut, & G. Kristoffersen (Eds.), The
Oxford handbook of corpus phonology (pp. 13–26). Oxford: Oxford University Press.
Hilton, H. E. (2014). Oral fluency and spoken proficiency: Ideas for testing and research. In P. Leclercq,
A. Edmonds, & H. Hilton (Eds.), Measuring L2 proficiency: Perspectives from SLA (pp. 27–53).
Bristol, UK: Multilingual Matters.
Hilton, H. (2009). Annotation and analyses of temporal aspects of spoken fluency. CALICO Journal,
26, 644–661.
Huensch, A. (2020). Fluency. In N. Tracy-Ventura, & M. Paquot (Eds.), The Routledge handbook of
SLA and corpora. New York: Routledge.
Huensch, A., & Staples, S. (2018). Towards a protocol for a multilingual corpus for pronunciation
researchers. Pronunciation in Second Language Learning and Teaching. Ames, Iowa. https://
apling.engl.iastate.edu/conferences/pronunciation-in-second-language-learning-and-teaching-
conference/psllt-archive/
Huensch, A., & Tracy-Ventura, N. (2017). Understanding second language fluency behavior: The ef-
fects of individual differences in first language fluency, cross-linguistic differences, and proficiency
over time. Applied Psycholinguistics, 38, 755–785.
Huensch, A., Tracy-Ventura, N., Bridges, J., & Cuesta-Media, J. (2019). Variables affecting the
Second Language Acquisition and International Education, 4, 96–125.
Leech, G. (2005). Adding linguistic annotation. In M. Wynne (Ed.), Developing linguistic corpora: A
guide to good practice (pp. 17–29). Oxford: Oxbow Books, http://users.ox.ac.uk/~martinw/dlc/
index.htm
MacWhinney, B. (2020). TalkBank and SLA. In N. Tracy-Ventura, & M. Paquot (Eds.), The Routledge
handbook of SLA and corpora. New York: Routledge.
MacWhinney, B. (2017). A shared platform for studying second language acquisition. Language
Learning, 67, 254–275.
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk (3rd edn). Mahwah, NJ:
Lawrence Erlbaum.
Marsden, E., Mackey A., & Plonsky, L. (2016). The IRIS Repository: Advancing research practice and
methodology. In A. Mackey & E. Marsden (Eds.), Advancing methodology and practice: The IRIS
repository of instruments for research into second languages (pp. 1–21). New York: Routledge.
McEnery, T., Brezina, V., Gablasova, D., & Banerjee, J. (2019). Corpus linguistics, learner corpora,
and SLA: Employing technology to analyze language use. Annual Review of Applied Linguistics,
39, 74–92.
McEnery, T., & Hardie, A. (2012). Corpus linguistics: Method, theory and practice. Cambridge:
McEnery, T., & Wilson, A. (2001). Corpus linguistics (2nd edn). Edinburgh: Edinburgh University
Press.
McManus, K., Mitchell, R., & Tracy-Ventura, N. (2020). Longitudinal study of advanced learners’
linguistic development before, during, and after study abroad. Applied Linguistics. doi: 10.1093/
applin/amaa003.
McManus, K. & Mitchell, R. F. (2015). Subjunctive use and development in L2 French: A longitudinal
study. Language, Interaction and Acquisition, 6(1), 42–73.
Meunier, F., & Littre, D. (2013). Tracking learners’ progress: Adopting a dual ‘corpus cum experi-
mental data’ approach. The Modern Language Journal, 97, 61–76.
Mitchell, R., Tracy-Ventura, N., & Huensch, A. (2020). After study abroad: The long-term evolution of
multilingual identity among anglophone languages graduates. Modern Language Journal, 104(2),
327–344.
Mitchell, R., Tracy-Ventura, N., & McManus, K. (2017). The Anglophone student abroad: Identity,
social relationships and language learning. New York: Routledge.
Myles, F. (2015). Second language acquisition theory and learner corpus research. In S. Granger, G.
Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 309–331).
Myles, F. (2008). Investigating learner language development with electronic longitudinal corpora:
Theoretical and methodological issues. In L. Ortega, & H. Byrnes (Eds.), The longitudinal study of
advanced L2 capacities (pp. 58–72). New York: Routledge.
Ortega, L., & Iberri-Shea, G. (2005). Longitudinal research in second language acquisition: Recent
trends and future directions. Annual Review of Applied Linguistics, 25, 26–45.
Perdue, C. (Ed.) (1993). Adult language acquisition. Vol 1: field methods. Cambridge: Cambridge
University Press.
Pérez-Paredes, P., & Díez-Bedmar, M. B. (2019). Certainty adverbs in spoken learner language: The
role of tasks and proficiency. International Journal of Learner Corpus Research, 5, 253–279.
Picoral, A. (2020). L3 Portuguese by Spanish-English bilinguals: Copula construction use and acqui-
sition in corpus data (Publication No. 27957666). [Doctoral dissertation, University of Arizona].
ProQuest Dissertations Publishing.
Polat, B. (2011). Investigating acquisition of discourse markers through a developmental learner
corpus. Journal of Pragmatics, 43, 3745–3756.
Rose, Y., & MacWhinney, B. (2014). The PhonBank project: Data and software-assisted methods for
the study of phonology and phonological development. In J. Durand, U. Gut, & G. Kristoffersen
(Eds.), The Oxford handbook of corpus phonology (pp. 380–401). Oxford: Oxford University Press.
Rosen, A. (2016). The fate of linguistic innovations: Jersey English and French learner English com-
pared. International Journal of Learner Corpus Research, 2, 302–322.
Römer, U., & Garner, J. R. (2019). The development of verb constructions in spoken learner English:
Tracing effects of usage and proficiency. International Journal of Learner Corpus Research, 5, 207–230.
Staples, S. (2021). Exploring the impact of situational characteristics on the linguistic features of spoken
oral assessment tasks. In W. Crawford (Ed.), Multiple perspectives on learner interaction: The corpus
of collaborative oral tasks (pp. 123–144). Berlin: DeGruyter.
Staples, S., LaFlair, G., & Egbert, J. (2017). A multi-dimensional comparison of oral proficiency in-
terviews to conversation, academic and professional spoken registers. Modern Language Journal,
101, 194–213.
Tracy-Ventura, N., & Huensch, A. (2018). The potential of publicly shared longitudinal learner corpora
in SLA research. In A. Gudmestad & A. Edmonds (Eds.), Critical reflections on data in second
language acquisition (pp. 149–170). Philadelphia/Amsterdam: John Benjamins.
Vercellotti, M. L. (2017). The development of complexity, accuracy, and fluency in second language
performance: A longitudinal study. Applied Linguistics, 38, 90–111.
Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., Sloetjes, H. (2006). ELAN: A professional
framework for multimodality research. In Proceedings of LREC 2006, Fifth international conference
on language resources and evaluation. https://tla.mpi.nl/tools/tla-tools/elan/
Zhao, G., Sonsaat, S., Silpachai, A., Lucic, I., Chukharev-Hudilainen, E., Levis, J. M., & Gutierrez-Osuna, R.
(2018). L2 ARCTIC: A non-native English speech corpus. Proceedings of interspeech (Hyderabad, India).
Appendix 8A List of Spoken Corpora (modified from Huensch & Staples, 2018)
L2 Corpora and Datasets
1. BeMaTaC (Berlin Map Task Corpus) https://hu-berlin.de/bematac

2. DiapixFL https://datashare.is.ed.ac.uk/handle/10283/346
3. EuroCoAT (European Corpus of Academic Talk) http://www.eurocoat.es/web_
sections_1/the_corpus_eurocoat_the_european_corpus_of_academic_talk_12
4. FLLOC (French Learner Language Oral Corpora) http://www.flloc.soton.ac.uk/
5. The Hong Kong Bilingual Child Language Corpus http://www.cuhk.edu.hk/lin/home/
bilingual.htm
6. Hong Kong Corpus of Spoken English http://rcpce.engl.polyu.edu.hk/HKCSE/
7. IDEA (International Dialects of English Archive) http://www.dialectsarchive.com
8. IJAS (International Corpus of Japanese as a Second Language) https://chunagon.ninjal.ac.jp/
static/ijas/about.html
9. Japanese polite speech by native speakers and non-native speakers
https://www120.secure.griffith.edu.au/research/items/b11042e5-7588-4d0d-b1ea-dad2320716cc/1/
10. Japanese learners’ conversations (contains OPI interviews with transcriptions) https://
nknet.ninjal.ac.jp/nknet/ndata/opi/
11. L2-ARCTIC https://psi.engr.tamu.edu/l2-arctic-corpus/
12. L2 Mandarin Chinese by non-native speakers https://www120.secure.griffith.edu.au/
research/items/9a3e0b74-20f8-4229-baf1-d9ec84d300da/1/
13. LeaP (Learning Prosody in a Foreign Language)
https://benjamins.com/#catalog/books/hsm.14.03gut/details
14. LANGSNAP (Languages and Social Networks Abroad Project) http://langsnap.soton.
ac.uk/; http://scholarcommons.usf.edu/langsnap/
15. LINDSEI (Louvain International Database of Spoken English Interlanguage) https://
uclouvain.be/en/research-institutes/ilc/cecl/lindsei.html
16. MICASE (Michigan Corpus of Academic Spoken English) https://quod.lib.umich.edu/
m/micase/
17. NBTale (Norwegian database) https://www.nb.no/sprakbanken/show?serial=sbr-31&
lang=en
18. NIM (Spanish, English and Catalan) https://psico.fcep.urv.cat/utilitats/nim/eng/
about.php
19. PRESEEA http://preseea.linguas.net/
20. Speech Accent Archive http://accent.gmu.edu
21. Spin TX (Spanish in Texas) http://spanishintexas.org/; https://www.coerll.utexas.edu/
spintx/home
22. SPLLOC (Spanish Learner Language Oral Corpora) http://www.splloc.soton.ac.uk/
23. Trinity Lancaster Corpus (http://cass.lancs.ac.uk/trinity-lancaster-corpus/)
24. Wildcat corpus http://groups.linguistics.northwestern.edu/speech_comm_group/wildcat/
25. VOICE (Vienna-Oxford International Corpus of English) https://www.univie.ac.at/
voice/
L1 Phonology Corpora
1. IViE (Intonational Variation in English) http://www.phon.ox.ac.uk/files/apps/IViE/

2. NoTa-Oslo (Norwegian Spoken Language Corpus) http://www.tekstlab.uio.no/nota/
oslo/english.html
3. PFC Programme (Phonologie du Français Contemporain: usages, variétés et structure)
https://www.projet-pfc.net/
4. PhonBank https://phonbank.talkbank.org/
5. TAUS (Spoken Language Investigation in Oslo) http://www.tekstlab.uio.no/nota/taus/
english.html
Other Widely Used Spoken Corpora
1. ANC (American National Corpus) http://www.anc.org/

2. BASE (British Academic Spoken English Corpus) https://warwick.ac.uk/fac/soc/al/
research/collections/base/
3. BNC (British National Corpus Audio Edition) http://www.phon.ox.ac.uk/AudioBNC
4. BYU Corpora https://corpus.byu.edu/
5. COCA (Corpus of Contemporary American English) https://corpus.byu.edu/COCA/
6. C-Oral-Rom (Benjamins) https://benjamins.com/#catalog/books/scl.15/main
7. ICE (International Corpus of English) http://ice-corpora.net/ice/index.html
8. Santa Barbara http://www.linguistics.ucsb.edu/research/santa-barbara-corpus
Appendix 8B Summary of Select Existing Corpora (modified from Huensch & Staples, 2018)
Corpus: BeMaTaC
Language/Proficiency: German L2 (n = 10) [advanced prof = C1/C2]; German NS (n = 24)
Size: 18,123
Data/Annotations: Transcripts (EXMARaLDA); Sound files (wav, mp3); Video files (mov, webm)
Strengths: Rich metadata; Dialogic
Weaknesses: Limited PRON annotations; Single task
Task: Info-gap
Access: Free; download ANNIS

Corpus: CCOT
Language/Proficiency: English L2 (n = 600); three proficiency levels [TOEFL 32–69]
Size: 268,324; 775 files
Data/Annotations: Transcripts; Sound files (wav)
Strengths: Dialogic; multiple L1s represented; multiple tasks represented
Weaknesses: No PRON annotations; Variable sound quality
Task: 24 different tasks
Access: Free; contact creator (William.Crawford@nau.edu)

Corpus: FLLOC
Language/Proficiency: Collection of 8 corpora (n = 491 participants, aged 5–23) [varying proficiency]
Size: 40,00 files; >3 million words
Data/Annotations: Transcripts (CHAT/CLAN); Sound files (wav, mp3); POS-tagged/MOR
Strengths: Large; Comparisons w/ SPLLOC
Weaknesses: Limited PRON annotations; Variable sound quality
Task: Elicitation tasks; Narratives; Interview
Access: Free; download

Corpus: HKCSE (Prosodic)
Language/Proficiency: L1 Hong Kong Chinese, L2 English (n=643,286) [advanced proficiency]; L1 English (227,894); L1 Other (29,064)
Size: 900,214 words; 311 recordings
Data/Annotations: Transcripts (txt); Brazil annotations (searchable through iConc interface)
Strengths: Large; Wide range of naturally occurring tasks; Annotation using Brazil’s system
Weaknesses: Highly proficient L2 speakers; No sound files
Task: Business (e.g., job interview; presentations, service encounters); Academic (student presentations, lectures); Public (speeches, interviews)
Access: $125 CD-ROM

Corpus: LANGSNAP
Language/Proficiency: French L2 (n=29); Spanish L2 (n=27) [advanced proficiency]; French NS (n=10); Spanish NS (n=10)
Size: 742,203 words; 1,238 files
Data/Annotations: Transcripts (CHAT/CLAN); Sound files (wav, mp3); POS-tagged/MOR
Strengths: Longitudinal; Controlled and free tasks; Multiple L2s
Weaknesses: Limited PRON annotations; Variable sound quality
Task: Interview; Story retell; Essay
Access: Free; download

Corpus: LeaP
Language/Proficiency: L1 German, L2 English (n=176); L1 English, L2 German (n=183) [advanced and intermediate? proficiency]; L1 English (n=8); L1 German (n=10)
Size: 73,941 words; 12 hours
Data/Annotations: Transcripts (XML-based TASX format); Syllables, segments, pitch accents and boundary tones, intonation contours, part of speech, lemmas (Praat); Sound files (wav)
Strengths: Detailed segmental/suprasegmental annotation; Controlled and free tasks
Weaknesses: Low inter-annotator agreement; Relatively small
Task: Free speech in an interview; Reading a story; Retelling a story; Reading nonsense word list
Access: Free; download

Corpus: LINDSEI
Language/Proficiency: ~50 files each of L2 English from the following L1s: Bulgarian, Chinese, Dutch, French, German, Greek, Italian, Japanese, Polish, Spanish, Swedish (Intermediate to Advanced [10 files per L1 CEFR evaluated])
Size: > 1 million words; 792,000 of learner language; 554 interviews; 130 hours
Data/Annotations: Transcripts (XML); Fluency annotations: filled and unfilled pauses
Strengths: Multiple L1s; Controlled and free tasks
Weaknesses: Limited PRON annotations; No sound files; Proficiency information not clear (only 5 interviews from each group were rated)
Task: Interview; Informal chat; Picture description
Access: €211.75 CD-ROM

Corpus: Speech Accent Archive
Language/Proficiency: L1 and L2 English; ~350 L1s [proficiency not provided]
Size: 2,642 samples; 69 words per sample
Data/Annotations: Phonetic transcription
Strengths: Extremely broad range of L1s; Comparability across samples
Weaknesses: Read speech; No proficiency information
Task: One passage of read speech
Access: Free

Corpus: SPLLOC
Language/Proficiency: 2 corpora, each having: L2 Spanish (n=60) [varying proficiency]; L1 Spanish (n=15)
Size: 575 files
Data/Annotations: Transcripts (CHAT/CLAN); Sound files (wav, mp3); POS-tagged/MOR
Strengths: Comparisons w/ FLLOC
Weaknesses: Limited PRON annotations; Variable sound quality
Task: Elicitation tasks; Narratives; Interview
Access: Free; download
9
SPEAKING ASSESSMENT
Noriko Iwashita
In today’s globalized society, where the world is interconnected through vastly increased
trade and cultural exchanges, the demand for excellent communicative competence continues
to grow for employment, study, immigration, and travel opportunities. This gives rise to a
need for appropriate assessment tools for speaking. While speaking is one of the most valued
skills in language teaching (Lado, 1961), its assessment is relatively new compared with other
skills (Fulcher, 2003).
Since speaking assessment tools were first introduced in the early 1980s, their format has
undergone many changes. As the prevalent format changed, various aspects of speaking
assessment were investigated to support appropriate inferences about test-takers’ language
ability. Early studies were largely quantitative and drew on large samples. More recently,
however, as test-takers’ interactional ability has been increasingly recognized as an
important element of speaking assessment, there has been a shift from individualistic
assessment to an approach that incorporates co-construction, reflecting an alternative view
that recognizes the fundamentally social nature of language assessment (Roever & Kasper,
2018). This shift has brought a variety of qualitative methods into use.
With advances in communication technology and its increased use in daily communication,
computer-mediated tests have been implemented in large standardized tests (e.g.,
APTIS, Duolingo, PTE Academic, TOEFL iBT, Versant English Test) and automated
scoring systems have been introduced in some of the large commercial tests. This new mode
of speaking assessment eases the complexity of administering speaking tests, but it has
also brought challenges in operationalizing the speaking construct (Galaczi & Taylor, 2018).
This chapter presents a historical overview of speaking assessment, followed by an overview
of current research. Several scholars have already contributed reviews on broader areas of
speaking assessment (e.g., Ginther, 2013; Isaacs, 2017; O’Sullivan, 2013). To complement
their reviews, this chapter focuses on studies that examined test-taker performance in
speaking assessment. Speaking assessment here includes monologues, conversations, and
interviews involving oral interaction with one or more other people, who may be examiners,
interviewers, or fellow test-takers. Although other issues, such as washback and test use, are
important areas of speaking research, this chapter does not cover them because of space
constraints.
DOI: 10.4324/9781003022497-12
The direct format of speaking assessment, in which test-takers’ speaking ability is assessed
through face-to-face communication or in the form of a monologue, was not introduced until the
early 1980s (Clark, 1979). It is worth noting that the format of speaking assessment
corresponded with the teaching methodologies employed at the time. For example, when a
structure-based curriculum was prevalent, speaking assessment consisted of an “indirect” test
in which test-takers were required to identify sounds, words, or other aspects of the language
in a spoken text (e.g., dialogue). As communicative approaches to language teaching became
widely applied, direct assessment was gradually implemented. Over time, direct assessment
(also referred to as performance assessment, in which test-takers perform tasks, such as
face-to-face interviews and group orals, that reflect their use of language outside the
classroom) and semi-direct tests (i.e., tests in which test-takers speak in response to a
prompt delivered by phone, audio-recording, or computer) have become common (Harding, 2014).
While performance assessments involving interaction with an interviewer or other
test-taker(s) are extensively implemented in both commercial tests [e.g., Cambridge
Examination, the Trinity College Integrated Skills in English (ISE), IELTS] and classroom
assessments, traditional formats of speaking assessment such as monologues and responses to
prompts are also widely used in both computer-mediated tests (e.g., APTIS, Duolingo, PTE
Academic, TOEFL iBT, Versant English Test) and classroom assessment.
In the early days of language assessment and testing, scholars debated whether language
proficiency was unitary or multi-trait. Under the unitary view (Oller, 1979), language
ability cannot be divided into components; instead, there is a general factor of language
proficiency. It is, however, now generally agreed that language proficiency is
multi-componential (Bachman, 1990). Currently, the communicative language ability model
proposed by Bachman (1990) and Bachman and Palmer (1996) (based on Canale & Swain, 1980) is
the best-known model in the field of language assessment. The model has been used widely as a
basis for many communicative language tests and frameworks (e.g., the Common European
Framework of Reference, CEFR), although the model may not be explicitly mentioned. In this
model, overall language competence comprises organizational and pragmatic competences, and
each of these is further divided into several types of competence (i.e., organizational:
grammatical and textual competences; pragmatic: illocutionary and sociolinguistic
competences). The model provides a comprehensive list of competences involved in language
ability and is, therefore, considered the most elaborate model (Alderson & Banerjee, 2002).
Research in speaking assessment has investigated how various factors (e.g., test-taker,
rater, task, and interlocutor) affect speaking performance, drawing on the models of speaking
performance proposed by McNamara (1996), Fulcher (2003), and Skehan (2009). In these models,
the quality of the test performance reflected in the test score depends not only on the
test-taker’s language ability, but also on the test tasks and the rating. For example, a task
is usually easier when the topic is familiar than when it is unfamiliar, and some test-takers
may find it easier to respond to interview questions than to present a speech. Task
conditions such as preparation (planning) time or access to prompt materials may also
influence the difficulty of the task. As for rating, test scores depend on what is described
in the rating scale (e.g., delivery, grammatical accuracy), and they are also affected by
raters’ leniency or harshness. In essence, these scholars suggest that test performance
should be interpreted in terms of the interactions among the abilities of the test-taker, the
task, and the rating involved in speaking assessments. These models were developed
incrementally and are not mutually exclusive, although the emphasis in each model differs
slightly.
In early research on speaking assessment, studies were mainly conducted in the context of
the oral proficiency interview (OPI), developed by the Foreign Service Institute (FSI) in the
United States and its associated US Government agencies, which is known as the ACTFL
Oral Proficiency Interview (ACTFL OPI). These investigations focused mainly on the construct
of speaking assessment, rating, and test scores.
For example, some scholars investigated how different aspects of language ability con-
tribute to overall communicative language ability. This line of enquiry was undertaken
largely in the context of rating scale development and contributed to clarifying the construct
of language proficiency through the analyses of test scores and/or teachers’ and/or assessors’
perceptions of test-taker performances. Higgs and Clifford (1982), for instance, analyzed
raters’ perceptions of the relative role of the five component factors constituting global
proficiency (i.e., vocabulary, grammar, pronunciation, fluency, and sociolinguistics).
Subsequently, they proposed a relative contribution model (RCM), arguing that various
aspects of language contribute differently to overall language proficiency at the levels defined
in the FSI scale. According to the RCM, vocabulary and grammar contribute to overall
proficiency across all levels, but as levels increase, some aspects, such as pronunciation,
fluency, and sociolinguistic factors become more prominent than other factors. In the de-
velopment of the ACTFL Guidelines based on the FSI scale, proficiency was described in
terms of communicative growth comprising four main contributors: function, content,
context, and accuracy (Breiner-Sanders et al., 1999). Other studies focused their investiga-
tions on communicative language ability through tasks (Raffaldini, 1988), evaluation of the
construct validity of ACTFL guidelines (Alonso, 1997), and rater behaviour (Thompson,
1995). These studies were conducted across levels (secondary to university) and languages
(e.g., Mandarin Chinese, English, German, Japanese, and Spanish).
As the OPI became widely implemented, scholars questioned whether test-takers’ con-
versational ability could be satisfactorily assessed with an OPI (e.g., van Lier, 1989).
Accordingly, researchers employed discourse and conversation analysis methods to investigate
various aspects of the OPI, including the resemblance between an OPI and a conversation and
the nature of communication observed in an OPI as a speech event (e.g., He, 1998; Johnson &
Tyler, 1998; Lazaraton, 1992). These studies made a significant contribution to uncovering
the characteristics of spoken interactions observed in an OPI. They concluded that the
speech elicited in this context cannot be considered to reflect a conversation in real life be-
cause of the lack of natural conversation features, including turn-taking management and
topic negotiation (e.g., Lazaraton, 1992).
The limitations of OPI reported in the studies resulted in the exploration of alternative
modes of speaking assessment such as group/paired interviews in both in-house classroom
and standardized tests. In the group/paired interview format of speaking assessment, test-
takers are asked to perform the task with another test-taker (paired interview) or in a group
with/without the interviewer’s presence. The introduction of pair/group interviews in large
commercial tests and classroom contexts was motivated by practicality in terms of cost, time,
and resources (e.g., Ockey, 2009; Van Moere, 2006). The pair/group interview format also
resembles the activities learners engage in during class (e.g., Taylor, 2000),
resulting in positive washback in the classroom for both students and teachers. In addition,
the pair/group interview is considered to be more authentic with respect to test-takers’
real-life situations and, therefore, may provide a closer link between test results and the
target language use situation (Bachman & Palmer, 1996).
Since the pair/group interview format was introduced, various aspects of it have been
investigated. Earlier research focused on the effect of interlocutor variables on test-taker
performance in terms of scores and features identified in discourse analyses. The variables
include interlocutor status (i.e., interviewer/examiner or test-taker) (e.g., Brooks, 2009;
Taylor, 2000), personality traits (e.g., Nakatsuhara, 2011; Ockey, 2009), proficiency (e.g.,
Davis, 2009; David et al., 2018; Iwashita, 1996; Lazaraton, 1992; Nakatsuhara, 2006), and
familiarity (e.g., O’Sullivan, 2002). On the whole, positive findings on the use of pair
interview formats have been reported. In some studies, however, the quality of language
produced in this format was observed to vary according to interlocutor variables (e.g.,
proficiency, familiarity, and personality), and variables such as personality traits were
reported to be harder to control (Iwashita, 2019). Furthermore, some studies found that the
features identified in the analysis of test discourse were not always represented in the
rating scale (e.g., Brooks, 2009; Nakatsuhara, 2006; Taylor, 2000), which raised a question
about the validity of pair/group interviews.
In the broader context of communicative language tests, there is ongoing research interest
in the aspects of speaking in the speaking performance models introduced earlier. This in-
cludes how task content and format (i.e., independent vs. integrated, e.g., Frost et al., 2012;
Iwashita et al., 2008); implementation conditions (i.e., planning time, e.g., Elder & Iwashita,
2005); ratings (i.e., holistic vs. analytic scale, rater background, rater cognition, automated
scoring, e.g., Ducasse & Brown, 2009); and test-taker attributes (i.e., L1, working memory
capacity, personality trait, and test-taking strategy, e.g., Crossley & Kim, 2019) influence test
performance in the context of both commercial and classroom/in-house tests. These studies
examined both monologic and interactional tasks in face-to-face and computer-mediated
settings.
As explained earlier, recent advances in communication technology have contributed to the
wide implementation of computer-mediated and automated scoring systems. Accordingly,
some studies investigated the equivalence of different modes of the oral proficiency interview
(e.g., SOPI and OPI) (e.g., O’Loughlin, 1995), while others explored test-takers’ reaction to
face-to-face interviews and computer-mediated tests (e.g., Qian, 2009). Similarly, for rating, the
comparability of automated scoring with human ratings has been explored (e.g., Neumeyer
et al., 2000). While the empirical findings from studies in computer-based assessment and
automated scoring systems indicate that they overcome some challenges in face-to-face as-
sessment (i.e., cost, administration, subjective rating, e.g., Isaacs, 2017), the findings clearly
show limitations in the use of these modes of assessment and scoring systems. In particular,
the unidirectional nature of computer-based assessment makes it difficult to assess the
interactional aspects of speaking, which results in a
narrowing of the construct in this mode of assessment. Further discussion will be presented in
the Current Contributions and Research Part.
In summary, the format of speaking assessment has undergone many changes in tandem
with the development of teaching methodology and technological advances. While both
commercial and classroom assessments employ a traditional mode of speaking assessment
(i.e., monologue), speaking assessment involving face-to-face interaction is increasing.
Furthermore, technological advances have contributed to the wider implementation of
computer-mediated testing and automated scoring, which are now being used for high-stakes
purposes. The development of a variety of performance assessments and the implementation of
computer-mediated tests have resulted in studies that have contributed to identifying those
aspects of a test that potentially influence performance. In particular, the introduction of
pair/group interviews has challenged the existing construct of speaking proficiency by
incorporating interactional competence (IC) into the construct. In addition, computer-based
assessment opens new dimensions for speaking assessment and contributes to further
interpretation of speaking proficiency.

In earlier studies on speaking assessment, research drew largely on the
psycholinguistic view of language testing, with an emphasis on individual performance.
More recently, however, the field has acknowledged the contextual contribution to perfor-
mance, known as the non-cognitive view of performance, and has become critical of
individualistic views of test performance with little consideration of the context (Chalhoub-
Deville, 2003; O’Sullivan & Weir, 2011). The co-constructed nature of interactions has been
recognized and the construct of speaking proficiency has been revisited in relation to IC.
Further, the introduction of computer-mediated assessment has brought new challenges for
defining a construct that incorporates interactional aspects of speaking into assessment.
That is, the assessment format of computer-mediated tests is mostly monologic, drawing largely
on psycholinguistic definitions of the speaking construct, which emphasize the
cognitive dimension of speaking. In the following part, a discussion on two emerging
issues in the assessment of speaking – assessment of IC and incorporation of IC in computer-
mediated testing – will be presented, together with a review of current research.

Kramsch (1986), who first introduced IC to the field of applied linguistics, was critical of the
notion of proficiency widely adopted in foreign language education in the United States
because the existing models largely ignored communication skills such as listener responses,
use of non-verbal cues and the inherently co-constructed nature of the interaction. Building
on Kramsch’s seminal work, many scholars have elaborated on the concept of IC in the
context of pedagogy and assessment. For the past 30 years, it has been acknowledged that IC
comprises knowledge about the language in general, as well as specific features appropriate
to a certain context, knowledge about the context of the communication (including roles of
individuals in an interaction), and the ability to deploy this knowledge for communication
(Hall & Pekarek Doehler, 2011; Ross, 2018; Young, 2008, 2011).
As the concept of IC has been further elaborated, scholars have explored ways to in-
vestigate IC in speaking assessment. For example, Roever and Kasper (2018) recently
identified features of IC referring to generic properties of interaction such as turn-taking to
perform actions, open/close interactions, and conversational repair. They also identified how
participants adopt various devices of conversation according to specific interactional con-
texts involving interlocutors. Whether participants can employ appropriate turn-taking and
open/close the interaction depends on their interlocutors’ interactional behaviours, thus re-
flecting the inherently co-constructed nature of the interaction.
These articulations of the concept of IC could have a significant impact on task design,
scoring, and inferences made about test-takers’ ability from the score. Existing assessment
tasks largely focus on individual ability, guided by psychometrics (Roever & Kasper, 2018).
With increasing use of pair and group interviews, an interactionist view of assessment has
been employed in task design, drawing on non-cognitive perspectives. Assessment tasks
based on non-cognitive perspectives elicit purposeful interactive language use to provide
evidence about test-takers’ ability to tailor their talk to their interlocutor and context for
communication (Roever & Kasper, 2018). Commercial tests such as Cambridge English:
First, Trinity College ISE, and the Examination for the Certificate of Competency in English
incorporate IC explicitly or implicitly in their rating scales, for example by assessing interactive aspects
of task fulfilment, turn-taking (including initiation and elaboration of turns), repair, and topic
negotiation.
While increased attention has been drawn to non-cognitive views of assessment, this type
of assessment has also attracted criticisms including difficulties with standardization, and the
impact of interlocutors’ interactional behaviour on test-takers (Galaczi & Taylor, 2018).
There is a long tradition of research on interactional behaviours in both face-to-face inter-
views and pair/group tasks. For example, McNamara and Lumley (1997) and Brown (2003)
reported a significant impact from interlocutors’ behaviours, such as questioning techniques
and rapport, on the test-takers. More recently, Roever and Kasper (2018) and Ross (2018)
showed that how the interviewers/examiners shape the interaction (e.g., question technique,
rapport, topic shift) determines test-takers’ opportunities to demonstrate their interactional
competence.
In contrast to the current trend of adopting a non-cognitive view in test development and
research, the recent development of computer-mediated tests largely draws on the
psycholinguistic view of assessment (Galaczi & Taylor, 2018). In psycholinguistic-based
assessments, the language elicited in the test performance is not usually embedded in interaction, as
social interaction is not included in the assessment criteria. Within this perspective, the
speaking construct is described in terms of efficiency of processing and automaticity (referred
to as “near effortless processing of language”, p. 326) with strong emphasis on the cognitive
dimension of speaking in monologue (Van Moere, 2012). Van Moere (2012) is critical of the
language assessment literature for paying little attention to automaticity as a characteristic of
a competent L2 speaker. Further, he recommends that, to assess well-defined psycho-
linguistic constructs, tasks should require evidence of a test-taker’s capacity to handle the
language (i.e., morphology, syntax, and lexis) that would be processed in real-life domains.
This perspective of assessment considers interactional strategies (such as turn-taking and
organizing ideas) as social skills, and holds that incorporating these skills may make speaking
assessment less standardized and reliable. Furthermore, incorporation of
these strategies may lead to the inclusion of other factors such as personality (referred to as a
“construct irrelevant trait”) (Van Moere, 2012). It should be noted that many native speakers
with poor turn-taking or other interactional strategies are still able to communicate well.
Similarly, according to Kormos (2006) and Dörnyei and Kormos (1998), who draw on the speech
production model (Levelt, 1989), communicative success largely depends on an
individual speaker’s ability to employ linguistic resources.
Current Studies on Interactional Competence

Research on IC in speaking assessment was initially born out of the validation studies of
pair/group interview tests (Alderson & Banerjee, 2002). Earlier studies mainly identified
features of IC observed in paired or group interaction assessment in the classroom. More
recently, the features of IC observed in different levels of performances have been examined
to investigate whether some features could be explicitly incorporated into existing rating
scales (e.g., Galaczi, 2014; Lam, 2018; Roever & Kasper, 2018). For example, Galaczi (2014)
analyzed the interaction of learners at different proficiency levels in a paired speaking test and
identified key distinguishing interactional features (i.e., topic development, turn-taking, and
listener support) across levels. Similarly, Lam (2018) found some differences in the way
students construct their responses as contingent on a previous speaker’s contribution in his
analyses of group assessment tasks in Hong Kong, which employed the conversation analysis method.
Roever and Kasper (2018) also found both quantitative and qualitative differences in the use
of preliminaries (i.e., the conversational move which occurs before initiative actions such as
invitations, announcements, and requests) between higher- and lower-level L2 speakers. While
these studies observed some differences across levels in the features under study,
how those features represent IC remains unclear
(Galaczi & Taylor, 2018).
Features of IC have also been examined in rater studies through verbal protocols col-
lected during rater assessment of paired interview performances for validation of rating
scales. For example, May (2009) reports that 12 native speaker raters identified three key
features of the interaction (i.e., collaborating, cooperating, and assisting each other).
Similarly, Ducasse and Brown (2009) found non-verbal interpersonal communication (use
of body language and gaze), interactive listening (the test-takers’ manner of displaying
attention or engagement), and interactional management (the management of the topics
and turns) as the main features of interaction. These findings revealed that the interactional
aspects of performance reported in the raters’ verbal reports are rarely mentioned in the scales
in general, and suggested the need for a scale that reflects the complexities of IC in a paired speaking test
to assess test-takers’ IC. Building on earlier studies, May et al. (2020) recently analyzed the
stimulated verbal reports on paired interactions, focusing on interactional features of
test-taker performance. Through thematic analysis of examiner comments, 9 main categories
and over 50 sub-categories were identified. These three studies support the claim of Fulcher
et al. (2011) for a performance data-driven approach (PDA) that could provide a richer
description of test-taker performance.
Although the majority of IC studies in speaking assessment draw on sociolinguistics for
theoretical orientation, a range of interactional strategies (e.g., negotiation of meaning,
responding to clarification requests) observed in test performance (e.g., Ramazani et al., 2018;
van Batenburg et al., 2018) has been investigated on the basis of psycholinguistic
and cognitive underpinnings. For example, van Batenburg et al. (2018) examined test-takers’
abilities to employ self-supporting and other-supporting strategies in the performance of six
dialogic tasks with a scripted speech by pre-vocational learners in the Netherlands using both
holistic (focusing on linguistic accuracy and interactional ability) and analytic categories (in
terms of compensation, meaning negotiation and correcting misinterpretation) based on the
CEFR descriptors (Council of Europe, 2001). The high levels of agreement among the three
raters led the researchers to justify the assessment of interactional ability by examining in-
dividual test-takers’ use of strategies; they concluded that IC can be regarded as an individual
trait and an integral part of speech production.
Although psycholinguistic-based research has contributed to the current literature on the
assessment of IC, there are concerns about employing this view (e.g., Roever & Kasper, 2018)
because of its focus on individual performance and its treatment of interactional features of
performance as construct-irrelevant variables (Van Moere, 2012). The same concern has been
raised about the construct drawn from the current format of computer-based tests because it
does not include interactions, unlike face-to-face interviews (e.g., Plough et al., 2018).
However, new developments have been proposed to overcome these limitations through the
introduction of video-conference technologies such as Skype and Zoom, and speech re-
cognition systems to provide opportunities for interaction.
The video-conference modes enable both speaking and visual input to occur, which may
tap into interactional resources (Iwashita et al., 2021). Further, according to Davis et al.
(2018), incorporating face-to-face interaction into computer-based platforms with these
devices potentially diversifies task types in these assessments. Accordingly, a small number of
studies have examined the feasibility of this new development in existing testing by com-
paring test performance with face-to-face interviews.
Kim and Craig (2012) compared test-taker performance in face-to-face and video-
conferenced oral interviews and found no significant difference in either overall or analytic
scores between the two test modes. Qualitative analysis of test performance also revealed
comparability in the two modes in terms of comfort, computer familiarity, environment,
non-verbal linguistic cues, interests, speaking opportunities, and topic/situation factors, with
little interviewer effect.
More recently, building on earlier studies, Nakatsuhara et al. (2017) compared
performances of IELTS speaking in face-to-face and video-conference modes in terms of the scores
and features observed in discourse analysis. While the scores were identical, some differences in
the language test-takers used in the two modes were found. In the video-mediated mode, more
clarifications were requested in Speaking Part 1 (i.e., test-takers answer questions about family,
work, and interests) and Part 3 (i.e., test-takers answer questions relating to the topic that the
test-taker spoke about in Part 2) suggesting that in video-conference mode, skills such as in-
teractive listening and signalling communication breakdown may be required. The impact of
the two modes on the interaction was further investigated in examiners’ use of verbal and non-
verbal cues, such as back-channelling, nodding, eye contact, and other gestures. The analysis
revealed that examiners used different cues according to the interaction mode. Further, ex-
aminers said that turn-taking and eye contact are challenging in video-conference mode. The
differences between face-to-face and video modes that emerged in this study point to the
integral role of IC in the speaking construct as well as the context-dependent nature of IC.
Davis et al. (2018) investigated the feasibility of delivering interactive speaking assessment
online through the perceptions of participants in the United States and China and their per-
formances on speaking tasks involving a moderator and two or three participants. Turn-
taking, collaboration (i.e., sharing the floor), engagement (i.e., contributing to the elaboration
of topics), and appropriateness (i.e., communicating in a pragmatically appropriate way) were
included in the scoring rubric. As reported in Nakatsuhara et al. (2017), technological stability
and test-takers’ access to and familiarity with technology were challenges. Analysis of test-taker
performance revealed limited ranges of visual input available to participants in a face-to-face
interaction through video. However, considering that conversations and job interviews often
take place via video-conference, Davis et al. (2018) recommend that this assessment mode be
introduced along with research on spoken interaction in this new format.
In addition to face-to-face interviews through video-conferencing, other recent
developments include speaking assessment in a virtual environment, where test-takers are
required to participate in a discussion in a library setting and interact with avatars (Ockey
et al., 2017), and speech technologies that enable an electronic device to recognize and
analyze spoken words, such as a speech recognition device (e.g., Litman et al., 2018).
Attempts to incorporate IC in computer-mediated testing are promising, but there are
challenges in terms of stable internet connection and test-takers’ access to and familiarity
with devices. Furthermore, research has revealed new aspects of IC in online environments
that differ from those in face-to-face environments, as shown by examiners’ interactional
behaviours, and these should be considered.
In summary, the growing recognition of IC as an integral part of speaking assessment has
resulted in studies identifying features of IC, the incorporation of IC into tasks, and
rating scales incorporating IC. The research findings have contributed to further
understanding of IC in speaking assessment, but at the same time have raised several challenges
(Galaczi & Taylor, 2018) regarding the definition of IC, the scalability of these features in
rating scales, and the potentially task-dependent nature of the features of IC. These
challenges have added further complexities to incorporating IC into tasks and rating scales in
the growing number of computer-mediated tests, considering the current limitations of
communication technology. Developments have been made by introducing video-conference
mode in face-to-face interviews, the use of a virtual environment, and speech recognition
systems. The findings of the small number of research projects have shown promise for the
future of incorporating technology in the assessment of IC in computer-mediated tests.

With the introduction of new types of speaking assessment and subsequent research, there
has been a shift in research methods in speaking assessment research. Early research on the
assessment of speaking employed quantitative methods with large sample sizes. The data
collected for investigation were largely test scores. Several sophisticated statistical analyses
(e.g., Many-Facets Rasch Measurement, Generalizability Theory, Structural Equation
Modelling, Differential Item Functioning) were employed (e.g., McNamara, 1996; Bolus
et al., 1981; Purpura, 1998; Elder, 1996, respectively). Although quantitative methods with
large sample sizes are still employed, as the field grows and multiple theoretical perspectives
are introduced, scholars have sought a variety of methods used in applied linguistics and
have analyzed data other than test scores, including verbal reports and rater notes.
Qualitative methods with small sample sizes or mixed methods are not uncommon in
current speaking assessment research. In particular, Conversation Analysis has been widely
implemented to identify characteristics of conversation and IC (e.g., Galaczi, 2008; Lam,
2018). Although the fine-tuned analyses of the Conversation Analysis method have
revealed many IC features, an issue with the generalizability of Conversation Analysis-based
studies has also been reported (e.g., Marian & Balaman, 2018). An increasing number of studies have
opted for automated analyses using software such as Coh-Metrix (e.g., Crossley & Kim, 2019).

Along with the development of teaching methodologies and the recent advancement of
communication technology, there has been an increasing need for appropriate assessment tools for
communicative competence in globalized societies. The empirical studies reported earlier
contribute to our understanding of the characteristics of speaking proficiency. The findings of
the studies provide some implications for classroom assessment practice. For example, the
multi-componential and differential development of varied aspects of speaking proficiency (e.g., Higgs
& Clifford, 1982) implies that different aspects of speaking proficiency (e.g., grammatical accuracy;
vocabulary – type and token; production features – fluency, intelligibility and
comprehensibility; discoursal features) can be focused on according to the level of test-takers. For instance,
for beginners, the focus of assessment could be accuracy and range of grammar forms and
vocabulary items as well as intelligibility. As the level goes up, the assessment focus could shift to the
text level (discourse) using tasks that require interviewing, making a speech, etc.
Further, various aspects of speaking tasks (e.g., content, availability of prompt materials,
topic, genre) and implementation conditions (e.g., planning time) have an impact on test-taker
performance (e.g., Frost et al., 2012; Huang et al., 2018). These findings provide practitioners
with useful information for developing assessment tasks.
Research shows that features of natural conversation are elicited more in a pair/group
interview than an interview with an examiner (e.g., Brooks, 2009). However, it is also ac-
knowledged that test-takers’ successful communication is contingent on context and on
conversational partners (Ross, 2018; Young, 2011). It is, therefore, important to help lear-
ners understand the context of communication (e.g., situation, interlocutor, the speaker’s
role), and to deploy the language knowledge specific to the context.
Finally, the increased implementation of large, commercial, computer-mediated speaking
tests has increased test-taker access, but communication via technology results in different
communication strategies being employed in video-mediated interviews (Davis et al., 2018;
Nakatsuhara et al., 2017). Considering that communication via technology is a part of our
daily lives, teachers are encouraged to include this new context in their teaching materials and
test preparation. With an increased demand for global communication and advancement of
communication technology, practitioners are expected to devise appropriate speaking assess-
ments, considering context and test-takers’ needs. Successful implementation of appropriate
assessment tools requires continuing collaboration between practitioners and researchers.
7 Future Directions
The field has come a long way since the direct format of speaking assessment was introduced in
the early 1980s. The historical overview of the development of types of speaking assessment and
research contributions presented here clearly shows that contextual variables (e.g., current
teaching methodology, test-users’ needs in terms of societal trends, and advancement of
communication technology) play a prominent role in deciding on the test format and its focus
in the assessment, its impact on curriculum, and vice versa. Incorporation of IC into speaking
assessments and technological advancements including computer-mediated tests and auto-
mated scoring systems have challenged scholars to revise existing constructs of speaking
proficiency. Considering that test validity (i.e., appropriateness for its purpose, use and impact)
is a core business of language assessment research, validation research is expected to continue
to draw on varied theoretical perspectives, using a range of research methods.
Further Reading
Fulcher, G. (2003). Testing second language speaking. Harlow: Longman/Pearson Education.
A comprehensive overview of the issues involved in second language speaking tests incorporating
practice and theory.
Galaczi, E. D. & Taylor, L. (2018). Interactional competence: Conceptualisations, operationalisations, and
outstanding questions. Language Assessment Quarterly, 15(3), 219–236. doi:10.1080/15434303.2018.1453816
An historical and current overview of IC in the context of spoken language use which discusses its
operationalization in tests and assessment scales, posing several challenges associated with this activity.
Harding, L. (2014). Communicative language testing: Current issues and future research. Language
Assessment Quarterly, 11(2), 186–197. doi:10.1080/15434303.2014.895829
Harding discusses a range of current issues and future research directions in Communicative Language
Testing (CLT) based on key questions which emerged at the CLT symposium at the 2010 Language
Testing Forum. He suggests a reinvigorated communicative approach that focuses on “adaptability” in
language testing, and several future research directions.
Nakatsuhara, F., Inoue, C., Berry, V., & Galaczi, E. (2017). Exploring the use of video-conferencing
technology in the assessment of spoken language: A mixed-methods study. Language Assessment
Quarterly, 14(1), 1–18. doi:10.1080/15434303.2016.1263637
This study investigated the comparability of test performance in Internet-based video-conferencing
technology and standard face-to-face modes. It presents a comprehensive overview of video-mediated
testing and reports the findings of similarities and differences in performance under the two modes.
Van Moere, A. (2012). A psycholinguistic approach to oral language assessment. Language Testing,
29(3), 325–344. doi:10.1177/0265532211424478
A framework for incorporating the assessment of psycholinguistic constructs into spoken language
proficiency testing.
References
Alderson, C. & Banerjee, J. (2002). Language testing and assessment (Part 2). Language Teaching, 35,
79–113. doi:10.1017/S0261444802001751
Alonso, E. (1997). The evaluation of Spanish-speaking bilinguals’ oral proficiency according to ACTFL
guidelines (trans. from Spanish). Hispania, 80(2), 328–341.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford, UK: Oxford University
Press.
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice. Oxford, UK: Oxford University Press.
Bolus, R. E., Hinofotis, F. & Bailey, K. M. (1981). An introduction to generalizability theory in second
language research. Language Learning, 32(1), 245–258. doi:10.1111/j.1467-1770.1982.tb00970.x
Breiner-Sanders, K. E., Lowe, P. Jr., Miles, J., & Swender, E. (1999). ACTFL Proficiency guidelines –
speaking revised 1999. Foreign Language Annals, 33(1), 13–18. doi:10.1111/j.1944-9720.2000.tb00885.x
Brooks, L. (2009). Interacting in pairs in a test of oral proficiency: Co-constructing a better perfor-
mance. Language Testing, 26(3), 341–366. doi:10.1177/0265532209104666
Brown, A. (2003). Interviewer variation and the co-construction of speaking proficiency. Language
Testing, 20(1), 1–25. doi:10.1191/0265532203lt242o
Canale, M. & Swain, M. (1980). Theoretical bases of communicative approaches to second language
teaching and testing. Applied Linguistics, 1(1), 1–47. doi:10.1093/applin/I.1.1
Chalhoub-Deville, M. (2003). Second language interaction: Current perspectives and future trends.
Language Testing, 20, 369–383. doi:10.1191/0265532203lt264oa
Clark, J. L. D. (1979). Direct vs. semi-direct tests of speaking ability. In E. J. Briere & F. B. Hinofotis
(Eds.), Concepts in language testing: Some recent studies (pp. 35–49). Washington, DC: TESOL.
Crossley, S. A., & Kim, You Jin. (2019). Text integration and speaking proficiency: Linguistic, in-
dividual differences, and strategy use considerations. Language Assessment Quarterly, 16(2),
217–235. doi:10.1080/15434303.2019.1628239
Council of Europe. (2001). Common European framework of reference for languages: Learning,
teaching and assessment. Cambridge, UK: Cambridge University Press.
Davis, L., Timpe-Laughlin, V., Gu, L. & Ockey, G. (2018). Face-to-face speaking assessment in the
digital age: Interactive speaking tasks online. In J. M. Davis, J. Norris, M. Malone, T. McKay, & Y.
A. Son (Eds.), Useful assessment and evaluation in language education (pp. 115–130). Washington,
DC: Georgetown University Press.
Davis, L. (2009). The influence of interlocutor proficiency in a paired oral assessment. Language
Testing, 26(3), 367–396.
Dörnyei, Z., & Kormos, J. (1998). Problem-solving mechanisms in L2 communication. Studies in
Second Language Acquisition, 20(3), 349–385.
Ducasse, A. M., & Brown, A. (2009). Assessing paired orals: Raters’ orientation to interaction.
Language Testing, 26(3), 423–443. doi:10.1177/0265532209104669
Elder, C. A. (1996). The effect of language background on foreign language test performance: The case
of Chinese, Italian, and Modern Greek. Language Learning, 46(2), 233–282. doi:10.1111/j.1467-
1770.1996.tb01236.x
Elder, C. & Iwashita, N. (2005). Planning for test performance: Does it make a difference? In R. Ellis
(Ed.), Planning and task performance in a second language (pp. 219–238). Amsterdam, Netherlands:
John Benjamins.
Frost, K., Elder, C. A., & Wigglesworth, G. (2012). Investigating the validity of an integrated listening-
speaking task: A discourse-based analysis of test takers’ oral performances. Language Testing, 29(3),
345–369. doi:10.1177/0265532211424479
Fulcher, G. (2003). Testing second language speaking. New York, NY: Routledge.
Fulcher, G., Davidson, F., & Kemp, J. (2011). Effective rating scale development for speaking tests:
Performance decision trees. Language Testing, 28(1), 5–29.
Galaczi, E. D. (2008). Peer–peer interaction in a speaking test: The case of the First Certificate in
English examination. Language Assessment Quarterly, 5(2), 89–119. doi:10.1080/15434300801934702
Galaczi, E. D. (2014). Interactional competence across proficiency levels: How do learners manage
interaction in paired speaking tests? Applied Linguistics, 35(5), 553–574. doi:10.1093/applin/amt017
Galaczi, E. D. & Taylor, L. (2018). Interactional competence: Conceptualisations, operationalisations,
and outstanding questions. Language Assessment Quarterly, 15(3), 219–236. doi:10.1080/
15434303.2018.1453816
Ginther, A. (2013). Assessment of speaking. In C. A. Chapelle (Ed.), The encyclopedia of applied lin-
guistics. Hoboken, NJ: Blackwell. doi:10.1002/9781405198431.wbeal0052.
Graham, C. R., Lonsdale, D., Kennington, C., Johnson, A., & McGhee, J. (2008). Elicited imitation as
an oral proficiency measure with ASR scoring. Proceedings of the sixth international conference on
language resources and evaluation (LREC 2008) (pp. 1604–1610). Marrakech, Morocco: LREC.
Hall, J. K., & Pekarek Doehler, S. (2011). L2 interactional competence and development. In J. K. Hall,
J. Hellermann, & S. Pekarek Doehler (Eds.), L2 interactional competence and development
(pp. 1–15). Bristol, UK: Multilingual Matters.
Harding, L. (2014). Communicative language testing: Current issues and future research. Language
Assessment Quarterly, 11(2), 186–197. doi:10.1080/15434303.2014.895829
He, A. W. (1998). Answering questions in LPIs: A case study. In R. Young & A. W. He (Eds.),
Talking and testing: Discourse approaches to the assessment of oral proficiency, studies in bilingualism
(Vol. 14, pp. 10–16). Amsterdam, Netherlands: John Benjamins.
Higgs, T., & Clifford, R. (1982). The push towards communication. In T. V. Higgs (Ed.), Curriculum,
competence, and the foreign language teacher (pp. 57–79). Lincolnwood, IL: National Textbook
Company.
Huang, Heng-Tsung Danny, Hung, Shao-Ting Alan & Plakins, L. (2018). Topical knowledge in L2
speaking assessment: Comparing independent and integrated speaking test tasks. Language Testing,
35(1), 27–49. doi:10.1177/0265532216677106
Isaacs, T. (2017). Fully automated speaking assessment: Changes to proficiency testing and the role of
pronunciation. In O. Kang, R. I. Thomson, & J. M. Murphy (Eds.), The Routledge handbook of
contemporary English pronunciation (pp. 570–584). New York, NY: Routledge.
Iwashita, N. (1996). The validity of the paired interview format in oral performance assessment.
Melbourne Papers in Language Testing, 5(2), 51–65.
Iwashita, N. (2019). Peer interaction assessment: Overview of research and directions. In C.
Roever & G. Wigglesworth (Eds.), Social perspectives on language testing: Papers in honour of
Tim McNamara (pp. 105–120). Berlin, Germany: Peter Lang.
Iwashita, N., Brown, A., McNamara, T., & O’Hagan, S. (2008). Assessed levels of second language
speaking proficiency: How distinct? Applied Linguistics, 29, 24–49. doi:10.1093/applin/amm017
Iwashita, N., May, L., & Moore, P. (2021). Operationalising interactional competence in computer-
mediated speaking tests. In M. R. Salaberry & A. R. Burch (Eds.), Assessing speaking in context:
Expanding the construct and its applications (pp. 283–302). Bristol, UK: Multilingual Matters.
Johnson, M. & Tyler, A. (1998). Re-analyzing the OPI: How much does it look like natural
conversation? In R. Young & A. W. He (Eds.), Talking and testing: Discourse approaches to the
assessment of oral proficiency, Studies in Bilingualism (Vol. 14, pp. 27–51). Amsterdam, Netherlands:
John Benjamins.
Kormos, J. (2006). Speech production and second language acquisition. Mahwah, NJ: Lawrence Erlbaum
Associates.
Kim, J. & Craig, D. A. (2012). Validation of a video conferenced speaking test. Computer Assisted
Kramsch, C. (1986). From language proficiency to interactional competence. The Modern Language
Journal, 70(4), 366–372. doi:10.1111/modl.1986.70.issue-4
Lado, R. (1961). Language testing: The construction and use of foreign language tests. London, UK:
Longman.
Lam, D. M. K. (2018). What counts as ‘responding’? Contingency on previous speaker contribution as a
feature of interactional competence. Language Testing, 35(3), 377–401. doi:10.1177/0265532218758126
Lazaraton, A. (1992). The structural organization of a language interview: A conversation analytic
perspective. System, 20(3), 373–386. doi:10.1016/0346-251X(92)90047-7
Litman, D., Strik, H., & Lim, G. (2018). Speech technologies and the assessment of second language
speaking: Approaches, challenges and opportunities. Language Assessment Quarterly, 15(3),
294–309. doi:10.1080/15434303.2018.1472265
Marian, K. S., & Balaman, U. (2018). Second language interactional competence and its development:
An overview of conversation analytic research on interactional change over time. Language and
Linguistics Compass, 12(8). doi:10.1111/lnc3.12285
May, L. (2009). Co-constructed interaction in a paired speaking test: The rater’s perspective. Language
Testing, 26(3), 397–421. doi:10.1177/0265532209104668
Noriko Iwashita
May, L., Nakatsuhara, F., Lam, D., & Galaczi, E. (2020). Developing tools for learning oriented as-
sessment of interactional competence: Bridging theory and practice. Language Testing, 37(2),
165–188.
McNamara, T. F. (1996). Measuring second language performance. London, UK: Longman.
McNamara, T. F. & Lumley, T. (1997). The effect of interlocutor and assessment mode variables in
overseas assessments of speaking skills in occupational settings. Language Testing, 14(1), 140–156.
doi:10.1177/026553229701400202
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd edn, pp. 13–103). New
York, NY: American Council on Education & Macmillan.
Nakatsuhara, F. (2006). The impact of proficiency level on conversational styles in paired speaking
tests. Cambridge ESOL Research Notes, 25, 15–20.
Nakatsuhara, F. (2011). Effects of test-taker characteristics and the number of participants in group
oral tests. Language Testing, 28(4), 483–508.
Nakatsuhara, F. (2013). The co-construction of conversation in group oral tests. Frankfurt/Main,
Germany: Peter Lang.
Nakatsuhara, F., Inoue, C., Berry, V., & Galaczi, E. (2017). Exploring the use of video-conferencing
technology in the assessment of spoken language: A mixed-methods study. Language Assessment
Quarterly, 14(1), 1–18. doi:10.1080/15434303.2016.1263637
Neumeyer, L., Franco, H., Digalakis, V., & Weintraub, M. (2000). Automatic scoring of pronunciation
quality. Speech Communication, 30, 83–94. doi:10.1016/S0167-6393(99)00046-1
O’Loughlin, K. (1995). Lexical density in candidate output on direct and semi-direct versions of an oral
proficiency test. Language Testing, 12(2), 217–237. doi:10.1177/0265532208101010
O’Sullivan, B. (2002). Learner acquaintanceship and oral proficiency test pair-task performance.
Language Testing, 19(3), 277–295.
O’Sullivan, B. (2013). Assessing speaking. In A. J. Kunnan (Ed.), The companion to language assessment.
New York, NY: John Wiley & Sons. doi:10.1002/9781118411360.wbcla084.
O’Sullivan, B., & Weir, C. J. (2011). Test development and validation. In B. O’Sullivan (Ed.), Language
testing: Theories and practices (pp. 13–32). Basingstoke, UK: Palgrave Macmillan.
Ockey, G. J. (2009). The effects of group members’ personalities on a test taker’s L2 group oral dis-
cussion test scores. Language Testing, 26(2), 161–186. doi:10.1177/0265532208101005
Ockey, G. J., Gu, L. & Keehner, M. (2017). Web-based virtual environments for facilitating assessment
of L2 oral communication ability. Language Assessment Quarterly, 14(4), 346–359. doi:10.1080/
15434303.2017.1400036
Oller, J. W., Jr. (1979). Language tests at school: A pragmatic approach. London, UK: Longman.
Plough, I., Banerjee, J. & Iwashita, N. (2018). Interactional competence: Genie out of the bottle.
Language Testing, 35(3), 427–455. doi:10.1177/0265532218772325.
Purpura, J. (1998). Investigating the effects of strategy use and second language test performance with
high- and low-ability test takers: A structural equation modelling approach. Language Testing,
15(3), 333–379. doi:10.1177/026553229801500303
Qian, D. (2009). Comparing direct and semi-direct modes for speaking assessment: Affective effects on
test takers. Language Assessment Quarterly, 6(2), 113–125.
Raffaldini, T. (1988). The use of situation tests as measures of communicative ability. Studies in Second
Language Acquisition, 10(2), 197–216.
Ramazani, M., Behnam, B., & Ahangari, S. (2018). Psychometric characteristics of a rating scale for
assessing interactional competence in paired-speaking tasks at micro-level. The Journal of English
Language Pedagogy and Practice, 11(23), 180–206. doi:10.30495/jal.2019.664545
Roever, C. & Kasper, G. (2018). Speaking in turns and sequences: Interactional competence as a target
construct in testing speaking. Language Testing, 35(3), 331–355. doi:10.1177/0265532218758128
Ross, S. J. (2018). Listener response as a facet of interactional competence. Language Testing, 35(3),
357–375. doi:10.1177/0265532218758125
Skehan, P. (2009). Modelling second language performance: Integrating complexity, accuracy, fluency,
and lexis. Applied Linguistics, 30(4), 510–532. doi:10.1093/applin/amp047.
Taylor, L. (2000). Investigating the paired speaking test format. Cambridge ESOL Research Notes,
2, 14–15.
Thompson, I. (1995). A study of interrater reliability of the ACTFL Oral Proficiency Interview in five
European languages: Data from ESL, French, German, Russian and Spanish. Foreign Language
Annals, 28(3), 407–422. doi:10.1111/j.1944-9720.1995.tb00808.x
van Batenburg, E., Oostdam, R., van Gelderen, A., & de Jong, N. (2018). Measuring L2 speakers’
interactional ability using interactive speech tasks. Language Testing, 35(1), 75–100. doi:10.1177/
0265532216679452
van Lier, L. (1989). Reeling, writhing, drawling, stretching and fainting in coils: Oral proficiency
interviews as conversations. TESOL Quarterly, 23, 489–508. doi:10.2307/358692
Van Moere, A. (2006). Validity evidence in a university group oral test. Language Testing, 23, 411–440.
doi:10.1191/0265532206lt336oa
Van Moere, A. (2012). A psycholinguistic approach to oral language assessment. Language Testing,
29(3), 325–344. doi:10.1177/0265532211424478
Young, R. (2008). Language learning and discursive practice. Language Learning, 58, 135–181.
doi:10.1111/j.1467-9922.2009.00492.x
Young, R. (2011). Interactional competence in language learning, teaching and testing. In E. Hinkel
(Ed.), Handbook of research in second language teaching and learning (Vol. 2, pp. 426–443).
New York, NY: Routledge.
PART III
Core Topics
10
PRONUNCIATION LEARNING AND
TEACHING
Several features are eye-catching when two people meet for the first time. Age, gender, race,
height, weight – these are all very noticeable. Another characteristic that jumps out as soon
as two people talk is pronunciation, which can index a speaker’s social class, regional
background, and education. A second language accent is immediately discernable; in fact,
Flege (1984) determined that people can sometimes recognize a non-native utterance with
only 30 milliseconds of exposure. Moreover, even if recordings are played backwards,
listeners can reliably distinguish between native and non-native speech (Munro et al.,
2010). “Pronunciation,” or the way people speak (including their production of individual
sounds, prosody, speech rate, and voice quality), is highly salient to listeners. Surprisingly,
then, over a period of about 30 years in the past century, applied linguists and many
language teachers paid relatively little attention to second language (L2) pronunciation,
thinking that learners’ productions would improve with massive amounts of input.
However, in the 21st century, pronunciation has gone from a minor issue in L2 research
journals to a highly prominent topic that now garners tremendous interest (Levis &
Sonsaat, 2020). Significant numbers of adult L2 learners, though, have always been con-
cerned with their pronunciation; one need only look online at the proliferation of accent
reduction programs to see that. “Accentedness,” or the degree to which one’s speech differs
from that of a local community, is not the most important speech dimension for social
interaction. “Intelligibility,” or the degree to which a listener understands the speaker’s
intention, and “comprehensibility,” the degree of effort required of a listener to understand
a speaker’s message (also known as processing fluency in social psychology), are far more
critical to successful communication. It is quite possible to have a very heavy accent and yet
be fully intelligible and easy to understand (Derwing & Munro, 1997; Munro & Derwing,
1995a). The identification of the features of an accent that interfere with comprehensibility
and intelligibility is the key to helping L2 speakers. Some differences in an L2 speaker’s
productions, though highly salient, have little or no impact on listener understanding,
while others cause communicative breakdowns. Another central aspect of pronunciation
research is “fluency,” or the flow of speech, that is, the degree to which a speaker can talk
without noticeable mid-clause or mid-phrase pauses and dysfluencies such as repetitions and
false starts (Derwing et al., 2004; Kahng, this volume).
DOI: 10.4324/9781003022497-14

Although phonetics, the study of speech, dates back more than two millennia, many people’s
first introduction to it is George Bernard Shaw’s (1912) play, Pygmalion, or its movie adap-
tation, My Fair Lady, in which the character of Henry Higgins changes the pronunciation of a
lowly flower seller to that of an aristocrat. It has been suggested that Higgins was based on
Henry Sweet and Daniel Jones, both British phoneticians from Shaw’s day. Although the
Pygmalion story revolved around L1 markers of class, interest in helping L2 speakers change
their pronunciation to conform to a local standard had existed for hundreds of years. An early
volume addressing the notorious spelling system in English was also aimed at improving the
pronunciation of “outlanders” (Price, 1665). The rise of phonetics in Britain and elsewhere in
the 20th century led to a stronger focus on L2 pronunciation. David Abercrombie, a former
student of Daniel Jones, was initially interested in L1 phonetics, but unlike the fictional
Higgins, he was not at all concerned with promoting the use of Received Pronunciation (Kelly,
1993). This acceptance of difference was evident in his approach to L2, which he summarized
as follows: “I believe that pronunciation teaching should have, not a goal which must of
necessity be normally an unrealized ideal, but a limited purpose which will be completely
fulfilled; the attainment of intelligibility” (Abercrombie, 1949, p. 120).
Abercrombie’s commitment to intelligibility was not always sustained in the language
teaching community. The Audiolingual Method (Lado, 1964), a type of language teaching
which stressed native-like pronunciation and correct grammar, took hold first in the United
States, but then spread across the world in the 1950s and 1960s. Audiolingualism was based
on the imitation of exemplars; language students repeated dialogues until they were mem-
orized. Concomitant developments in technology meant that language labs gave teachers the
opportunity to assign hours of “listen and repeat” exercises. The format of the Audiolingual
method was a boon to teachers who did not have a good grasp of the L2 themselves, but on
the downside, students found it to be extremely boring (Flanders & Nuthall, 1972). A re-
action to its monotony appeared in the 1970s, with the rise of what have come to be known
as the “designer methods” including Suggestopedia, The Silent Way, Total Physical
Response, and Community Counselling Learning (see Richards & Rodgers, 2014 for in-
depth descriptions of these methods). The Silent Way (Gattegno, 1972), in particular, was
geared to nativelike pronunciation; in fact, learners’ attention was focused on individual
sounds to a greater extent than any other approach to language teaching. Each of the de-
signer methods required extensive training on the part of the instructor, and eventually they
were superseded by the Communicative Approach to Language Teaching in the early 1980s.
Early proponents of this approach argued that accurate L2 pronunciation would develop
naturally with sufficient input (Krashen, 1982). Furthermore, as it became clear that adult L2
learners were unlikely to ever achieve a native-like accent, the whole issue of pronunciation
teaching was seen by many as pointless (Murphy & Baker, 2015). A small cadre of expert
practitioners, including Judy Gilbert, Joan Morley, Clifford Prator and Betty Robinett,
continued to press for more attention to L2 pronunciation instruction (PI), but, for the most
part, it was abandoned in the L2 classroom. Applied research on L2 learners’ phonological
development became scarce.
In sum, Abercrombie’s plea for comfortably intelligible pronunciation was largely for-
gotten, replaced by concerns for perfectly native-like productions, and then by the notion
that massive exposure was the only way to improve a learner’s speech. In our view, the lack
of research to support the arguments of experienced practitioners was problematic. The
recent renewal of interest in L2 pronunciation is a direct result of empirical evidence de-
monstrating the importance of intelligibility and comprehensibility over accentedness.
Moreover, it has become clear that individual differences in learning trajectories are far more
varied than was previously believed (Derwing & Munro, 2015). Inventories of pronunciation
difficulties, such as Swan and Smith (2001) and Nilsen and Nilsen (2010), identify problems
for people from different L1 backgrounds, but often these are over-simplifications that do
not hold for many learners (Munro, 2018).
The past two decades have witnessed an upsurge in L2 pronunciation research, including a
special issue of TESOL Quarterly in 2005, the establishment of an annual conference in 2009,
Pronunciation in Second Language Learning and Teaching, the appearance of a dedicated
academic journal, Journal of Second Language Pronunciation, in 2015, and a dedicated strand
at the American Association for Applied Linguistics, starting in 2018. The first three of these
four developments were initiated by John Levis, who has both led and pushed the applied
study of pronunciation forward. For an in-depth historical account of pronunciation
teaching, see Murphy and Baker (2015).

Pronunciation research is broadly divided into two areas: naturalistic and interventionist.
Naturalistic studies provide descriptions of learners’ productions. In some cases, these are
snapshots at given times in cross-sectional work, such as Trofimovich and Baker’s (2006)
examination of three groups of Korean speakers of English whose arrival in the United
States was 3 months, 3 years, and 10 years prior to the study. Naturalistic investigations can
also trace pronunciation development using longitudinal methods. For example, Derwing
and Munro (2013) followed two groups of adult L2 speakers for more than 7 years.
Interventionist studies usually involve some sort of instruction, either with a teacher or using
technology. Some research is “local” in that it addresses a very narrowly defined issue, such
as a particular segment or prosodic element, whereas other interventionist studies are
“global,” in that the instruction includes many elements, and progress is usually assessed
through comprehensibility and fluency ratings. Although interesting from the standpoint of
learnability, local studies do not always focus on aspects of phonology that interfere with
either comprehensibility or intelligibility, but often probe features that are of interest to
researchers, sometimes for theoretical reasons (Thomson & Derwing, 2015).
For years, it was assumed that speakers of a given L1 would exhibit the same problems when
pronouncing a shared L2, but it has become apparent that individual differences are far more
extensive than previously believed (Derwing & Munro, 2015; Munro, 2021; Munro et al., 2015).
This has major implications for teaching pronunciation. For instance, even though the Russian
sound inventory does not include aspirated plosives, one cannot assume that every Russian
learner of English will have difficulty learning to aspirate an English initial /p/. In fact, Munro
and Derwing (in preparation) determined that some learners were able to produce an accurate
initial /p/ very early in L2 acquisition, while others quickly acquired it without intervention. Still
other learners were slow, and some did not produce intelligible tokens after 7 years.
Even further complications in individual performance were observed by Munro (2021).
He found that performance on English vowels by Cantonese speakers was not uniform
across or even within phonetic contexts. For example, speakers who produced an intelligible
/ɪ/ in “hit” did not necessarily produce the same vowel intelligibly in “sit.” Moreover, some
speakers produced “sit” intelligibly but not “hit” while others showed exactly the reverse
pattern. These results suggest that segmental learning is far more idiosyncratic than pre-
viously thought. The reasons for this lack of uniformity may be highly complex. A learner’s
difficulty with a particular word, for example, might be the result of fossilization early in the
acquisition process. It may also be tied to the particular speaker model to which the learner
was exposed and to the learner’s perceptual and production capabilities at the time of
learning. It is thus important not to focus exclusively on the collective performance of groups
of learners, but to examine individual learning trajectories. Error hierarchies, in particular,
can be misleading. To be effective, instructors should carry out a needs analysis for each
student, placing the emphasis on difficulties that interfere with intelligibility or compre-
hensibility. The functional load principle, “a measure of the work which two phonemes (or a
distinctive feature) do in keeping utterances apart” (King, 1967, p. 831), can be applied at
this level. Conducting needs analyses, however, requires a knowledge foundation that entails
a basic background in phonetics and an understanding of the functional load principle.
Furthermore, effective teaching requires skill in listening, at both segmental and supraseg-
mental levels, to isolate problematic elements needing intervention. Unfortunately, formal
training in these areas is not available for many prospective language teachers (Foote et al.,
2011; Huensch, 2019).
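As a rough illustration of the functional load idea, the sketch below counts the minimal pairs that a phoneme contrast keeps apart in a small word list. This count is only a crude proxy, not King's (1967) measure itself, and the toy lexicon, its broad transcriptions, and the function name `minimal_pairs` are assumptions made for the example.

```python
# Toy lexicon (hypothetical): word -> broad phonemic transcription.
LEXICON = {
    "sit": ("s", "ɪ", "t"),
    "seat": ("s", "i", "t"),
    "hit": ("h", "ɪ", "t"),
    "heat": ("h", "i", "t"),
    "think": ("θ", "ɪ", "ŋ", "k"),
    "sink": ("s", "ɪ", "ŋ", "k"),
}


def minimal_pairs(lexicon, a, b):
    """Return word pairs that differ only by phoneme `a` versus phoneme `b`."""
    # Index words by transcription so homophones are handled together.
    forms = {}
    for word, phones in lexicon.items():
        forms.setdefault(tuple(phones), []).append(word)

    pairs = []
    for phones, words in forms.items():
        for i, phone in enumerate(phones):
            if phone != a:
                continue
            # Swap `a` for `b` at this position and look for a real word.
            swapped = phones[:i] + (b,) + phones[i + 1:]
            for other in forms.get(swapped, []):
                for word in words:
                    pairs.append((word, other))
    return pairs


for contrast in [("ɪ", "i"), ("θ", "s")]:
    found = minimal_pairs(LEXICON, *contrast)
    print(contrast, len(found), found)
```

With a realistic pronouncing dictionary in place of the toy lexicon, counts of this kind could help an instructor decide which of a learner's substitutions are most likely to cause confusion between words.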
Another critical issue in the area of pronunciation is the reluctance of L2 speakers to
receive instruction from non-native teachers. Often, they express a preference for native
speakers, yet as Levis et al. (2016) demonstrated in a controlled comparison of two pro-
nunciation classes, one taught by a native speaker and the other by a nonnative teacher,
listeners assigned equivalent comprehensibility ratings to students in both classes. As
Derwing and Munro (2005) pointed out, the key to effective language teaching is not tied to
the instructor’s native speaker status. Rather, appropriate pedagogical training, linguistic
knowledge, and proficiency in the language taught all contribute to success, such that L1
status is immaterial if the other factors are met. In fact, given reasonably easy access to
global resources featuring a wide range of accents, learners should be encouraged to un-
derstand the value of seeking out a variety of speech models, using multimedia technology,
for their own perception and production of English.
A critical area needing considerably more research is the determination of the relative
importance of various accent features to intelligibility and comprehensibility. Hahn (2004)
examined the effect of primary stress on intelligibility. By comparing appropriate stress with
misassigned and monotone utterances, she found that the latter two types of productions
impeded understanding. Her work provides a valuable model for exploring other elements of
L2 speech. Munro and Derwing (2006) used listener judgements of comprehensibility to
determine the relative importance of some English segments on the basis of functional load.
They found empirical evidence for what several practitioners had hypothesized: some errors
matter much more than others. Similar hierarchies should be explored in other L2s.
Increasingly, researchers have realized that pronunciation and other aspects of language
interact. Varonis and Gass (1982) were the first to examine the relationship between
grammar and listeners’ perceptions of L2 pronunciation, and others have now pursued this
in more depth (see Ruivivar and Collins, this volume). An exploratory study of the effects of
pragmatics instruction on comprehensibility demonstrated that using predictable pragma-
linguistic formulas facilitates listeners’ ease of understanding with no change to pro-
nunciation (Derwing et al., 2021). Yates (this volume) calls for more attention to the
interaction of pragmatics and pronunciation.

Levis (2020) surveyed changes in the field of L2 pronunciation research over the past quarter
century. Despite important innovations, recent trends indicate a continued strong interest in
many late 20th-century issues. With the gradual demise of the nativeness principle,
comprehensibility-based studies have become increasingly common (Kennedy & Trofimovich,
2019), with some work extending the concept to interactive speaking situations (Crowther,
2020). Concern about the role of international speech varieties in communication, especially
World Englishes, also continues to command attention (Kang et al., 2019; Llurda, this vo-
lume). Much of that work focuses on international intelligibility, and the long-standing issue of
choosing pronunciation models for instruction and assessment (Levis, 2018).
Meanwhile, a new emphasis is being placed on individual differences in learner performance,
which, as noted earlier, appear to be much greater than previously suspected. Wade et al. (2021),
for example, show how overemphasis on mean performance of groups can obscure important
subtleties in individual L1 pronunciation features. In their study, idiosyncratic voice onset time
was found to be stable in individual talkers. In other words, between-speaker variability was not
simply noise, but was a reliable feature of the individuals’ speech. This phenomenon is likely to
extend to L2 production and needs to be taken into account in assessing L2 speech.
With respect to technological innovations, notable developments include work on
computer-assisted pronunciation teaching (CAPT). One especially intriguing CAPT concept
is the “golden speaker,” a synthetic rendition of the L2 learner’s own voice, with the pro-
nunciation characteristics of a native speaker of the L2. Findings from Ding et al. (2019)
point to its effectiveness in improving learners’ comprehensibility and fluency. In other work,
Garcia et al. (2020) reported improvements in an intervention study over 15 weeks (one
group received traditional instruction and the other group used automatic speech recognition
(ASR) technology). The
traditional instruction group showed long-term benefits in comprehensibility, while the ASR
approach was found to be more effective at targeting individual phonemes. The authors
recommend using a hybrid approach to maximize benefits for learners.
It is now well-established that L2 speech perception is closely linked to performance in
production (see Thomson, this volume). Moreover, evidence from High Variability Phonetic
Training (HVPT) demonstrates the effectiveness of instruction in one domain (perception)
on performance in another (production) (Thomson, 2018). An important issue that has
arisen in this research is the failure of the training to generalize to some new phonetic
contexts (Thomson, 2011). This outcome is consistent with Munro (2021) and provides
further evidence that L2 phonemes do not emerge simultaneously across the entire lexicon.
This has very important ramifications for teaching in that we cannot assume that teaching a
particular sound in one context will generalize to another. Rather, language instructors need
to be aware of pronunciation difficulties experienced by their learners at the lexical level. All
vocabulary instruction, for instance, should incorporate a pronunciation component to en-
sure that L2 learners acquire the appropriate production of new words, along with their
meanings (see Horst, this volume).
It has been clear for some time that corrective feedback can benefit pronunciation learners,
but a recent innovative study (Martin & Sippel, 2021) examined feedback in a novel way. The
authors compared four groups of learners of German in a pronunciation training study: one
group received teacher feedback; one group provided feedback to their peers (according to a
checklist designed by the researchers, which focused on the segments targeted in the instruc-
tion); one group were the recipients of feedback from their peers; finally, a control group was
included, who received neither the pronunciation intervention nor corrective feedback.
Recordings of each group were assessed for comprehensibility before and after the 3 weeks of
training. The three intervention groups all performed better than the control group, but the
group with the most improvement was the peer provider group, followed by the teacher
feedback group. Apparently, the engagement required to provide pronunciation feedback to
classmates had a significant and salutary effect on the learners’ own pronunciation.
Research on pronunciation assessment has lagged far behind other aspects of speaking
assessment (Isaacs & Trofimovich, 2016). Until recently, assessment of speaking in high
stakes tests tended to provide descriptors in lockstep progression, culminating in native-like
mastery. This has changed, as tests such as the TOEFL iBT now refer to intelligibility, rather
than nativeness. Still, most high stakes tests and proficiency descriptors, such as the Common
European Framework of Reference (CEFR) developed by the Council of Europe (2018),
continue to assume that pronunciation will improve incrementally with proficiency gains.
Although there is some indication that higher proficiency is associated with better pro-
nunciation, far more individual variation exists than is reflected in documents such as
the CEFR.
Given the current interest in World Englishes, the reliance on Received Pronunciation and
General American English in large-scale tests has been decried not only as unfair, but as
reverting to nativeness criteria. In response to these criticisms, researchers have investigated
the possible consequences of including several varieties of English on a high stakes test (Kang
et al., 2019). Because such a change may have huge ramifications for testees, work continues
on this front (Kang et al., 2020).

Crowther (2021) provides a detailed overview of methods used in L2 speech measurement.
Most discussions of research methods in the field concern the elicitation and evaluation of
speech samples. With respect to the first of these, Munro and Derwing (2020) observe that
researchers must take care in selecting both target content and manner of elicitation. Targets
may range from individual words to sentence-length utterances or much longer spoken texts,
depending on the aspects of speech under study. If the focus is “local” phenomena, such as
vowel intelligibility, accuracy of lexical stress or articulation of consonant clusters, short
stretches of speech are typically used, but have the disadvantage of not necessarily re-
presenting spontaneous utterances. To assess “global” speech characteristics, such as
rhythm, intonation, comprehensibility or fluency, longer samples of connected speech must
be obtained. Manner of elicitation refers to the specific tasks performed by speakers during
recording. Here, the chief concerns – most notably the trade-off between experimenter
control and ecological validity – are the same as for other types of speaking research and are
described in detail in Nagle et al. (this volume).
Evaluating L2 pronunciation from recorded speech samples requires specialized techni-
ques. Because intelligibility, comprehensibility and accentedness arise from listeners’ ex-
periences of speech, direct assessment of these dimensions is possible only through listener
judgements. For intelligibility, the method of choice is listener transcriptions of sentence-
length utterances, scored in terms of words correct, an approach widely used across the
speech sciences (Schiavetti, 1992). Other measures, such as comprehension questions ad-
ministered after spoken texts (Hahn, 2004) and true/false evaluations of sentences (Munro &
Derwing, 1995b) have also been employed.
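The words-correct scoring described above can be sketched in a few lines. This is a minimal illustration and not the procedure of any cited study: it assumes exact orthographic matching after lowercasing, credits target words only when they appear in the same order in the listener's transcription, and uses hypothetical function names (`tokenize`, `words_correct`).

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def words_correct(target, transcription):
    """Proportion of target words reproduced, in order, by the listener.

    Matching uses the longest common subsequence of the two word lists,
    so inserted or substituted words do not shift later matches.
    """
    ref = tokenize(target)
    hyp = tokenize(transcription)
    if not ref:
        return 0.0
    prev = [0] * (len(hyp) + 1)
    for ref_word in ref:
        curr = [0]
        for j, hyp_word in enumerate(hyp, start=1):
            if ref_word == hyp_word:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of six target words was misheard.
score = words_correct("The boy sat on the hill.", "the boy hit on the hill")
print(round(score, 2))
```

In practice, researchers would also need explicit rules for homophones, spelling variants, and contractions, which this sketch leaves out.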
Scalar ratings of comprehensibility and accentedness have been used in a wide array of
research types, including longitudinal studies (Derwing & Munro, 2013), interventions
(Gordon & Darcy, 2016), investigations of between-listener differences (Jułkowska &
Cebrian, 2015) and attitudinal studies (Reid et al., 2019). Typically, audio recordings are
presented to listeners who rate each recording on one or more scales. Samples of 20 to 30
seconds are known to yield reliable ratings, while greater durations increase task time and
may impose an unacceptable burden on listeners that compromises validity. Moreover,
longer durations have little likelihood of giving more “accurate” results, because listeners
cannot possibly track and recall all the variables affecting pronunciation in long speech
samples (Munro & Derwing, 2020).
The nature of rating scales for these tasks has been the subject of considerable discussion.
Munro (2018) found that both comprehensibility and accentedness are amenable to equal-
interval scaling with numbered points. For comprehensibility, a numbered nine-point scale
anchored with the labels “very easy to understand” and “very difficult to understand” has
often been used with reliable results. However, the number of points to which listeners are
actually able to resolve their ratings is unknown. Some researchers have successfully used
anchored but otherwise unnumbered quasi-continuous scales with up to 1,000 underlying
points (Reid et al., 2019). The gradations, however, are not visible to the rater, who simply
clicks a location on a seemingly continuous line on the computer screen. Current evidence
does not indicate any advantage of one approach over the other in terms of listener reliability,
though large scales have the added benefit of suitability for mixed-effects statistical
modelling (Huensch & Nagle, 2021). Also, whether listeners evaluate multiple target constructs
simultaneously or carry out the ratings for each construct separately does not appear
to affect results (O’Brien, 2016).
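Listener reliability in scalar rating tasks can be quantified in several ways. The sketch below computes Cronbach's alpha, one commonly reported index, treating listeners as columns and speech samples as rows. The choice of alpha, the function name and the ratings are assumptions for illustration; the studies cited above may report other indices (e.g., intraclass correlations).

```python
from statistics import variance


def cronbach_alpha(ratings: list[list[float]]) -> float:
    """Cronbach's alpha with listeners as columns and speech samples as rows."""
    n_listeners = len(ratings[0])
    if n_listeners < 2 or len(ratings) < 2:
        raise ValueError("need at least two listeners and two samples")
    # Variance of each listener's ratings across the speech samples.
    listener_vars = [variance(col) for col in zip(*ratings)]
    # Variance of the per-sample totals (ratings summed over listeners).
    total_var = variance([sum(row) for row in ratings])
    return n_listeners / (n_listeners - 1) * (1 - sum(listener_vars) / total_var)


# Hypothetical nine-point comprehensibility ratings: 4 samples x 3 listeners.
ratings = [
    [2, 3, 2],
    [5, 5, 6],
    [8, 7, 9],
    [4, 4, 3],
]
print(round(cronbach_alpha(ratings), 3))
```

The same function applies to ratings collected on a quasi-continuous scale, since it depends only on the variances of the ratings and not on the number of scale points.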
Indirect investigation of intelligibility and comprehensibility can sometimes be accom-
plished using acoustic measurements and techniques from artificial intelligence (Dalby &
Kewley-Port, 1999). While automatic speech recognition shows promise as a pedagogical
tool for the future, acoustic measurements are prone to serious misuse if researchers do not
have extensive training in phonetics. No straightforward characteristic, or combination of
characteristics, of the acoustic speech signal is known to correspond to any of the global
measures of L2 speech. As a result, acoustic assessment of global intelligibility would require
advance knowledge of all the specific acoustic dimensions that influence listeners, an im-
possible expectation. While acoustic measurements of segments can provide useful in-
formation if expertly carried out and interpreted, testing their validity requires listener
judgements. To illustrate the relevant dangers, imagine an intervention study in which vowel
quality is taught, and pre- and post-intervention measurements reveal a change in second
formant frequencies toward native-like values. In the first place, a change of this type in no
way guarantees improved intelligibility; only listener assessments can establish such a con-
nection. In the second, a measurable “improvement” in one acoustic dimension may be offset
by a worsening in some other dimension, perhaps one not actually evaluated by the re-
searcher, such that the changes together yield a net impact of zero on intelligibility. Given
these complexities, it is unsurprising that Chan and Hall (2019) found that the degree of
acoustic deviations from native vowel norms failed to account for listeners’ perceptions.
Recent innovations in L2 speech measurement illustrate intriguing refinements on com-
monly used procedures. One proposal is to use multiple measures to gain simultaneous,
complementary perspectives on the same speech material. Kang et al. (2018), for instance,
compared five different assessment types to determine the strengths of each. One of these,
commonly used in the study of pathological speech, was listener transcriptions of nonsense
utterances, which require a focus on segmental phonemic details, while filtered speech was
more suited to a suprasegmental analysis. A recently developed approach to speech ratings
entails dynamic assessment, in which listeners give multiple judgements over time as they
listen to speech samples (Nagle et al., 2019). While this early research suggests considerable
interrater variability in such tasks, more work remains to be done.

The past two decades have provided some helpful preliminary directions to those who teach
L2 pronunciation. First, it is now generally accepted that the instructional goal should not be
native-like productions, but rather, that PI should focus on enhancing intelligibility and
comprehensibility. To achieve that aim, however, teachers require some expertise in identi-
fying aspects of L2 speech that interfere with listeners’ understanding, as opposed to ele-
ments of an accent that may be salient but which have little or no effect on conveyance of
meaning. Language teacher preparation programs should include courses on how to teach
pronunciation, and those courses should minimally provide ample opportunity for trainees
to assess L2 speakers’ utterances for intelligibility and comprehensibility (Derwing, in press;
Murphy, 2017). Such courses should also touch on research that sheds light on features
shown to affect these two dimensions, such as functional load (Munro & Derwing, 2006),
primary stress (Hahn, 2004), and speech rate (Munro & Derwing, 2001).
A great deal of PI is directed at learners who may be assumed to have plateaued in their
learning with little expectation of further improvement without intervention. Although PI
can be effective even then (Derwing et al., 2014), it makes more sense to intervene early if
intelligibility is at risk. Derwing and Munro (2015) have identified the first months of massive
exposure to the L2 as the “Window of Maximal Opportunity,” the period when naturalistic
improvements in pronunciation are most prominent, before speech patterns become en-
trenched. It stands to reason that PI should be introduced as early as possible to take ad-
vantage of this window. Zielinski and Yates (2014) make a compelling argument for starting
PI with beginner learners and provide many suggestions for how to do so. Moreover, PI
should not be restricted to a stand-alone course (although in some settings, a dedicated
pronunciation course is entirely warranted). Pronunciation should be integrated not only
into speaking courses but into any language course where L2 speakers have comprehensibility
issues, just as incidental vocabulary is introduced across the curriculum in all types of L2
courses.
A consequence of the recognition that comfortable intelligibility should be the goal is that
many L2 learners do not need PI. Some have a high aptitude for pronouncing well in an L2
(see Mora, this volume) and therefore require minimal help. It has become clear, however,
that individual differences are far more prevalent and diverse than was originally believed. It
is also evident that learning segments in one particular phonetic context does not necessarily
generalize to others. It is thus imperative that learners be given supports that are customized
to their needs at multiple levels (segmental, lexical, phrasal, and so on). Technology offers
considerable promise in this regard, especially for segmentals. Thomson (2018) reviewed over
30 studies which showed the efficacy of HVPT, yet most teachers are unfamiliar with this
technique. Developments such as the English Accent Coach (Thomson, 2021) can provide
learners with access to HVPT training on precisely those segments causing difficulty.
Although numerous L2 pronunciation resources are available online, many provide ad-
vice that is misleading or simply wrong (see Derwing & Munro, 2015 for examples). A key
service that instructors can offer their students is a frank caveat emptor discussion regarding
accent reduction scams and well-meaning but ill-informed coaches. As more authoritative
resources appear online (e.g., pronunciationforteachers.com) and in print, we can hope for a
reduction in opportunism that exploits learners.
6 Future Directions
Teaching
Despite the numerous articles that have appeared in the past two decades examining the
learnability of various phonemic contrasts (Thomson & Derwing, 2015), the increase in
teaching materials, and many studies on teacher cognition (Murphy & Baker, 2015), few
teacher preparation programs include courses on how to teach L2 pronunciation. Until
language programs have staff who are well-versed in identifying their students’ commu-
nication problems and addressing them accordingly, students will continue to look elsewhere
to improve their productions. With the turn to online courses in many educational institu-
tions as a result of COVID, perhaps for-credit teacher preparation courses at reputable
institutions, offered by leading experts, will become available.
In view of the current lack of adequate pronunciation courses, one approach to help
learners in the short term is technological assistance for self-study. The English Accent
Coach (Thomson, 2021), for example, employs HVPT to assist learners with perception of
segmentals, but perhaps a more macro level of exposure to suprasegmentals could be in-
corporated in similar software. A key feature of HVPT underlying its efficacy is its utilization
of multiple talkers for training stimuli. Parallel variability may also improve suprasegmental
training. Similar platforms should be developed for other heavily studied languages as well
(e.g., Mandarin and Spanish). Other apps have also been shown to benefit both perception
and production, but as Fouz-González (2020) points out, students are unlikely to spend
much money on apps. It is imperative, then, that evidence-based apps should be attractive to
potential users in terms of both cost and functionality. Some of the technological tools de-
veloped thus far are expensive to maintain and require buy-in from funders and colla-
boration across disciplines (O’Brien et al., 2018).
As for classroom-based instruction, promising studies such as Galante and Thomson
(2017) offer direction for effective activities. More studies like theirs, using a range of ac-
tivities, participants from different L1s, and learners of different L2s would be useful. Direct
collaboration among teachers and researchers could encourage incorporation of effective PI
in the classroom, as was the case with Rojczyk (2015), who showed that students’ imitation of
an English accent while speaking their L1, Polish, benefited their pronunciation of English.
Martin and Sippel’s (2021) research demonstrates that full engagement can lead to better
noticing of one’s own pronunciation. Learners who are called on to provide peer feedback
appear to monitor and modify their own productions to a greater extent than if they relied
solely on the teacher’s corrections. In a workplace pronunciation course, Derwing et al.
(2014) noted that once an atmosphere of trust was established, learners felt comfortable
listening to and correcting their peers. Employing peer feedback with guidance from the
teacher may have positive implications for classroom teaching going forward, with the caveat
that the quality of peer feedback be supervised by the instructor.
Research
The future of pronunciation research looks bright; many more scholars are engaged in this
area than just a decade ago. Calls for replications of existing research in languages other than
English are now being answered (e.g., Huensch & Nagle, 2021, for Spanish; Zhang &
Yuan, 2020, for Mandarin). More longitudinal research has also started to appear (e.g.,
Huensch et al., 2019), as have explorations of relatively new data collection approaches, such
as crowdsourcing, using Mechanical Turk (Nagle, 2019). Each of these streams is promising
and we look forward to seeing far more work in these areas. O’Brien et al. (2018) offer
extensive lists of recommendations for pronunciation research, particularly with reference to
the use of various technologies. Other areas are worthy of exploration as well, including a
focus on the interaction of different approaches to PI. Extensive analyses of the relative
contributions of PI techniques, both individually and in combination, would have practical
consequences for the classroom.
Assessment is another issue to which researchers are turning their attention (e.g., Isaacs &
Trofimovich, 2016; Kang et al., 2018), but this field needs considerably more exploration,
especially in view of the complexity of World Englishes (Kang et al., 2020). Hansen Edwards
et al. (2020) call for the development of a diagnostic tool that teachers could use to identify
learner needs. Further, regional dialect acquisition by L2 learners and the
factors that determine uptake are worth probing (Schoonmaker-Gates, 2020).
Researchers should think well into the future when designing studies with the possibility
of multi-purposing of data. Rather than conducting one-off studies that focus on a very
limited research question, some strategic planning to maximize the use of data for several
purposes (prior to the ethics submission stage) would be useful. Contributions to existing
corpora, for example, could be collected at the same time as the data for addressing a
particular question. A related consideration is the benefit of collaborative work, which may
allow for more uses of the same set of data.
Finally, pronunciation learning needs to be contextualized by the recognition that ev-
eryone speaks with an accent. And nearly everybody communicates with people from outside
their own speech community. Rather than focus on discriminatory reactions to accent, re-
searchers are encouraged to investigate ways to improve listening capacity in target language
speakers and to enhance listeners’ willingness to communicate with interlocutors whose
pronunciation differs from their own (Derwing et al., 2002).
Further Reading
Critical issues relevant to optimal teaching and learning of L2 pronunciation are covered in a clear,
comprehensive manner.
Grant, L. (Ed.) (2014). Pronunciation myths: Applying second language research to classroom teaching.
Ann Arbor: University of Michigan Press.
This very readable collection of papers addresses popular myths related to pronunciation learning and
teaching.
Levis, J. M. (2018). Intelligibility, oral communication, and the teaching of pronunciation. Cambridge:
With a strong emphasis on classroom practice and how pronunciation teaching can be more effectively
approached in different teaching contexts, this book is an important resource for pronunciation researchers.
References
Abercrombie, D. (1949). Teaching pronunciation. ELT Journal, 3, 113–122.
Chan, K. Y. & Hall, M. D. (2019). The importance of vowel formant frequencies and proximity in
vowel space to the perception of foreign accent. Journal of Phonetics, 77, 1–22.
Council of Europe. (2018). Common European Framework of Reference for languages: Learning,
teaching, assessment: Companion volume with new descriptors. Strasbourg, France: Council of
Europe.
Crowther, D. (2021). Measuring phonology. In P. Winke & T. Brunfaut (Eds.), The Routledge
handbook of second language acquisition and language testing (pp. 243–253). New York, NY &
Abingdon OX: Routledge.
Crowther, D. (2020). Rating L2 speaker comprehensibility on monologic vs. interactive tasks: What is
the effect of speaking task type? Journal of Second Language Pronunciation, 6(1), 96–121.
Dalby, J., & Kewley-Port, D. (1999). Explicit pronunciation training using automatic speech
recognition technology. CALICO Journal, 16(3), 425–445.
Derwing, T. M. (in press). Lessons learned from teaching teachers to teach pronunciation. In V.
Sardegna & A. Jarosz (Eds.), Theoretical and practical developments in English speech assessment,
research, and training. Berlin: Springer.
Derwing, T. M. & Munro, M. J. (1997). Accent, comprehensibility and intelligibility: Evidence from
four L1s. Studies in Second Language Acquisition, 19, 1–16.
Derwing, T. M. & Munro, M. J. (2005). Pragmatic perspectives on the preparation of teachers of
English as a second language. In E. Llurda (Ed.), Non-native language teachers: Perceptions, chal-
lenges, and contributions to the profession (pp. 179–191). New York: Springer.
Derwing, T. M. & Munro, M. J. (2013). The development of L2 oral language skills in two L1 groups:
A seven-year study. Language Learning, 63, 163–185.
Derwing, T. M., Munro, M. J., Foote, J. A., Waugh, E. & Fleming, J. (2014). Opening the window on
comprehensible pronunciation after 19 years: A workplace training study. Language Learning, 64,
526–548.
Derwing, T. M., Rossiter, M. J., & Munro, M. J. (2002). Teaching native speakers to listen to foreign-
accented speech. Journal of Multilingualism and Multicultural Development, 23, 245–259.
Derwing, T. M., Rossiter, M. J., Munro, M. J. & Thomson, R. I. (2004). L2 fluency: Judgments on
different tasks. Language Learning, 54, 655–679.
Derwing, T. M., Waugh, E., & Munro, M. J. (2021). Pragmatically speaking: Preparing adult ESL
students for the workplace. Applied Pragmatics, 3(2), 107–135.
Ding, S., Liberatore, C., Sonsaat, S., Lučić, I., Silpachai, A., Zhao, G., … & Gutierrez-Osuna, R.
(2019). Golden speaker builder: An interactive tool for pronunciation training. Speech
Communication, 115, 51–66.
Flanders, N. & Nuthall, G. (Eds.) (1972). Classroom behaviour of teachers. Hamburg: UNESCO
Institute for Education.
Flege, J. E. (1984). The detection of French accent by American listeners. Journal of the Acoustical
Society of America, 76, 692–707.
Foote, J. A., Holtby, A. & Derwing, T. M. (2011). Survey of pronunciation teaching in adult ESL
programs in Canada, 2010. TESL Canada Journal, 29, 1–22.
Fouz-González, J. (2020). Using apps for pronunciation training: An empirical evaluation of the
English File Pronunciation app. Language Learning & Technology, 24(1), 62–85.
Galante, A. & Thomson, R. I. (2017). The effectiveness of drama as an instructional approach for the
development of second language oral fluency, comprehensibility, and accentedness. TESOL
Quarterly, 51, 115–142.
Garcia, C., Nickolai, D., & Jones, L. (2020). Traditional versus ASR-based pronunciation instruction:
An empirical study. CALICO Journal, 37(3), 213–232.
Gattegno, C. (1972). Teaching foreign languages in schools: The Silent Way. New York: Educational
Solutions.
Gordon, J., & Darcy, I. (2016). The development of comprehensible speech in L2 learners: A classroom
study on the effects of short-term pronunciation instruction. Journal of Second Language
Pronunciation, 2(1), 56–92.
Hahn, L. (2004). Primary stress and intelligibility: Research to motivate the teaching of supraseg-
mentals. TESOL Quarterly, 38, 201–223.
Hansen Edwards, J., Chan, K. L. R., Lam, T., & Wang, Q. (2020). Social factors and the teaching of
pronunciation: What the research tells us. RELC Journal, doi: 10.1177/003368822096089
Huensch, A. (2019). Pronunciation in foreign language classrooms: Instructors’ training, classroom
practices and beliefs. Language Teaching Research, 23, 745–764.
Huensch, A. & Nagle, C. (2021). The effect of speaker proficiency on intelligibility, comprehensibility,
and accentedness in L2 Spanish: A conceptual replication. Language Learning, 71(3), 613–945.
Huensch, A., Tracy-Ventura, N., Bridges, J., & Medina, J. A. C. (2019). Variables affecting the
Second Language Acquisition and International Education, 4(1), 96–125.
Isaacs, T., & Trofimovich, P. (Eds.). (2016). Second language pronunciation assessment: Interdisciplinary
perspectives. Bristol: Multilingual Matters.
Jułkowska, I. A., & Cebrian, J. (2015). Effects of listener factors and stimulus properties on the in-
telligibility, comprehensibility and accentedness of L2 speech. Journal of Second Language
Kang, O., Thomson, R. I., & Moran, M. (2018). Empirical approaches to measuring the intelligibility of
different varieties of English in predicting listener comprehension. Language Learning, 68(1), 115–146.
Kang, O., Moran, M. & Thomson, R. (2019). The effects of international accents and shared first
language on listening comprehension tests. TESOL Quarterly, 53(1), 56–81.
Kang, O., Thomson, R. I. & Moran, M. (2020). Which features of accent affect understanding?
Exploring the intelligibility thresholds of diverse accent varieties. Applied Linguistics, 41, 453–480.
Kelly, J. (1993). Obituary: David Abercrombie. Phonetica, 50, 68–71.
Kennedy, S., & Trofimovich, P. (2019). Comprehensibility: A useful tool to explore listener under-
standing. The Canadian Modern Language Review, 75(4), 275–284.
King, R. D. (1967). Functional load and sound change. Language, 43, 831–852.
Krashen, S. D. (1982). Principles and practice in second language acquisition. Oxford: Pergamon Press.
Lado, R. (1964). Language teaching: A scientific approach. New York: McGraw-Hill.
Levis, J. (2020). Changes in L2 pronunciation: 25 years of intelligibility, comprehensibility, and ac-
centedness. Journal of Second Language Pronunciation, 6(3), 277–282.
Levis, J. M. (2018). Intelligibility, oral communication, and the teaching of pronunciation. Cambridge:
Levis, J. M. & Sonsaat, S. (2020). Publication venues for L2 pronunciation research. Journal of Second
Language Pronunciation, 6, 1–11.
Levis, J. M., Sonsaat, S., Link, S., & Barriuso, T. A. (2016). Native and nonnative teachers of L2
pronunciation: Effects on learner performance. TESOL Quarterly, 50, 894–931.
Martin, I. A., & Sippel, L. (2021). Is giving better than receiving?: The effects of peer and teacher feedback
on L2 pronunciation skills. Journal of Second Language Pronunciation. https://doi.org/10.1075/jslp.20001.mar
Munro, M. J. (2021). On the difficulty of defining “difficult” in second-language vowel acquisition.
Frontiers in Communication. 53
Munro, M. J. (2018). Dimensions of pronunciation. In O. Kang, R. Thomson, & J. Murphy (Eds.), The
Routledge handbook of contemporary English pronunciation (pp. 413–431). London: Routledge.
Munro, M. J. (2018). How well can we predict L2 learners’ pronunciation difficulties? The CATESOL
Journal, 30(1), 267–281.
Munro, M. J. & Derwing, T. M. (1995a). Foreign accent, comprehensibility and intelligibility in the
speech of second language learners. Language Learning, 45, 73–97.
ception of native and foreign-accented speech. Language & Speech, 38, 289–306.
Munro, M. J. & Derwing, T. M. (2001). Modelling perceptions of the comprehensibility and accent-
edness of L2 speech: The role of speaking rate. Studies in Second Language Acquisition, 23, 451–468.
Munro, M. J. & Derwing, T. M. (2006). The functional load principle in ESL pronunciation instruc-
tion: An exploratory study. System, 34, 520–531.
Munro, M. J. & Derwing, T. M. (in preparation). Huh? The amazing individual differences in L2
pronunciation learning trajectories.
Munro, M. J., Derwing, T. M. & Burgess, C. (2010). Detection of nonnative speaker status from
content-masked speech. Speech Communication, 52(7–8), 626–637.
Munro, M. J., Derwing, T. M., & Thomson, R. I. (2015). Setting segmental priorities for English
learners: Evidence from a longitudinal study. IRAL, 53(1), 39–60.
Munro, M. J. & Derwing, T. M. (2020). Collecting data in L2 pronunciation research. In O. Kang, S.
Staples, K. Yaw, & K. Hirschi (Eds.), Proceedings of the 11th pronunciation in second language
learning and teaching conference, Northern Arizona University, September 2019 (pp. 8–18). Ames,
IA: Iowa State University.
Murphy, J. (2017). Teaching the pronunciation of English: Focus on whole courses. Ann Arbor MI:
University of Michigan Press.
Murphy, J. & Baker, A. (2015). The history of ESL pronunciation teaching. In M. Reed & J. M. Levis
(Eds.), The handbook of English pronunciation (pp. 36–65). Hoboken, NJ: Wiley Blackwell.
Nagle, C. (2019). Developing and validating a methodology for crowdsourcing L2 speech ratings in
Amazon Mechanical Turk. Journal of Second Language Pronunciation, 5, 294–323.
comprehensibility. Studies in Second Language Acquisition, 41, 647–672.
Nilsen, D. L. F., & Nilsen, A. P. (2010). Pronunciation contrasts in English (2nd edn). Long Grove, IL:
Waveland Press.
O’Brien, M. G. (2016). Methodological choices in rating speech samples. Studies in Second
O’Brien, M. G., Derwing, T. M., Cucchiarini, C., Hardison, D. M., Mixdorff, H., Thomson, R. I.,
Strik, H., Levis, J. M., Munro, M. J., Foote, J. A., & Levis, G. M. (2018). Directions for the future
of technology in pronunciation research and teaching. Journal of Second Language Pronunciation, 4,
182–206.
Price, O. (1665). The vocal organ. Menston, Yorkshire: Scholar Press, A Scholar Press Facsimile.
Reid, K. T., Trofimovich, P., & O’Brien, M. G. (2019). Social attitudes and speech ratings: Effects of
positive and negative bias on multiage listeners’ judgments of second language speech. Studies in
Richards, J. C. & Rodgers, T. S. (2014). Approaches and methods in language teaching (3rd edn).
Cambridge, UK: Cambridge University Press.
Rojczyk, A. (2015). Using FL accent imitation in L1 in foreign-language speech research. In E.
Waniek-Klimczak & M. Pawlak (Eds.), Teaching and researching pronunciation of English
(pp. 223–233). Cham: Springer.
Schiavetti, N. (1992). Scaling procedures for the measurement of speech intelligibility. In R. D. Kent
(Ed.), Intelligibility in speech disorders (pp. 11–34). Philadelphia, PA: John Benjamins.
Schoonmaker-Gates, E. (2020). The acquisition of dialect-specific phonology, phonetics and socio-
linguistics in L2 Spanish: Untangling learner trends. Critical Multilingualism Studies, 8, 80–103.
Shaw, G. B. (1912). Pygmalion [Play]. doi: 10.5325/shaw.31.1.0009.
Swan, M. & Smith, B. (Eds.) (2001). Learner English: A teacher’s guide to interference and other pro-
blems. Cambridge, UK: Cambridge University Press.
Thomson, R. I. (2011). Computer assisted pronunciation training: Targeting second language vowel
perception improves pronunciation. CALICO Journal, 28(3), 744–765.
Thomson, R. I. (2018). High variability [pronunciation] training (HVPT): A proven technique about
which every language teacher and learner should know. Journal of Second Language Pronunciation,
4, 208–231.
Thomson, R. I. (2021). English accent coach [Online game]. Retrieved from www.englishaccentcoach.com.
Thomson, R. I. & Derwing, T. M. (2015). The effectiveness of L2 pronunciation instruction: A nar-
rative review. Applied Linguistics, 36, 326–344.
Trofimovich, P. & Baker, W. (2006). Learning second language suprasegmentals: Effect of L2 ex-
Varonis, E. M. & Gass, S. (1982). The comprehensibility of non‐native speech. Studies in Second
Language Acquisition, 4, 114–136.
Wade, L., Lai, W., & Tamminga, M. (2021). The reliability of individual differences in VOT imitation.
Language and Speech, 64(3), 576–593.
Zhang, R. & Yuan, Z. (2020). Examining the effects of explicit pronunciation instruction on the de-
velopment of L2 pronunciation. Studies in Second Language Acquisition, 42, 905–918.
Zielinski, B. & Yates, L. (2014). Myth: Pronunciation instruction is not appropriate for beginning-level
learners. In L. Grant (Ed.), Pronunciation myths: Applying second language research to classroom
teaching (pp. 56–79). Ann Arbor: University of Michigan Press.
11
SPEECH INTELLIGIBILITY
After the 2003 invasion of Iraq by American and allied forces, the Australian comedy show
skitHOUSE showed a British reporter apparently interviewing Iraqi insurgents outside
Tikrit, hometown of the Iraqi president (https://www.youtube.com/watch?v=j0m4rcx0of4).
The insurgent spokesman questions why subtitles are needed for his speech when his com-
rade speaks and is given no subtitles. Referring to the comrade, the reporter tells him that
“obviously he’s comprehensible,” and the comrade says that subtitles are like “teletext, for
the hearing impaired.” The skit thus plays off of three issues central to how we understand
L2 speech: accentedness, or the pronunciation of a speaker; comprehensibility, or the ease
with which listeners understand a speaker; and intelligibility, or “the extent to which a
speaker’s message is actually understood by a listener” (Munro & Derwing, 1995a, p. 76).
All three constructs of listener evaluation of speech (accentedness, comprehensibility, and
intelligibility) are partially related yet distinct. This chapter is about intelligibility, which
involves several types of understanding: understanding at the word level, the message level,
and in the interpretation of the message (Levis, 2018; Smith & Nelson, 1985). At the word
level, speech is intelligible when words in a message can be identified and decoded. If a
speaker’s words are unclearly pronounced, or mispronounced, or the competition from noise
is such that words cannot be effectively understood, words may be unintelligible by being
heard as another word, or by not being understood at all. At the message level, unintellig-
ibility occurs when the intended meaning of a discourse or utterance is not understood, or is
not fully understood. Intelligibility at the level of interpretation comes into play when lis-
teners understand the speaker’s intent behind a message, that is, its illocutionary force. This
kind of intelligibility is rarely studied for L2 speech because intent is often ambiguous, and
listeners may think they have correctly understood intent but have not.
A listener-based account of intelligibility is basic to how speakers are or are not under-
stood. In other words, we only know that spoken language is (un)intelligible because listeners
can or cannot understand it at the word, message or interpretation levels. From a speaker’s
viewpoint, intelligibility means speaking in such a way that listeners can understand what is
said. For L2 speakers, this may mean paying attention to segmental and prosodic features
associated with reduced intelligibility (Zielinski, 2008). Ladefoged and Disner (2012) describe
this give and take between speakers and listeners in terms of two principles behind speech:
DOI: 10.4324/9781003022497-15
articulatory ease and auditory distinctiveness. Speakers want to deliver their message with
the least amount of effort necessary, while listeners demand a level of clarity in speech that
makes listening easy. In effect, intelligibility means that the two sides need to meet in the
middle of any communicative exchange. When L2 speech is involved in the communicative
exchange, speakers will both speak from their own L1 or L2 variety and listen
with their own perceptual systems.
It is clear that most exchanges between L1 speakers are mutually intelligible, even across
dialects. However, interactions including L2 speakers add unexpected variability that can
more seriously affect intelligibility. Hahn and Watts (2011), for example, report an otherwise
friendly interaction between a Hausa-speaking woman (and her children, all in their best
clothes) and an English-speaking man in Nigeria, in which the Hausa speaker said “You
want to snuff me?” The English speaker was very confused until the woman pointed to his
camera and he was able to interpret the word “snuff” as actually being “snap” (as in a
snapshot, a photo). Two mispronunciations at the word level, in the vowel and the final stop
consonant, caused unintelligibility in both what was being asked and why it was being asked.
It is helpful to think of intelligibility in terms of different types of speakers and listeners, as
in the modified speaker-listener matrix from Levis (2020), in which native speaker-listeners
(NS) and non-native speaker-listeners (NNS) interact in various combinations. Table 11.1
assumes that intelligibility can be compromised in any interaction, that is, that listeners can
misunderstand speakers at the word, message or interpretation level for a variety of reasons.
In Quadrant A, this would happen between two NS speaker-listeners, perhaps due to dif-
ferences in dialect. In Quadrant B, NNS listeners would find NS speakers unintelligible,
perhaps, for example, because of speed of speech, register of speech (with casual speech being
more difficult than formal speech), or because the NNS listener’s proficiency in their L2 is
limited. Quadrant C reflects most research on L2 speech intelligibility, in which NS listeners
misunderstand NNS speakers because of unexpected phonological, lexical, or grammatical
choices or errors. While some errors may be relatively easy to interpret (e.g., this pronounced
as dis), others are not as easy (e.g., flight pronounced as fright). Finally, Quadrant D reflects
other research on L2 intelligibility, in which speakers and listeners with different L1s use a
common L2 to communicate. Intelligibility in this Quadrant may be affected by the pho-
nological and perceptual systems of both speaker-listeners. That is, not only will their L1
affect how they produce the common L2, they will also interpret the other person’s speech
through their L1 phonological system. Jenkins (2000) reports an English interaction between
a Swiss-German and a Japanese speaker, in which the German interpreted the Japanese
production of the words “grey house” as “clay house.” The Japanese speaker’s challenge in
producing a difference between /l/ and /ɹ/ also affected how the German speaker interpreted
the initial velar stop /ɡ/ as /k/. This chapter focuses on Quadrants C and D.

Table 11.1 Possible intelligibility interactions with native and non-native speakers

                          Native Listener (NS)              Non-native Listener (NNS)
Native Speaker (NS)       (A)                               (B)
                          Native speakers talking to        Native and non-native speakers
                          each other (e.g., Canadian        in interaction (e.g., Scottish
                          and Canadian speakers;            speaker and Korean listener)
                          Southern USA and Boston
                          English speakers)
Non-native Speaker (NNS)  (C)                               (D)
                          Native and non-native             Non-native speakers talking to
                          speaker in interaction (e.g.,     each other using English
                          Arabic speaker and                (e.g., Japanese and German
                          American English listener)        speakers)

Intelligibility has a long history in speech and signal processing research, especially in regard
to understanding how electronically delivered speech (e.g., telephone speech) is understood,
the effects of noise and other non-linguistic factors on understanding, and the extent to
which distortions of the speech signal affect understanding. As research came to incorporate
linguistic considerations, intelligibility was defined “as a property of speech communication
involving meaning” (Lehiste & Peterson, 1959, p. 280), resulting in studies of intelligibility in
deaf and hearing-impaired people, linguistic factors such as word predictability, and the
effects of age on listeners’ ability to understand. More recently, there have been increasing
numbers of studies on the intelligibility of L2 speech, the central topic here.
Within L2 pronunciation, one of the earliest advocates of intelligibility was Abercrombie
(1949) who said that L2 learners needed “comfortably intelligible” pronunciation, “which
can be understood with little or no conscious effort on the part of the listener” (p. 120).
Abercrombie’s view has much in common with modern notions of comprehensibility (i.e.,
requiring little conscious effort from a listener); he was ahead of his time in taking an ex-
plicitly listener-based approach to L2 pronunciation. Intelligibility as the goal of pro-
nunciation learning made a more prominent appearance in the 1970s and 1980s in regard to
how speakers of World Englishes understood each other (Smith & Rafiqzad, 1979), how new
English varieties were understood (Bansal, 1976), how international teaching assistants in the
United States were understood (Gallego, 1990; Hinofotis & Bailey, 1980), and in relation to
pronunciation teaching approaches (Pennington & Richards, 1986). Attempts were made to
identify what was actually meant by intelligibility (Smith & Nelson, 1985), what features
were important for intelligibility of L2 English (Jenner, 1989), how listeners with different L1
backgrounds evaluated accented speech (Fayer & Krasinski, 1987), and the effect of famil-
iarity with message content, accent, and speaker on improved intelligibility (Gass & Varonis,
1984). Intelligibility was not yet widely accepted as the primary goal for L2 pronunciation
teaching, but rather competed with nativelikeness as an alternative goal for L2 pronunciation
(Leather, 1983). It was not until Munro and Derwing (1995a) that intelligibility was understood
in the way it is used in most L2 pronunciation research today, that is, whether listeners
understand a speaker’s message at the level of the word, message or intention.
Another construct related to intelligibility is comprehensibility (Levis, 2020), but the two
constructs do not refer to the same thing. Comprehensibility is measured not by success or
failure but by the amount of work listeners do to understand the speech. There are respects in
which intelligibility and comprehensibility are similar. Like intelligibility, comprehensibility
is affected by pronunciation and lexico-grammatical features, and intelligibility and com-
prehensibility can both change even when accentedness does not (Derwing et al., 1998;
Zhang & Yuan, 2020). However, comprehensibility also can be affected by fluency and the
ways in which spoken content is organized (Isaacs & Trofimovich, 2012).

Research has identified linguistic and non-linguistic factors that are relevant to the in-
telligibility of L2 speech, especially regarding the differential effects of vowels and con-
sonants, the intelligibility of words versus sentences, the influence of the L1 on the
intelligibility of L2 speech, the potential influence of clear speech (i.e., speech styles with
careful articulation as opposed to casual speech styles), the influence of noise, and the impact
of listener attitudes on L2 intelligibility.
3.1 Linguistic Variables in Intelligibility Judgements

Consonants have traditionally been considered more important to intelligibility (Fogerty &
Kewley-Port, 2009). This is especially true of initial consonants in stressed syllables (Bent,
Bradlow & Smith, 2007; Zielinski, 2008). However, recent studies have suggested that vowels
may contribute more to sentence intelligibility because vowels carry important perceptual
cues, such as intensity, length, and vowel formant transitions not found in consonants (Cole
et al., 1996; Kewley-Port et al., 2007). Support for a vowel advantage comes from studies
in which listeners repeated back entire sentences: intelligibility scores were higher when only
the vowels were audible (consonants masked or silenced) than when only the consonants were
audible (vowels masked or silenced). For example, Fogerty and Humes (2012) reported a 2:1 advantage
for vowel-only information compared to consonant-only information for sentence intellig-
ibility but not monosyllabic word intelligibility. Chen et al. (2013) reported a stronger ad-
vantage for vowels for Mandarin sentences, a finding that may reflect the importance of
vowels in a tone language.
Other research has shown that the intelligibility of isolated words may be different from
the intelligibility of sentences. This might be due to fundamental differences between isolated
words and sentences in production and in perception. Fogerty and Humes (2012) noted that
compared to words spoken in isolation, words in a sentence context may be produced with
vowel reduction (Lindblom, 1963), exhibit increased coarticulation (Kent & Minifie, 1977),
and have distinct prosodic characteristics that affect word pronunciation (Shattuck-Hufnagel
& Turk, 1996). Sentences may also produce long-distance effects in perceptual tasks
(Ladefoged & Broadbent, 1957), in which the perception of one acoustic context may in-
fluence the perception of the acoustic cues of segments that follow. Words in sentences are
also more predictable than words in isolation (Miller et al., 1951) because listeners can use
linguistic context to predict following words, a strategy unavailable for words heard in
isolation.
3.2 Non-Linguistic Factors That Facilitate and Hinder Intelligibility

Intelligibility may also be facilitated by shared L1 background (Edwards et al., 2019).
Previous studies have found that listeners find speakers who share their L1 more intelligible
than speakers with a different L1. Bent and Bradlow (2003), for instance, reported that
Korean L1 listeners found the English of a Korean L1 speaker more intelligible than the
English of a Chinese L1 speaker. Edwards et al. (2019) similarly reported that listeners
with Cantonese as their L1 found the English of other Cantonese speakers more
intelligible than the English produced by native speakers and speakers from
Singapore.
Bent and Bradlow (2003) explained these facilitative effects by arguing for an interlanguage
speech intelligibility benefit (ISIB) in which listeners and speakers sharing the same L1 also
share the same phonological and phonetic knowledge of their L1 (e.g., phonemes, phono-
tactics, and suprasegmentals). This shared knowledge helps the listeners interpret the speech of
the talkers. The authors’ explanation is based on claims made by models of speech perception
(e.g., Best, 1995; Flege, 1992) which state that non-native sound production and perception are
linked to L1 sound structure. The evidence for these facilitative effects, however, remains weak.
In the experiment in Bent and Bradlow (2003) itself, the listeners who experienced ISIB were
only the Korean L1 speakers. Listeners who were English speakers with Chinese as their L1 did
not find the English of a Chinese L1 speaker more intelligible compared to a Korean L1
speaker. In Edwards et al. (2019), Cantonese listeners found Cantonese-accented English in-
telligible, but they also found the English produced by Mandarin speakers intelligible. The
authors suggested that this may have been because the listeners were familiar with Mandarin
speakers’ English. A similar claim was made by Tauroza and Luk (1997) who did not find
evidence for shared background effects. They found that Hong Kong based ESL learners
showed higher comprehension scores when listening to Received Pronunciation (RP) com-
pared to Hong Kong English. The authors suggested that this was because the learners were
familiar with RP as it had been the language of instruction.
Using a transcription task, Munro et al. (2006) did not find clear facilitative effects for
listeners with an L1 background that matched that of the talkers. Japanese listeners found
the English produced by Japanese speakers more intelligible than the English produced by
native speakers of Cantonese, Mandarin, and English. However, the Cantonese listeners did
not find speech produced by native speakers of Cantonese more intelligible than speech from
the other language groups. The authors suggested that if shared background effects were
present, they were likely weak and outweighed by other factors, including ones that are not
known or well understood. One factor they considered was proficiency, given that the
Japanese listeners had reported using English more than the Cantonese listeners; however,
the authors concluded that this factor too was likely outweighed by another factor, namely
the properties of the speech itself.
Another area of interest to intelligibility is research into clear and conversational speech.
According to Smiljanić and Bradlow (2007), clear speech is produced when speakers think
that they are speaking to someone with impaired hearing or to someone who is a non-native
speaker. Plain (or conversational) speech is produced when speaking to someone who is
familiar with the speech style of the speaker. In general, the researchers found “clear speech
is a beneficial articulatory modification regardless of the listener and talker L1 backgrounds”
(p. 664). Related research shows that Lombard speech, or speech produced in noise
(Lombard, 1911), may similarly facilitate intelligibility compared to speech produced in quiet
(Dreher & O’Neill, 1957; Summers et al., 1988; Pittman & Wiley, 2001). Lombard speech
unconsciously occurs when humans modify their vocalizations in various ways to facilitate
communication in noisy environments. Modifications include increased amplitude in pro-
portion to noise, a slower speech rate, a rise in fundamental frequency, and more energy at
higher frequencies (Bosker & Cooke, 2020; Cooke et al., 2014; Hotchkin & Parks, 2013; Luo
et al., 2015).
Previous studies have suggested that, compared to native speech, the intelligibility of
non-native speech may be greatly hindered by noise, especially the speech produced by
low-intelligibility speakers (Munro, 1998; Strori et al., 2020). However, the types of noise
or other interference used in previous L2 intelligibility studies have been limited. Speech
intelligibility in natural settings is often hindered by interfering factors such as background
noise, reverberation (i.e., the prolongation of a sound in an enclosed environment; George
et al., 2010), competing voices (Healy et al., 2017), and room characteristics (Astolfi et al.,
2012). Although most current L2 investigations have examined speech produced in quiet,
research on speech produced in noise and/or reverberation might be beneficial because
speech in everyday life is rarely produced in ideal settings. Perhaps future L2 research on
intelligibility of speech produced in naturalistic settings such as classrooms, large halls, and
outdoors would increase current understanding of L2 intelligibility.
A final non-linguistic factor that can affect intelligibility is the attitudes that listeners have
about L2 accentedness. Beliefs about foreign accents can make listeners believe that they do
not understand the speaker (Lippi-Green, 2012), even though strongly accented speech can
be fully intelligible (Munro & Derwing, 1995a). Attitudes about accents may also be seen in
research about whether listeners find particular speakers acceptable for particular jobs, on
the assumption that the wrong type of accented speech can harm business. Thus, questions of
acceptability (Pilott, 2016) may be less about intelligibility and more about attitudes toward
accented speech. A similar reaction to accented speech is reflected in the terms “annoyance”
or “irritation” (Fayer & Krasinski, 1987), with similar uncertainty about the relationship to
intelligibility. Even the expectation of accented speech may be enough to affect listeners’
views of speech being less intelligible (Rubin, 1992), although such expectations may be
mediated by the (mis)match of the speaker’s appearance and beliefs about how the speaker
should sound (McGowan, 2015).
Nonetheless, reactions to accented speech can result in discrimination based on the belief
that strong accents are a cause of unintelligibility. Munro (2003) describes cases investigating
discrimination on the basis of accent in Canada. These types of cases do not often reach a
courtroom, partly because of weak legal protections for speakers with foreign accents. As
Lippi-Green (2012), quoted in Wolfram (2013), puts it: “Accent discrimination…is so commonly
accepted, so widely perceived as appropriate, that it must be seen as the last back door
to discrimination”. This alone calls for a higher profile for intelligibility and a lower one for
accentedness in how societies think of spoken language.
4 Measuring Intelligibility
Intelligibility is traditionally defined by the absence of its opposite, that is, an utterance is
intelligible if it is not unintelligible. As a result, most measures of intelligibility try to identify
places in which listeners do not accurately understand. We will first describe approaches to
measuring intelligibility, which are summarized in Table 11.2.
We will then consider how loss of intelligibility can make it difficult for listeners to navigate
speech.
A dominant approach to measuring intelligibility in signal processing is verbal
repetition (e.g., Fogerty & Humes, 2012), in which participants repeat aloud as accurately as
possible each sentence they hear. They do not receive any feedback on their repetition and
are encouraged to guess if unsure (Jørgensen et al., 2013). They may be allowed to listen to
each sentence a limited number of times (Chen et al., 2013). The scoring of repeated sentences
can be more or less strict, but the score is typically the proportion of words correctly
recognized, as judged by trained raters. In a stricter approach, all words including their
exact morphemes are assessed (Fogerty & Humes, 2012), whereas in a less strict approach,
only specific keywords are considered (Zekveld et al., 2013).
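The difference between strict and keyword scoring can be made concrete with a small sketch. The Python below is illustrative only and is not taken from any of the cited studies: the function name `proportion_correct`, the example sentence, and the keyword set are assumptions, word order is ignored, and exact string matching stands in for the judgement of trained raters.

```python
from collections import Counter


def proportion_correct(target, response, keywords=None):
    """Return the share of scored target words found in the response.

    Strict scoring (keywords=None) scores every target word in its exact
    form. Keyword scoring scores only the target words listed in `keywords`.
    Word order is ignored; that is a simplifying assumption of this sketch.
    """
    target_words = target.lower().split()
    if keywords is not None:
        wanted = {k.lower() for k in keywords}
        target_words = [w for w in target_words if w in wanted]
    if not target_words:
        raise ValueError("no target words to score")
    available = Counter(response.lower().split())
    hits = 0
    for word in target_words:
        if available[word] > 0:  # exact form required, so "boat" != "boats"
            available[word] -= 1
            hits += 1
    return hits / len(target_words)


# Hypothetical item: the listener misses the plural morpheme on "boats".
target = "the boats sailed across the bay"
response = "the boat sailed across the bay"

strict = proportion_correct(target, response)
keyword = proportion_correct(target, response, keywords={"sailed", "bay"})
```

In this invented item the strict score is five of six words, while the keyword score is perfect because the missed word is not among the keywords, which shows how the choice of scheme can change the reported intelligibility of the same response.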
Studies of L2 intelligibility primarily have measured word intelligibility, but some studies
have also examined the intelligibility of segments and of messages. The primary task used to
measure L2 word intelligibility has been transcription rather than verbal repetition. Scoring
can involve words in isolation (Field, 2005), all words (Munro & Derwing, 1995a), keywords
(Bent et al., 2007), or cloze transcription in which some words are provided and the targeted
words must be supplied (Smith, 1992). There is no research that we know of that compares
these different types of measures under similar conditions.
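Cloze transcription differs from the other variants in that only the blanked words are scored. A minimal sketch of that idea follows; the function names `score_cloze` and `render_cloze`, the blank marker, and the example item are hypothetical, and actual studies may treat spelling variants or near misses differently.

```python
def score_cloze(target_words, blank_positions, answers):
    """Score one cloze transcription item.

    target_words: the full sentence as a list of words.
    blank_positions: indices of the words left blank for the listener.
    answers: the listener's entries, one per blank, in the same order.
    Returns the proportion of blanks filled with the exact target word.
    """
    if len(blank_positions) != len(answers):
        raise ValueError("one answer is needed for each blank")
    if not blank_positions:
        raise ValueError("a cloze item needs at least one blank")
    correct = sum(
        target_words[pos].lower() == ans.strip().lower()
        for pos, ans in zip(blank_positions, answers)
    )
    return correct / len(blank_positions)


def render_cloze(target_words, blank_positions, marker="____"):
    """Build the answer-sheet text shown to the listener."""
    blanks = set(blank_positions)
    return " ".join(
        marker if i in blanks else w for i, w in enumerate(target_words)
    )


# Hypothetical item in which "flight" is heard as "fright".
words = "she booked a flight to Lagos".split()
sheet = render_cloze(words, [3, 5])
score = score_cloze(words, [3, 5], ["fright", "Lagos"])
```

Because the surrounding words are supplied on the answer sheet, a listener can use them to predict the blanks, which is one reason scores from this format may not be directly comparable with full transcription.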
Table 11.2 Approaches to measuring intelligibility

Intelligibility       Example Studies       Description                 Comments
Measure

Repetition            Fogerty and Humes     Listeners repeat the        Not typically used in L2
                      (2012); Zekveld       sentences that they hear.   research, but common in
                      et al. (2013)         Scoring involves            L1 intelligibility
                                            calculating the number      research.
                                            of correctly recognized
                                            words within a sentence.

Transcription with    Field (2005); Munro   Exact word transcription    Assumes all words are
all words counted     and Derwing           of speech in isolation or   equally valuable in
                      (1995a)               in sentences. All words     understanding speech.
                                            are counted equally.

Transcription with    Bent et al. (2007)    Exact word transcription    Takes into account the
content words                               of speech. Content          differential
counted                                     words are counted,          contributions of content
                                            function words are not.     and function words to
                                                                        understanding.

Cloze Transcription   Smith (1992)          Cloze transcription in      Makes transcription of
                                            which many words are        longer sections of speech
                                            provided, while targeted    easier and may facilitate
                                            words are left blank.       prediction.

Sentence              Munro and Derwing     Listeners evaluate the      Used to isolate
verification          (1995b);              truth value of sentences.   intelligibility from the
                      Munro (1998)          Less intelligible           effect of discourse
                                            sentences will be           context.
                                            wrongly evaluated more
                                            often.

Intelligibility in    Jenkins (2000)        Researchers identify        Examples of
interaction                                 unintelligibility from      unintelligibility used to
                                            interaction based on        draw conclusions about
                                            difficulties shown by       pronunciation priorities,
                                            listeners.                  repair strategies, etc.

Causes of             Zielinski (2008)      Difficulties in             Connects phonetically
intelligibility                             understanding identified    annotated speech to
                                            from extended               difficulties in
                                            discourse.                  understanding, either
                                                                        through transcriptions
                                                                        (Zielinski) or in
                                                                        think-alouds (Im &
                                                                        Levis, 2015).

Listening             Hahn (2004)           Degree of message           Actual recall of content in
comprehension                               understanding as            extended discourse
                                            measured by recall of       without referring to
                                            main ideas and details.     specific words.

Word identification   Thomson (2015)        Measurement of segment      Intelligibility in this type of
                                            intelligibility by          study measures
                                            identifying/                segmental accuracy
                                            misidentifying              rather than word
                                            individual segments in      accuracy.
                                            known words.

Scalar measures       Gooch et al. (2016)   Measurement using           Rarely used because scales
                                            Likert-type scales.         do not measure
                                                                        intelligibility as it is
                                                                        understood by the field.

Since intelligibility in normal speech may be higher than expected because of the semantic
redundancy of speech, another type of intelligibility measure is sentence verification. Munro
and Derwing (1995b) used decontextualized sentences that were either true or false. They
asked listeners to verify whether each sentence was true or false and connected that judge-
ment to the subsequent transcriptions of the sentences.
A third type of intelligibility measure uses the analysis of interaction (Jenkins, 2000;
Kennedy, 2012), in which misunderstandings and repairs in the midst of interactions are
identified and used as a proxy for loss of intelligibility. These misunderstandings may then be
used to identify possible causes of unintelligibility.
A fourth measure is similar to the third, but directly identifies the phonetic and phonological
causes of unintelligibility. Zielinski (2008) used three extended (2 hour) interviews of Chinese,
Korean, and Vietnamese speakers to identify sentences that listeners had trouble transcribing.
Those sentences were then used for a formal intelligibility task in which listeners transcribed the
sentences. The same sentences were phonetically annotated for segments and stress patterns.
The phonetic annotations were used to describe linguistic patterns affecting intelligibility.
Intelligibility at the level of discourse meaning is typically measured by comprehension of a
message. Hahn (2004) used a matched-guise study with three versions of a short lecture (about
5 minutes). Each lecture was delivered by the same Korean–English bilingual, whose oral
English proficiency was high, ensuring consistency in delivery across versions. One had correct
prominence placements, one included incorrect prominence placements, and one was spoken
with no identifiable prominence placement (as with Korean prosody). Methodologically, Hahn
estimated intelligibility by the number of main ideas and details that listeners recalled.
Intelligibility can also be measured at the level of the segment. In this kind of study,
listeners are asked to identify the segment that they heard, often within a minimal pair (e.g.,
Lee & Lyster, 2017). Thomson and Derwing (2015) asked listeners to identify whether a
vowel was a correct or incorrect example of a category. If correct, the listener was asked to
identify whether the production was a good or poor example of the vowel. This approach
only indirectly says anything about whether mispronounced segments would impair in-
telligibility in a sentence or discourse context.
Scalar judgements have also been used to evaluate intelligibility (e.g., Gooch et al., 2016).
This is an unsatisfactory approach to measuring intelligibility, as Thomson (2017) says:
“using Likert-type scales to assess intelligibility…appeals to listeners’ subjective experiences
of listening, rather than requiring subjects to demonstrate that they can match what they
have heard with what was uttered” (p. 19).
All of these methods require comparisons to evaluate how a certain level of intelligibility is
to be understood. For example, what level of unintelligibility is too much for a listener?
Labov and Hanau (2011), in a description of training for doctors whose L2 oral dictations
were consistently difficult for medical transcribers, showed that only 2% of words were
impossible to transcribe correctly at pre-test, a number that decreased by 20% to 1.6% at
post-test. The seemingly low percentage suggests that even 1 of 50 words is enough to cause
listeners to misunderstand, especially in contexts where accuracy is crucial.
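The arithmetic behind these figures is easy to misread. Reading them as a fall from 2% to 1.6% of words, the 20% is a relative reduction, while the absolute change is 0.4 percentage points. The brief sketch below makes the distinction explicit; the variable names are illustrative, and the two rates are the only values taken from the text.

```python
pre_rate = 0.020   # 2% of words could not be transcribed at pre-test
post_rate = 0.016  # 1.6% at post-test

absolute_drop = pre_rate - post_rate      # change in proportion units
relative_drop = absolute_drop / pre_rate  # change as a share of the pre-test rate

# The same rates expressed as "one word in N".
words_per_error_pre = 1 / pre_rate
words_per_error_post = 1 / post_rate
```

On this reading, the pre-test rate corresponds to one untranscribable word in every 50, and the post-test rate to roughly one in every 62 or 63.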

Intelligibility should be the central principle underlying the teaching of L2 spoken language,
including pronunciation, speaking, and listening (Levis, 2005, 2018). Successful speaking
must be intelligible to a listener, and successful listening can only occur when the listener can
understand the words, messages, and intentions of the speaker. Particular accents may be
important for speakers and group identity, but trying to master particular accents should not
be a priority in most L2 teaching, where accents are unlikely to become native-like. Munro’s
(2011) statement remains the dominant view of the field today: “Intelligibility is the single
most important aspect of all communication. If there is no intelligibility, communication has
failed. In language pedagogy this…is an empirically sound concept that will provide a basis
for a wide range of pedagogically-oriented research in the future” (p. 13).
A teaching and learning approach focused on intelligibility means that teaching must
prioritize features that strongly affect intelligibility and that teaching for intelligibility must
include pronunciation and other language features. Jenkins (2000), in her data from
NNS–NNS communication, found that pronunciation (mostly segmental) was implicated in
loss of intelligibility about two-thirds of the time, while the other one-third of the time,
grammatical and lexical issues influenced loss of intelligibility. Because Jenkins established
priorities based on a relatively small number of errors and limited L2–L2 speaker pairs, her
conclusions must be seen as preliminary.
Teaching for intelligibility can be done reactively or proactively. Reactive correction, or ad
hoc correction, is a common type of corrective feedback and is important because it is part of
communicative language practice. Proactive approaches are based on teachers’ knowledge of
the types of features that are likely to cause loss of intelligibility and on addressing those
features through planned instruction. Not all pronunciation errors are likely to cause loss of
intelligibility even if they are noticeable, and some errors in certain types of
suprasegmentals may not even be noticed as pronunciation errors. For example, word stress
errors such as saying noun–verb pairs (e.g., PERmit vs. perMIT) with the wrong stress patterns
do not seem to affect intelligibility or comprehensibility for L1 English listeners (Cutler, 1986).
For these reasons, it is valuable for teachers to understand that some problems are always more
important, while other errors may simply be bothersome to the teacher’s ear.
In the classroom, loss of intelligibility should be addressed explicitly, especially at the
word level. Word-level intelligibility problems occur most often when segmentals or word
stress choices are inaccurate. For example, Benrabah (1997) reported British English listener
transcriptions of English words spoken by Indian, Nigerian, and Algerian speakers. All the
words reported had unexpected stress patterns and their transcriptions were of completely
different words: UPset was transcribed as absent, norMALly as no money, wriTTEN as re-
tain, and seCONdary as country. Hahn and Watts (2011) show that vowel and consonant
mispronunciations can also result in word-level unintelligibility. They report on NS listeners
hearing math as meth, bed as bad, duck as dog, passion as patience, fork as pork, pen as Ben,
and ranking as linking. A number of these mishearings involve more than one segment (e.g.,
duck heard as dog involved both a vowel and a final consonant error). Hahn and Watts say that
about two-thirds of their examples involved more than one error. Other errors are high-functional load
errors (Brown, 1988), that is, they involve substitutions that have many minimal pairs in
English, such as /æ/-/ɛ/ (math-meth, bad-bed) and /ɹ/-/l/ (ranking-linking).
Because L2 learners are more likely to be aware of accentedness, it is important to raise
awareness of intelligibility as being more important. To do this, teachers could show that
pronunciation can sometimes be so unexpected that a listener cannot understand what is being
said. This can be done with stories or humour (almost all L2 learners have examples of this
happening) or by using others’ stories, such as those in Hahn and Watts (2011). Teachers can
then give opportunities to model and practice clarification or negotiation strategies for restoring
intelligibility (see Jenkins, 2000). There must also be opportunities for pronunciation instruction
for words that are particularly troublesome or for phonemic or word stress patterns.
In addressing segmental errors, the functional load principle offers helpful guidance for
teaching. Functional load measures the amount of work two phonemes do in distinguishing
otherwise identical words in a language. Brown (1988) lists phoneme contrasts for English
from 1 (low) to 10 (high). Low-functional load sound pairs have few minimal pairs (e.g.,
soot–suit, thought–fought), while high-functional load pairs have many (e.g., pat–fat,
plead–bleed, feet–fit). Because high-functional load errors cause listeners to work harder in
understanding and to judge accentedness more seriously (Munro et al., 2006), teachers
should prioritize high-functional load errors. Finally, all difficulties with understanding
should be openly discussed so that everyone can share strategies for resolving problems when
interacting in different contexts, for example, by paraphrasing or writing a word for the
listener. By doing so, teachers and learners may find it easier to talk about why commu-
nication was not successful.
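One rough way to see what functional load captures is to count the minimal pairs that a contrast distinguishes in a word list. The sketch below does this over a tiny hand-made lexicon with simplified phonemic transcriptions. The lexicon, the function name `minimal_pairs`, and the use of raw pair counts are all assumptions made for illustration, and the counts are not meant to reproduce Brown's (1988) 1 to 10 ranking.

```python
from itertools import combinations

# Toy lexicon: word -> tuple of phonemes (simplified, hypothetical transcriptions).
LEXICON = {
    "pat": ("p", "æ", "t"),
    "fat": ("f", "æ", "t"),
    "pin": ("p", "ɪ", "n"),
    "fin": ("f", "ɪ", "n"),
    "feet": ("f", "iː", "t"),
    "fit": ("f", "ɪ", "t"),
    "pit": ("p", "ɪ", "t"),
    "thought": ("θ", "ɔː", "t"),
    "fought": ("f", "ɔː", "t"),
}


def minimal_pairs(lexicon, phoneme_a, phoneme_b):
    """Return word pairs that differ only by phoneme_a versus phoneme_b."""
    contrast = {phoneme_a, phoneme_b}
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if len(p1) != len(p2):
            continue
        # Collect the positions at which the two transcriptions differ.
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1 and set(diffs[0]) == contrast:
            pairs.append((w1, w2))
    return pairs


p_f = minimal_pairs(LEXICON, "p", "f")
th_f = minimal_pairs(LEXICON, "θ", "f")
```

In this deliberately small word list the /p/-/f/ contrast separates more word pairs than the /θ/-/f/ contrast, which mirrors the direction of the examples above; a realistic estimate would require a full pronouncing dictionary and, ideally, word frequency information.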
In addressing suprasegmentals, it is important to remember that errors in suprasegmentals
may not be heard as pronunciation errors but as a general lack of understanding the larger
message or even as a social failing (Levis, 2018; Smith & Nelson, 1985). Unlike word-level
intelligibility, the role of pronunciation in these higher levels of intelligibility may be harder
to recognize because of the greater role of suprasegmentals in communicating messages and
intentions. With the exception of word stress, suprasegmentals are unlikely to cause unin-
telligibility at the word level (Levis, 2018). However, a high degree of word-level intelligibility
may be insufficient for promoting message level intelligibility. Hahn (2004) found that the
misplacement or absence of one pronunciation feature, primary phrase stress (i.e., promi-
nence or nuclear stress), resulted in undergraduate students understanding less of the content
of a short lecture.
Examples of suprasegmentals being heard as a social failing can be seen in Gumperz
(1982) and Low (2006). Gumperz reported a conflict that involved a cross-cultural pro-
nunciation difference in intonation. Servers of Indian and Pakistani origin were accused of
being rude to British origin employees at a British Airways cafeteria. In turn, the Indian and
Pakistani workers interpreted the British origin workers as also being rude. After analysis,
Gumperz pinpointed the problem as coming from differences in intonation. The Indian and
Pakistani workers served “Gravy” with a falling intonation, a pattern interpreted as some-
thing like, “Here, take the gravy”. Although Gumperz reported this was a polite way to offer
something in the servers’ L1s, it was impolite in English, where offers are conventionally
made with a rising intonation. In another example, Low (2006) reported that her Singapore
English pronunciation, which usually ends sentences with a prominent syllable even when
other varieties of English would not, was sometimes heard by L1 speakers of English as
signalling that she was angry or upset, even if that was the furthest thing from her intention.
6 Future Directions
The field of L2 pronunciation has progressed greatly since 1995. With an extensive pedagogical
pedigree, growing research findings on the differential effects of errors on intelligibility, studies
of the effectiveness of various techniques for face-to-face and autonomous learning, and the
extension of research on L2 pronunciation into a variety of languages, learning contexts, and
listening environments, the field is ready for further development. Murphy and Baker (2015)
write about four overlapping historical waves of L2 pronunciation: that pronunciation should
be taught, that knowledge about pronunciation makes a difference in teaching, that knowledge
about teachers can improve teaching, and that pedagogy can best improve when built on a solid
research base. Murphy and Baker also foresee a fifth wave in which pedagogy changes as a
result of attention to how teaching and learning are conceptualized as social practices, research
into teaching and learning materials, and innovations in teacher training. In such a fifth wave,
intelligibility will remain central to pedagogy and research.
Further Reading
These sources address intelligibility from different perspectives: ELF (Jenkins), research (Derwing &
Munro), and pedagogy (Levis). Munro and Derwing (2011) is an accessible starting point.
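Derwing, T. M., & Munro, M. J. (2015). Pronunciation fundamentals: Evidence-based perspectives for L2
teaching and research. Amsterdam: John Benjamins Publishing Company.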
Jenkins, J. (2000). The phonology of English as an international language. Oxford: Oxford University
Press.
Levis, J. (2018). Intelligibility, oral communication, and the teaching of pronunciation. Cambridge:
Cambridge University Press.
Munro, M. J. & Derwing, T. M. (2011). The foundations of accent and intelligibility in pronunciation
research. Language Teaching, 44(3), 316–327.
References
Abercrombie, D. (1949). Teaching pronunciation. ELT Journal, 3(5), 113–122.
Astolfi, A., Bottalico, P., & Barbato, G. (2012). Subjective and objective speech intelligibility
investigations in primary school classrooms. The Journal of the Acoustical Society of America, 131(1),
247–257.
Bansal, R. (1976). The intelligibility of Indian English. Central Institute of English and Foreign
Languages.
Bent, T., & Bradlow, A. (2003). The interlanguage speech intelligibility benefit. The Journal of the
Acoustical Society of America, 114(3), 1600–1610.
Bent, T., Bradlow, A., & Smith, B. (2007). Segmental errors in different word positions and their effects
on intelligibility of non-native speech. In O-S Bohn & M. Munro (Eds.), Language experience in
second language speech learning: In honour of James Emil Flege (pp. 331–347). Amsterdam: John
Benjamins.
Benrabah, M. (1997). Word-stress–a source of unintelligibility in English. World Englishes, 35(3),
157–165.
Best, C. (1995). A direct realist view of cross-language speech perception. In W. Strange (Ed.), Speech
perception and linguistic experience: Issues in cross-language research (pp. 171–206). York Press.
Bosker, H. & Cooke, M. (2020). Enhanced amplitude modulations contribute to the Lombard
intelligibility benefit: Evidence from the Nijmegen Corpus of Lombard Speech. The Journal of the
Acoustical Society of America. doi: 10.1121/10.0000646.
Brown, A. (1988). Functional load and the teaching of pronunciation. TESOL Quarterly, 22(4),
593–606.
Chen, F., Wong, L., & Wong, E. (2013). Assessing the perceptual contributions of vowels and
consonants to Mandarin sentence intelligibility. The Journal of the Acoustical Society of America,
134(2), EL178–EL184.
Cole, R., Yan, Y., Mak, B., Fanty, M., & Bailey, T. (1996, May). The contribution of consonants
versus vowels to word recognition in fluent speech. In 1996 IEEE international conference on
acoustics, speech, and signal processing conference proceedings (Vol. 2, pp. 853–856). IEEE.
Cooke, M., King, S., Garnier, M., & Aubanel, V. (2014). The listening talker: A review of human and
algorithmic context-induced modifications of speech. Computer Speech & Language, 28(2), 543–571.
Cutler, A. (1986). Forbear is a homophone: Lexical prosody does not constrain lexical access. Language
and Speech, 29(3), 201–220.
Derwing, T. & Munro, M. (2015). Pronunciation fundamentals: Evidence-based perspectives for L2
teaching and research. Amsterdam: John Benjamins.
Derwing, T., Munro, M., & Wiebe, G. (1998). Evidence in favor of a broad framework for
pronunciation instruction. Language Learning, 48(3), 393–410.
Dreher, J., & O’Neill, J. (1957). Effects of ambient noise on speaker intelligibility for words and
phrases. The Journal of the Acoustical Society of America, 29(12), 1320–1323.
Edwards, J., Zampini, M., & Cunningham, C. (2019). Listener proficiency and shared background
effects on the accentedness, comprehensibility and intelligibility of four varieties of English. Journal
of Monolingual and Bilingual Speech, 1(2), 333–356.
Fayer, J. & Krasinski, E. (1987). Native and nonnative judgments of intelligibility and irritation.
Field, J. (2005). Intelligibility and the listener: The role of lexical stress. TESOL Quarterly, 39(3),
399–423.
Flege, J. (1992). Speech learning in a second language. Phonological development: Models, Research,
Implications, 565, 604.
Fogerty, D., & Humes, L. (2012). The role of vowel and consonant fundamental frequency, envelope,
and temporal fine structure cues to the intelligibility of words and sentences. The Journal of the
Fogerty, D., & Kewley-Port, D. (2009). Perceptual contributions of the consonant-vowel boundary to
sentence intelligibility. The Journal of the Acoustical Society of America, 126(2), 847–857.
Gallego, J. (1990). The intelligibility of three nonnative English-speaking teaching assistants: An
analysis of student-reported communication breakdowns. Issues in Applied Linguistics, 1(2),
219–237.
Gass, S., & Varonis, E. (1984). The effect of familiarity on the comprehensibility of nonnative speech.
George, E., Goverts, S., Festen, J., & Houtgast, T. (2010). Measuring the effects of reverberation and
noise on sentence intelligibility for hearing-impaired listeners. Journal of Speech, Language, and
Hearing Research, 53, 1429–1439.
Gooch, R., Saito, K., & Lyster, R. (2016). Effects of recasts and prompts on L2 pronunciation
development: Teaching English /ɹ/ to Korean adult EFL learners. System, 60, 117–127.
Gumperz, J. (1982). Discourse strategies. Cambridge: Cambridge University Press.
Hahn, L. (2004). Primary stress and intelligibility: Research to motivate the teaching of
suprasegmentals. TESOL Quarterly, 38(2), 201–223.
Hahn, L., & Watts, P. (2011). Intelligibility tales. In Proceedings of the 2nd pronunciation in second
language learning and teaching conference (pp. 17–29). Iowa State University, Ames, IA.
Healy, E., Delfarah, M., Vasko, J., Carter, B., & Wang, D. (2017). An algorithm to increase
intelligibility for hearing-impaired listeners in the presence of a competing talker. The Journal of the
Hinofotis, F., & Bailey, K. (1980). American undergraduates’ reactions to the communication skills of
foreign teaching assistants. In J. Fisher, M. Clarke, & J. Schachter (Eds.), On TESOL (Vol. 80,
pp. 120–133). Washington, DC: Teachers of English to Speakers of Other Languages.
Hotchkin, C., & Parks, S. (2013). The Lombard effect and other noise‐induced vocal modifications:
Insight from mammalian communication systems. Biological Reviews, 88(4), 809–824.
Im, J., & Levis, J. (2015). Judgments of non-standard segmental sounds and international teaching
assistants’ spoken proficiency levels. In G. Gorsuch (Ed.), Talking matters: Research on talk and
communication of international teaching assistants (pp. 113–142). Stillwater, OK: New Forums Press.
Isaacs, T., & Trofimovich, P. (2012). Deconstructing comprehensibility: Identifying the linguistic
influences on listeners’ L2 comprehensibility ratings. Studies in Second Language Acquisition, 34(3),
475–505.
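Jenkins, J. (2000). The phonology of English as an international language. Oxford: Oxford University
Press.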
Jenner, B. (1989). Teaching pronunciation: The common core. Speak Out, 4, 2–4.
Jørgensen, S., Ewert, S., & Dau, T. (2013). A multi-resolution envelope-power based model for speech
intelligibility. The Journal of the Acoustical Society of America, 134(1), 436–446.
Kennedy, S. (2012). When non-native speakers misunderstand each other: Identifying important
aspects of pronunciation. Contact Magazine, 38(2), 49–62.
Kent, R., & Minifie, F. (1977). Coarticulation in recent speech production models. Journal of Phonetics,
5(2), 115–133.
Kewley-Port, D., Burkle, T., & Lee, J. (2007). Contribution of consonant versus vowel information to
sentence intelligibility for young normal-hearing and elderly hearing-impaired listeners. The Journal
of the Acoustical Society of America, 122(4), 2365–2375.
Labov, J., & Hanau, C. (2011). Pronunciation as life and death: Improving the communication skills of
non-native English-speaking pathologists. In B. Hoekje & S. Tipton (Eds.), English language and the
medical profession: Instructing and assessing the communication skills of international physicians
(pp. 261–285). Bingley, UK: Emerald Group Publishing.
Ladefoged, P., & Disner, S. (2012). Vowels and consonants (3rd edn). Malden, MA: Wiley Blackwell.
Ladefoged, P., & Broadbent, D. (1957). Information conveyed by vowels. The Journal of the Acoustical
Society of America, 29(1), 98–104.
Leather, J. (1983). Second-language pronunciation learning and teaching. Language Teaching, 16(3),
198–219.
Lehiste, I., & Peterson, G. (1959). Linguistic considerations in the study of speech intelligibility. The
Journal of the Acoustical Society of America, 31(3), 280–286.
Levis, J. (2005). Changing contexts and shifting paradigms in pronunciation teaching. TESOL
Quarterly, 39(3), 369–377.
Levis, J. (2020). Revisiting the nativeness and intelligibility principles. Journal of Second Language
Lindblom, B. (1963). Spectrographic study of vowel reduction. The Journal of the Acoustical Society of
America, 35(11), 1773–1781.
Lippi-Green, R. (2012). English with an accent. Language, ideology, and discrimination in the United
States (2nd edn). Milton Park, UK: Routledge.
Lombard, E. (1911). Le signe de l’elevation de la voix. Ann. Mal. de L’Oreille et du Larynx, 37, 101–119.
Low, E. L. (2006). A cross‐varietal comparison of deaccenting and given information: Implications for
international intelligibility and pronunciation teaching. TESOL Quarterly, 40(4), 739–761.
Luo, J., Goerlitz, H., Brumm, H., & Wiegrebe, L. (2015). Linking the sender to the receiver: Vocal
adjustments by bats to maintain signal detection in noise. Scientific Reports, 5, 1–11.
Lee, A. H., & Lyster, R. (2017). Can corrective feedback on second language speech perception errors
affect production accuracy? Applied Psycholinguistics, 38(2), 371.
McGowan, K. (2015). Social expectation improves speech perception in noise. Language and Speech,
58(4), 502–521.
Miller, G., Heise, G., & Lichten, W. (1951). The intelligibility of speech as a function of the context of
the test materials. Journal of Experimental Psychology, 41(5), 329–335.
Munro, M. (1998). The effects of noise on the intelligibility of foreign-accented speech. Studies in
Munro, M. (2003). A primer on accent discrimination in the Canadian context. TESL Canada Journal,
20(2), 38–51.
Munro, M. (2011). Intelligibility: Buzzword or buzzworthy? In J. Levis & K. LeVelle (Eds.),
Proceedings of the 2nd pronunciation in second language learning and teaching conference, Sept. 2010.
(pp. 7–16). Ames, IA: Iowa State University.
Munro, M. & Derwing, T. (1995a). Foreign accent, comprehensibility, and intelligibility in the speech
of second language learners. Language Learning, 45(1), 73–97.
Munro, M. & Derwing, T. (1995b). Processing time, accent, and comprehensibility in the perception of
native and foreign-accented speech. Language and Speech, 38(3), 289–306.
Munro, M. & Derwing, T. (2011). The foundations of accent and intelligibility in pronunciation
research. Language Teaching, 44(3), 316–327.
Munro, M., Derwing, T., & Morton, S. (2006). The mutual intelligibility of L2 speech. Studies in Second
Murphy, J., & Baker, A. (2015). History of ESL pronunciation teaching. In M. Reed & J. Levis (Eds.),
The handbook of English pronunciation (pp. 36–65). New York: Wiley Blackwell.
Pennington, M. & Richards, J. (1986). Pronunciation revisited. TESOL Quarterly, 20(2), 207–225.
Pilott, M. (2016). Migrant pronunciation: What do employers find acceptable? Doctoral dissertation,
Victoria University, Wellington, New Zealand.
Pittman, A. & Wiley, T. (2001). Recognition of speech produced in noise. Journal of Speech, Language,
and Hearing Research, 44, 487–496.
Rubin, D. L. (1992). Nonlanguage factors affecting undergraduates’ judgments of nonnative
English-speaking teaching assistants. Research in Higher Education, 33(4), 511–531.
Shattuck-Hufnagel, S., & Turk, A. (1996). A prosody tutorial for investigators of auditory sentence
processing. Journal of Psycholinguistic Research, 25(2), 193–247.
Smiljanić, R., & Bradlow, A. (2007). Clear speech intelligibility: Listener and talker effects. In
Proceedings of the XVIth international congress of phonetic sciences (pp. 661–664). Saarbrucken,
Germany.
Smith, L. (1992). Spread of English and issues of intelligibility. In B. Kachru (Ed.), The other tongue:
English across cultures (pp. 75–90). Champaign: University of Illinois Press.
Smith, L., & Nelson, C. (1985). International intelligibility of English: Directions and resources. World
Englishes, 4(3), 333–342.
Smith, L., & Rafiqzad, K. (1979). English for cross-cultural communication: The question of
intelligibility. TESOL Quarterly, 13(3), 371–380.
Strori, D., Bradlow, A., & Souza, P. (2020). Recognition of foreign-accented speech in noise: The
interplay between talker intelligibility and linguistic structure. The Journal of the Acoustical Society
of America, 147(6), 3765–3782.
Summers, W., Pisoni, D., Bernacki, R., Pedlow, R., & Stokes, M. (1988). Effects of noise on speech
production: Acoustic and perceptual analyses. The Journal of the Acoustical Society of America,
84(3), 917–928.
Tauroza, S., & Luk, J. (1997). Accent and second language listening comprehension. RELC Journal,
28(1), 54–71.
Thomson, R. (2017). Measurement of accentedness, intelligibility, and comprehensibility. In O. Kang &
A. Ginther (Eds.), Assessment in second language pronunciation (pp. 11–29). Milton Park, UK:
Routledge.
Thomson, R., & Derwing, T. (2015). The effectiveness of L2 pronunciation instruction: A narrative
review. Applied Linguistics, 36(3), 326–344.
Wolfram, W. (2013). Sound effects. Teaching Tolerance, 52(43), 29–31.
Zekveld, A., Rudner, M., Johnsrude, I., & Rönnberg, J. (2013). The effects of working memory
capacity and semantic cues on the intelligibility of speech in noise. The Journal of the Acoustical
Society of America, 134(3), 2225–2234.
Zielinski, B. (2008). The listener: No longer the silent partner in reduced intelligibility. System,
36(1), 69–84.
Zhang, R., & Yuan, Z. (2020). Examining the effects of explicit pronunciation instruction on the
development of L2 pronunciation. Studies in Second Language Acquisition, 42(4), 905–918.
12
SPEECH COMPREHENSIBILITY
Pavel Trofimovich, Talia Isaacs, Sara Kennedy, and Aki Tsunemoto
1 Introduction
In 21st century second language (L2) pronunciation research, pedagogy, and assessment, two
contrasting views continue to dominate the landscape (Levis, 2018). Propagated by the
unregulated accent reduction industry (Thomson, 2013), the first view upholds nativelike
attainment as the goal of L2 pronunciation learning and assessment. The second advances an
agenda to help L2 speakers be more easily understandable, not necessarily nativelike, to
listeners. This agenda includes researching the presumed factors that could foster or impede
listeners’ understanding of L2 speech and developing pedagogical interventions to enhance
the quality of L2 speakers’ performance and improve listeners’ ability to understand them.
Most L2 pronunciation experts deem the traditional emphasis on nativelikeness to be an
unsuitable goal in many contexts of language use (Derwing & Munro, 2015). However, the
alternative view – based on measuring how understandable L2 speakers are to listeners – has
been mired in definitional confusion and inconsistency (Isaacs, 2008).
2 Definitions and Historical Perspectives

An influential and widespread framework for listeners’ understanding of L2 speech is Munro
and Derwing’s (1995a) distinction between two interrelated constructs: intelligibility and
comprehensibility. Intelligibility, denoting the degree to which listeners actually understand
L2 speech (see Levis & Silpachai, this volume), has been operationalized in different ways,
including the accuracy of listeners’ orthographic transcriptions of L2 speech samples
(Derwing & Munro, 1997; Munro et al., 2006) and listeners’ responses to comprehension or
true/false questions as a check for understanding speech content (Hahn, 2004; Kennedy &
Trofimovich, 2008). In contrast, comprehensibility denotes listeners’ perceptions of the ease
or difficulty with which they understand L2 speech. Comprehensibility is typically measured
on a continuum, which is generally a rating scale. This construct has also been characterized
by how irritating listeners perceive L2 speech to be (Ludwig, 1982; Piazza, 1980), though this
dimension is not emphasized in many studies, likely because irritation, while related, does
not fully overlap with comprehensibility.
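As a rough illustration of the transcription-based operationalization of intelligibility described above, the following Python sketch scores a listener's orthographic transcription against the intended utterance as the proportion of target words recovered in order. The function names, the tokenization rule, and the alignment method (the standard library's difflib.SequenceMatcher, which finds a reasonable but not guaranteed optimal alignment) are assumptions made for illustration; they are not the scoring procedure of any study cited here.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def transcription_accuracy(target, transcription):
    """Proportion of target words that also appear, in order, in the transcription.

    Hypothetical scoring rule: exact word matches only, aligned by SequenceMatcher.
    """
    target_words = tokenize(target)
    heard_words = tokenize(transcription)
    if not target_words:
        raise ValueError("target utterance has no words")
    matcher = SequenceMatcher(a=target_words, b=heard_words, autojunk=False)
    # Each matching block is a run of identical words; the final block has size 0.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(target_words)


# Hypothetical example: one word of six is misheard.
score = transcription_accuracy(
    "The boy walked to the store.",
    "the boy worked to the store",
)
print(f"{score:.2f}")
```

A score of this kind reflects what the listener actually recovered, whereas a comprehensibility rating reflects how effortful the listener found the task, so the two need not coincide for the same utterance.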
DOI: 10.4324/9781003022497-16
Although Munro and Derwing’s (1995a) distinction between intelligibility and comprehensibility
is clear, it has not always been used in the same way, particularly in studies
predating Munro and Derwing’s influential work. Some researchers use the term intelligibility
when measuring understanding through Likert-type scales (e.g., Fayer & Krasinski,
1987) when, in fact, what is being measured is comprehensibility. Other scholars use the term
comprehensibility to refer to measures of what Munro and Derwing would call intelligibility,
such as examining the accuracy of listeners’ transcriptions of L2 utterances (e.g., Gass &
Varonis, 1984, but see Varonis & Gass, 1982, for a measure compatible with Munro and
Derwing’s notion of comprehensibility). Yet others have used rating scales to measure intelligibility,
which additionally conflate nativelike pronunciation and intelligibility. For
example, Anderson-Hsieh et al.’s (1992) scalar ratings of pronunciation ranged from
“heavily accented speech that was unintelligible” to “near nativelike speech” (p. 538). This
leads to two problems: listeners’ perceptions are treated as their actual understanding, and
speakers’ comprehensibility is confounded with how nativelike they sound.
Definitional challenges can also be seen in the context of L2 oral proficiency scales in
human-mediated standardized language tests often used for high-stakes decision making (e.g.,
TOEFL iBT, IELTS, TOEIC, and Aptis), where the use of comprehensibility has become
pervasive. Many rating scale descriptors make reference to intelligibility or intelligible speech,
but the use of listener- or examiner-mediated scales implies that in fact, Munro and Derwing’s
notion of comprehensibility is being measured. To illustrate, Band 8 of the public version of
the IELTS speaking descriptors refers to L2 speech as “easy to understand throughout; L1
[first language] accent has minimal effect on intelligibility” (British Council, 2020). In another
example from language assessment, Isaacs et al. (2018) developed a dedicated L2 comprehensibility
scale with extended descriptors intended for English for Academic Purposes teachers
to use as a pedagogical tool (i.e., for low-stakes formative assessment rather than high-stakes
consequential decision making). In their detailed analytic scale, comprehensibility is
discussed in terms of underlying pronunciation, fluency, lexis, and grammar features at different
ability levels, with the degree of listener effort described across the subscales. This scale
illustrates a data-driven approach to modelling comprehensibility, where comprehensibility is a
multidimensional construct defined through multiple extended descriptors rather than a single
numerical scale commonly used in research settings.
3 Critical Issues
One overarching issue about the construct of comprehensibility is its role among other global
measures of speaking (e.g., intelligibility) and specific metrics of speakers’ performance (e.g.,
pronunciation accuracy and fluency). The key question is whether and to what degree scalar
ratings of comprehensibility can be useful for language teachers and learners, researchers,
and language speakers more generally.
Comprehensibility and Understanding

First and foremost, comprehensibility judgements can be useful to researchers and practitioners
as a measure of L2 comprehension. Although intelligibility is often regarded as the
gold standard for evaluating listeners’ actual understanding of L2 speakers (Derwing &
Munro, 2015), scalar ratings of comprehensibility are a useful measure in many contexts. To
begin with, comprehensibility ratings are practical and intuitive, and can be elicited and
scored easily using speech samples featuring the same content. In contrast, intelligibility
measures require tasks with unique speech content for each instance when intelligibility is
measured (to avoid greater intelligibility for content repeated to listeners) and comparatively
more time for listeners to complete the tasks. Comprehensibility ratings are also reliable
across listeners, meaning that listeners generally agree with each other regardless of how
comprehensibility is measured (Munro, 2018; Nagle, 2019). By comparison, intelligibility scores
often vary across task type, influenced by the nature of the speech sample and the type of
listening task used to measure intelligibility (Kang et al., 2018; Kennedy, 2009). Most importantly,
although intelligibility and comprehensibility are partially independent, comprehensibility
ratings provide a reasonable estimate of listeners’ actual understanding of speech
(Sheppard et al., 2017). For instance, Munro and Derwing (1995a) reported substantial
overlaps between these dimensions, with correlation coefficients approaching .90, although
the magnitude of this link might vary for different speakers and listeners (Matsuura et al.,
1999). An intuitive, easy-to-use scalar measure, comprehensibility might thus be a useful
general metric of understanding in several contexts of language teaching, learning, and use.
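To make the notion of overlap between the two constructs concrete, the sketch below computes a Pearson correlation between per-speaker mean comprehensibility ratings and per-speaker intelligibility scores. All values are invented for illustration, and the scale direction (higher ratings meaning easier to understand) and the variable names are assumptions, not details taken from the studies cited above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, one value per speaker.
# Mean listener rating; higher = easier to understand (assumed scale direction).
mean_comprehensibility = [2.1, 3.4, 4.0, 5.2, 6.3]
# Proportion of words that listeners transcribed correctly.
intelligibility = [0.62, 0.71, 0.80, 0.93, 0.97]

r = pearson_r(mean_comprehensibility, intelligibility)
print(f"r = {r:.2f}")
```

A high coefficient on data of this kind would indicate only that the two measures rank speakers similarly; it would not show that they are interchangeable for any individual speaker or listener.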
Comprehensibility and Linguistic Content of Speech

Besides being a practical measure of understanding, comprehensibility ratings are also
shaped by the linguistic content of speech, making them a useful metric of how linguistic
dimensions of utterances impact the listener. By identifying linguistic correlates of comprehensibility,
language researchers and teachers might target (through instruction or assessment)
the dimensions of L2 speech that hinder listener processing. In their initial work,
Munro and Derwing (1995a) found associations between listeners’ comprehensibility ratings
and several linguistic measures, including phonemic substitutions, intonation accuracy, and
morphosyntactic errors. More recent work has revealed two constellations of linguistic dimensions
relevant to comprehensibility: pronunciation (individual segments, prosody, fluency)
and lexicogrammar (variety and richness of vocabulary, accuracy and complexity of
grammar). The exact combinations of linguistic dimensions feeding into judgements of
comprehensibility can depend on the speaker’s linguistic background and the speaking task
(Crowther et al., 2018), but the general finding has been consistent. Many measures at the
level of segments, prosody, fluency, grammar, and discourse have been linked to listeners’
ratings of comprehensibility in multiple languages (Isaacs & Trofimovich, 2012; Saito et al.,
2017a). To take an example from L2 English, Kang et al. (2010) reported that 50% of the
variance in comprehensibility can be explained by fluency and prosody (e.g., intonation,
pausing, and speech rate). For L2 French, Bergeron and Trofimovich (2017) found links
between comprehensibility ratings and measures of pronunciation, lexicogrammar, fluency,
and discourse richness. In L2 German, O’Brien (2014) showed that comprehensibility was
tied to fluency and accuracy measures for vocabulary, morphology, and pronunciation. The
assumption underlying this work is that teachers might specifically target these linguistic
dimensions through instruction to help L2 speakers become more comprehensible to
interlocutors.
Comprehensibility and Processing Fluency

Comprehensibility might also be useful to researchers and practitioners as a measure of
processing fluency, defined as a listener’s subjective experience of the ease or difficulty with
which information is processed (Reber & Greifeneder, 2017; Schwarz, 2018). A key aspect of
processing fluency cutting across various social and psychological domains is that people
appraise and respond to various situations based on the perceived difficulty they report while
processing a stimulus (e.g., text, image, and sound), which may or may not reflect their actual
experience with that stimulus. For instance, statements attributed to people whose names are
harder to pronounce are considered less trustworthy (Newman et al., 2014), regardless of the
content of the statements. Similarly, readers exposed to text printed in a difficult-to-read font
react more negatively than those reading the same text in an easy-to-read font, despite having
similar text comprehension for both conditions (Sanchez & Jaeger, 2015; Song & Schwarz,
2008). Munro and Derwing (1995a) observed that comprehensibility might be rated differently
for speech that is perfectly intelligible, which aligns with findings from processing
fluency studies that listeners’ various reactions to speech and speakers might be linked not to
actual understanding (intelligibility) but to comprehensibility.
Growing evidence suggests that comprehensibility captures socially important decisions
for listeners. For instance, in social–psychological research, speakers who listeners perceived
as hard to understand were downgraded in listeners’ affective and attitudinal evaluations.
Such speakers were ascribed negative emotions of annoyance and irritation and deemed less
intelligent and successful (Dragojevic et al., 2017). Similarly, in an e-learning study, when
students evaluated an instructional video narrated by the instructor who was rated hard to
understand, students downgraded their evaluations of the instructor, expressed negative
attitudes towards coursework, and evaluated video content as more difficult, even though
students’ actual understanding of the video was not compromised (Sanchez & Khan, 2016).
In fact, a comprehensibility scale akin to that used in L2 speech research has now been
validated as part of a five-item processing fluency measure that appears to explain various
human judgements (truthfulness, preference, and perceived risk) all formerly attributed to
processing fluency (Graf et al., 2018). Thus, an intuitive appeal of comprehensibility as a
measure of processing fluency is that it might help to explain aspects of human behaviour,
including, for instance, whether interlocutors continue interacting with speakers they find
difficult to understand or whether university students drop out of courses led by instructors
whose speech they consider hard to process.
Comprehensibility and the (Imagined) Interlocutor

The usefulness of comprehensibility as a measure of L2 speakers’ success at conveying their
message or as a measure of the linguistic dimensions that matter most for comprehensibility
hinges on the key issue of who judges comprehensibility. If the response to this question is
native speakers, then the follow-up question should be, “Why are native speakers the only
suitable judges of comprehensibility?”
Whereas early L2 comprehensibility research relied nearly exclusively on native-speaking
listeners (e.g., Munro & Derwing, 1995a), more recently, researchers have employed more
varied listener groups, including L2 speakers (Crowther et al., 2016; Derwing & Munro,
2013; O’Brien 2014), bilinguals and multilinguals (Saito & Shintani, 2016), and members of
specific academic and professional groups regardless of native speaker status (Derwing &
Munro, 2009; Kennedy & Trofimovich, 2013; Sheppard et al., 2017). Prior research has
revealed multiple listener characteristics that impact comprehensibility judgements, including
listeners’ experience with the language being evaluated (Munro et al., 2006), listeners’
teaching experience (Saito et al., 2017b) and linguistic training (Isaacs & Thomson, 2013),
listener status as bilinguals or multilinguals (Saito & Shintani, 2016), and their L2 learning
experience (Saito et al., 2019). Nevertheless, despite minor differences, various listener
groups appear to be similar in the quality and consistency of the comprehensibility ratings
they assign to the same L2 speakers (Crowther et al., 2016; Derwing & Munro, 2013; Saito
et al., 2017b), even though listeners might rely on different criteria to arrive at similar ratings
(Foote & Trofimovich, 2018; Isaacs & Thomson, 2013). In light of this evidence, although
researchers may find it practical to elicit only “native” listeners’ comprehensibility ratings,
this would seem short-sighted. It may be irresponsible to presume that L2 speakers will
exclusively speak with native speakers, especially for languages of major global or regional
significance (e.g., English, Mandarin, and Spanish).

Although early research focusing on comprehensibility predominantly examined speaker and
listener characteristics relevant to comprehensibility (e.g., Munro & Derwing, 1995a; Gass &
Varonis, 1984; Tyler & Bro, 1992), more recent work has adopted a multidimensional perspective,
exploring comprehensibility as a socially driven, dynamic construct interdependent
on both speaker and listener. Researchers have also intensified applied research examining
the effect of instructional interventions on comprehensibility.
Comprehensibility – Pedagogically Relevant

One rapidly expanding and valuable research strand focuses on how to help L2 speakers to
speak comprehensibly. Comprehensible speech takes time and effort to develop. L2 speakers
are not generally judged more comprehensible when performing the same speaking task a
second time; nor do they sound any more comprehensible to listeners after being told to
make their speech as easy for the interlocutor to understand as possible (Strachan et al.,
2019). Similarly, comprehensibility might not greatly improve for L2 speakers by virtue of
attending a university in the target language, in the absence of pronunciation instruction
(Kennedy et al., 2015). Even when instruction is available, improving comprehensibility is
not guaranteed during one academic term (Kennedy & Trofimovich, 2010). In a landmark 7-year
study, Derwing and Munro (2013) found that Slavic but not Mandarin adult immigrants
to Canada showed an improvement in their L2 English comprehensibility ratings
across three timepoints (2 months, 2 years, and 7 years of residence). The success of the
Slavic group was attributed to their greater integration into English-speaking communities
and greater exposure to and use of English. Thus, even after many years of language use
opportunities, some L2 speakers may not adopt speech patterns that would be judged easier
to understand, suggesting that comprehensibility is an important target for instruction.
Fortunately, comprehensibility can be improved through instruction. For instance,
Derwing et al. (1998) showed that supplementing regular language instruction with focused
teaching on speech fluency (e.g., speaking rate) and prosody (e.g., intonation, stress, and
rhythm) led to significant improvement in L2 learners’ comprehensibility after 12 weeks.
Extensive interaction through video-conferencing (Saito & Akiyama, 2017) and specific
teaching techniques such as shadowing (Foote & McDonough, 2017) have also been shown
to positively impact comprehensibility. In a rare longitudinal study of comprehensibility
development during one year of university instruction in L2 Spanish, Nagle (2018) showed
that learners’ comprehensibility improved through communicative teaching alone, without a
dedicated pronunciation focus, and that the extent of improvement was unrelated to learners’
motivational profiles. In contrast, Saito et al. (2018) reported links between Japanese
high school students’ comprehensibility gains and measures capturing students’ motivation
and positive feelings towards their learning. It appears that focused instruction is useful even
for learners who have become entrenched in their speech patterns. Derwing et al. (2014)
taught a 17-hour pronunciation course to L2-speaking employees at a Canadian factory.
These L2 speakers, who on average had resided in Canada for 19 years, showed significant
progress in their comprehensibility ratings after instruction. An interim conclusion is that
comprehensible L2 speech can be taught and learned, which has prompted researchers to
promote comprehensibility as a measure of L2 learning compatible with principles of
communicative language teaching (Saito & Plonsky, 2019).
Comprehensibility – Dynamic
Although pedagogically oriented investigations of comprehensibility have tracked L2
speakers’ comprehensibility over weeks, months, and sometimes years, comprehensibility
has rarely been framed as a dynamic, variable process which can change on a finer-grained
timescale, as a matter of minutes or seconds. Nagle et al. (2019) explored whether
comprehensibility can be construed as dynamic, examining how raters explain their assessments
as they evolve over time. Twenty-four Spanish-speaking listeners evaluated 3-minute
personal narratives by L2 Spanish learners using a computer interface which allowed
listeners to increase or decrease the comprehensibility rating as the speech unfolded. Listeners
showed varying rating profiles, such that some listeners increased or decreased
comprehensibility ratings infrequently over a speech sample whereas others increased or decreased
ratings at a high frequency, with varying magnitude of change. In a follow-up study,
Trofimovich et al. (2020) reasoned that interactive speech, where interlocutors react to one
another in real time, might be even more amenable to dynamic comprehensibility
judgements, compared to the one-way listening task that Nagle et al. (2019) examined. For this
study, L2 English university students from different language backgrounds engaged in
collaborative tasks over 17 minutes, rating their partner’s comprehensibility at 2- to 3-minute
intervals. Speakers’ comprehensibility ratings for the most part followed a U-shaped
function, with comprehensibility (initially perceived to be high) dipping to lower
levels but then reaching high levels by the end of the interaction. Speakers’ ratings also
became more similar to each other soon after the interaction started and remained alike
throughout. Taken together, these findings not only suggest that speakers’ comprehensibility
can change over time as interaction unfolds but also imply that comprehensibility
issues might become less important for both interlocutors in a conversation after a certain
minimum threshold of comprehensibility has been reached. Whether such a threshold involves
a degree of interpersonal comfort or is simply a matter of investing sufficient time
into communication is an area for future research.
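The rating profiles described above (how often a listener changes a rating, and by how much) can be illustrated with a minimal sketch. It assumes a hypothetical data format in which each listener's ratings for one speech sample are stored as a list of values sampled at regular intervals; this is an illustration only, not the output format or analysis used in the studies cited.

```python
def summarize_rating_trace(trace):
    """Summarize one listener's moment-by-moment ratings of one speech sample.

    `trace` is a hypothetical list of ratings sampled at regular intervals.
    Returns how often the rating changed, in which direction, and the
    average size of a change.
    """
    # Keep only the steps where the rating actually moved.
    changes = [later - earlier
               for earlier, later in zip(trace, trace[1:])
               if later != earlier]
    n_changes = len(changes)
    mean_magnitude = (sum(abs(c) for c in changes) / n_changes) if changes else 0.0
    return {
        "n_changes": n_changes,
        "increases": sum(1 for c in changes if c > 0),
        "decreases": sum(1 for c in changes if c < 0),
        "mean_magnitude": mean_magnitude,
    }


# Two hypothetical listeners rating the same sample: one rarely adjusts
# the rating, the other adjusts it often.
stable_listener = [5, 5, 5, 5, 6, 6, 6, 6]
active_listener = [5, 6, 4, 5, 7, 6, 8, 7]
print(summarize_rating_trace(stable_listener))
print(summarize_rating_trace(active_listener))
```

Comparing such summaries across listeners is one simple way to describe infrequent versus frequent adjusters; other summaries (e.g., time spent at each rating level) would be equally defensible.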
Comprehensibility – Socially Flexible

That listeners’ understanding of speech is subject to social influences is relatively well known.
For instance, Rubin and Smith (1990) showed that perceived speaker ethnicity influenced
listeners’ comprehension of a lecture. Listeners who were led to believe that the speaker was
Asian demonstrated less understanding than those who were told that the same speaker was
Caucasian. Recent work has revealed several similar social influences on comprehensibility.
For instance, Sheppard et al. (2017) showed that university faculty who reported negative
attitudes towards the English proficiency of international students gave lower comprehensibility
ratings to students’ L2 speech than did faculty with positive attitudes, despite both
groups transcribing the speech with equal accuracy. In another study, Taylor Reid et al.
(2019) presented listeners with either a positive or a negative biasing anecdote about the language
abilities of L2 speakers, before asking listeners to evaluate comprehensibility.
Compared to the assessment of baseline listeners (no anecdote), positively oriented listeners
rated speakers more favourably while negatively oriented listeners (especially older
individuals with likely more entrenched social attitudes) evaluated the same speakers more
negatively. Taylor Reid et al. (2020b) showed that teachers of L2 German, particularly native
speakers of German, were similarly influenced by negative comments, downgrading L2
German learners’ ratings relative to the assessment of baseline listeners. However, it appears
that positive and negative social biases can be mitigated through relatively simple interventions.
In a follow-up study, Taylor Reid et al. (2021) demonstrated that perspective
taking (Hansen et al., 2014; Weyant, 2007), which they defined as asking listeners to practice
the target speech task before engaging in speech rating, was effective at eliminating negative
bias effects in rating comprehensibility (see also Taylor Reid et al., 2020a). Put differently,
encouraging listeners to “walk in the shoes” of people whose speech was evaluated may have
immunized listeners against various attitudinal comments that they were exposed to before
the rating. These findings underscore comprehensibility as a socially and contextually flexible
construct influenced by positive and negative biases and highlight the value of interventions
that can minimize these biases in research and practice settings.

In research contexts, comprehensibility is most often measured using 9-point numerical
rating scales (Munro & Derwing, 1995a), although other scale lengths have been attested,
including 5-point scales (Isaacs & Thomson, 2013), 7-point scales (Ludwig & Mora, 2017),
8-point scales (Polyanskaya & Ordin, 2019), and 10-point scales (Caspers, 2010), often
without thorough validation (Thomson, 2018). In Munro and Derwing’s (1995a) initial
study, the lowest scalar value is described as “extremely easy to understand,” whereas the
highest scalar value designates speech that is “impossible to understand” (p. 79). Other
researchers reverse the scalar extremes to approximate what they consider a more intuitive
scale direction, where a higher value represents better (more comprehensible) performance
(Isaacs & Trofimovich, 2012), or replace verbal endpoint descriptions with images of smiling
and frowning faces illustrating scale directionality (Taylor Reid et al., 2019).
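Because scale length and scale direction both vary across studies, ratings often need to be placed on a common footing before they can be compared. The following is a minimal sketch of two simple linear conversions, reverse-scoring and rescaling to a 0–1 range; the function names and the choice of a 0–1 target range are assumptions made for illustration, not a procedure taken from the studies cited.

```python
def reverse_score(rating, scale_min, scale_max):
    """Flip a rating so that the scale runs in the opposite direction."""
    if not scale_min <= rating <= scale_max:
        raise ValueError("rating lies outside the scale")
    return scale_min + scale_max - rating


def to_unit_interval(rating, scale_min, scale_max, higher_is_better=True):
    """Map a rating onto 0-1, where 1 always means most comprehensible."""
    if not scale_min <= rating <= scale_max:
        raise ValueError("rating lies outside the scale")
    proportion = (rating - scale_min) / (scale_max - scale_min)
    return proportion if higher_is_better else 1.0 - proportion


# On a 9-point scale where 1 = "extremely easy to understand",
# a rating of 3 becomes 7 once the direction is reversed.
print(reverse_score(3, 1, 9))

# The midpoints of a 5-point and a 9-point scale land on the same value.
print(to_unit_interval(3, 1, 5), to_unit_interval(5, 1, 9))
```

A linear conversion of this kind only aligns the numbers; it does not show that listeners use scales of different lengths in equivalent ways, which is an empirical question.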
Researchers have occasionally opted for continuous scales over Likert scales. In a
paper-and-pencil format, raters indicate their judgement by marking a location on a straight line
bounded by endpoint descriptors, and researchers measure the distance (e.g., in millimetres)
between the left endpoint and the mark (Isaacs et al., 2015). In a computer-based format,
raters move a slider to record their rating, typically on a 1,000-point continuum (Saito et al.,
2017a). Another approach is to ask raters to estimate the proportion of words that they can
understand using a percentage scale (Isaacs, 2008; Kang et al., 2018). Yet another option is
to measure comprehensibility through direct magnitude estimation by comparing the target
speech sample with a reference item (Munro, 2018). As an index of processing difficulty,
comprehensibility has also been operationalized in some research in terms of the time it takes
for listeners to process speech content (Ludwig & Mora, 2017; Munro & Derwing, 1995b) or
examined through listeners’ performance in a concurrent reaction-time task, such as monitoring
tones while trying to understand and remember a lecture’s content (Hahn, 2004).
More recently, comprehensibility has been measured dynamically, using Idiodynamic
Software (MacIntyre, 2012), with listeners upgrading and downgrading ratings in real time as
they experience L2 speech (Nagle et al., 2019).
Most comprehensibility studies have targeted English, with few studies focusing on other
languages, including German (O’Brien, 2014), French (Bergeron & Trofimovich, 2017),
Spanish (Nagle, 2018), Japanese (Saito & Akiyama, 2017), and Korean (Isbell et al., 2019).
Target speakers are mostly university students, rarely school-age learners (e.g., Saito
et al., 2018), and almost never children or older individuals (but see Derwing et al., 2014). To
elicit speech for comprehensibility rating, researchers have used controlled read-aloud tasks
involving individual words (e.g., Caspers, 2010) and sentences (e.g., Kennedy & Trofimovich,
2008), as well as various tasks eliciting monologic, extemporaneous speech performances,
including timed and untimed picture descriptions (e.g., Derwing et al., 2008), tasks from
operational L2-speaking tests (e.g., Isaacs et al., 2015) or practice test items (e.g., Crowther
et al., 2018), argumentative tasks (e.g., Suzuki & Kormos, 2020), and mock job interviews
(e.g., Kennedy & Trofimovich, 2013). Only recently have researchers begun investigating
comprehensibility for both members of dyads engaged in conversation (e.g., Trofimovich
et al., 2020). Comprehensibility ratings are often collected through paper-and-pencil questionnaires
but increasingly through online interfaces (e.g., Crowther et al., 2016; Saito &
Shintani, 2016) and online crowdsourcing platforms (Nagle, 2019).
Several generalizations have emerged from past methodologically oriented research focusing
on speech ratings:
• There is little difference in the ratings obtained through the use of 5- versus 9-point
scales, although shorter scales are sometimes perceived by listeners as constraining
whereas longer scales are considered difficult for differentiating across skill levels (Isaacs
& Thomson, 2013). Compared to direct magnitude estimation of comprehensibility,
9-point scales perform just as well, suggesting that the use of scalar ratings is a reliable
approach to measuring L2 comprehensibility for research purposes (Munro, 2018).
• Evaluations of individual short sentences by the same speaker often lack consistency,
suggesting that ratings of shorter speech samples might not be representative of ratings
of longer discourse produced by the same speaker (Munro, 2018).
• Listeners sometimes assign harsher ratings when evaluating the same samples again,
because listeners might become increasingly aware of how the speakers’ output differs
from the language expected by listeners (Flege & Fletcher, 1992; Munro &
Derwing, 1994).
• Comprehensibility ratings do not appear to be influenced by whether this dimension is
evaluated separately or in combination with other global dimensions such as accentedness
and fluency (O’Brien, 2016) or by the order in which comprehensibility judgements
occur in a rating sequence (Derwing & Munro, 1997; O’Brien, 2016).
• Speech ratings obtained in online environments with built-in controls (e.g., through
crowdsourcing platforms) yield highly reliable judgements, comparable to those obtained
in research laboratories (Nagle, 2019).
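Claims about rating reliability in the generalizations above rest on some index of agreement or consistency among listeners. As a minimal sketch, the code below computes Cronbach's alpha, one commonly used consistency index, treating each rater as an "item" and each speech sample as a case. The data are hypothetical, and the cited studies may have relied on other indices (e.g., intraclass correlations).

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a samples-by-raters table of ratings.

    `ratings` is a list of rows, one row per speech sample, each row
    holding one rating per rater (all rows the same length).
    """
    n_raters = len(ratings[0])
    if n_raters < 2 or len(ratings) < 2:
        raise ValueError("need at least two raters and two speech samples")
    # Variance of each rater's ratings across samples.
    rater_columns = list(zip(*ratings))
    sum_of_rater_variances = sum(variance(col) for col in rater_columns)
    # Variance of the summed ratings per sample.
    # Note: if every sample receives the same total, this is zero and
    # the division below fails; alpha is undefined in that case.
    total_variance = variance([sum(row) for row in ratings])
    return (n_raters / (n_raters - 1)) * (1 - sum_of_rater_variances / total_variance)


# Hypothetical 9-point ratings: four speech samples, three raters.
example = [
    [1, 2, 1],
    [3, 3, 4],
    [5, 6, 5],
    [8, 8, 9],
]
print(round(cronbach_alpha(example), 3))
```

Values close to 1 indicate that raters order the speech samples similarly; they do not show that raters interpret the construct in the same way.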
As Isaacs et al. (2015) note, regardless of the method used to capture comprehensibility, in
the absence of detailed guidance, raters may interpret the target construct in different ways,
for example, assuming that it refers to listeners’ perceptions of understanding the overall
message, to understanding every single word that is uttered, or solely to understanding
meaning-laden words. Put simply, some listeners’ interpretations of comprehensibility may
differ from other listeners’ interpretations and might not exactly conform to what the researchers
believe they are measuring. This is important to establish in light of construct
validity for comprehensibility measurement, which has only been examined infrequently to
date (Isaacs & Thomson, 2013; Munro, 2018; Nagle, 2019).

With respect to implications of comprehensibility research for language teaching, it is
encouraging that many researchers and teachers see comprehensibility as a construct with
multiple elements, not just pronunciation. Elements that may be relevant or important
should be highlighted as much as possible in teaching and learning materials and in
contexts for teaching or tutoring L2 speech or familiarizing listeners with L2 speech. For
instance, the type of speaking task, the speaker’s use of vocabulary and grammar, the
listener’s level of motivation, attitude towards or experience with L2 speech and learning
generally are all elements that could be linked to comprehensibility. Teachers, speakers,
and listeners could then work on elements over which they potentially have control, such as
pronunciation, vocabulary, attitude, motivation, or experience with L2 speech. Clearly,
neither teachers and speakers nor listeners can attend simultaneously to all elements potentially
linked to comprehensibility. The importance of particular elements varies according
to the person and context, and speakers and listeners should be encouraged to
enhance their awareness of these elements through awareness-building activities, including
guided analysis of self or others’ comprehensibility or spoken language (e.g., Derwing
et al., 2002; Krech Thomas, 2004).
In light of comprehensibility’s dynamic nature, it is important for researchers to consider
how the length of a speech sample or frequency of rating might impact comprehensibility.
Because comprehensibility tends to return to high levels by the end of extended interactions, L2 speakers aiming
for comprehensibility or increased confidence in their comprehensibility should be encouraged
to seek opportunities for spoken interactions which are not brief. These might be
found in group discussions or brainstorming sessions, interviews, workshops, and community
group meetings. Another consideration is speakers’ overall affective and motivational
profiles, which have been linked to comprehensibility ratings. Many researchers and teachers
try to ensure that their research and teaching contexts are not stressful for speakers or listeners.
Rehearsal of tasks and self-reports of anxiety, willingness to communicate, and
motivation could also help to modulate or document the possible influence of these and other
variables on interlocutors’ comprehensibility. Confidence could be promoted by teachers or
L2 speakers through calming or self-affirming exercises prior to and during spoken interaction.
On a more global scale, enjoyable learning environments, where teachers promote
positivity, encourage learners’ desire to communicate, increase motivation, and reduce anxiety
(e.g., Moskowitz & Dewaele, 2021), are likely most conducive to the development of
comprehensibility and successful L2 learning generally.
Because comprehensibility appears to be influenced by social variables, language teachers
might engage their learners in initiatives which involve structured opportunities for
positive contact between various types of interlocutors. Other initiatives might target
native-speaking listeners, to help them discover some differences between their language
and another language, do structured practice in transcribing L2 speech, or take the perspective
of an L2 peer. The goal would be to guide people to consider different facets of
individuals with whom they might unknowingly share linguistic and social commonalities,
as a way of promoting harmonious communication (Hansen et al., 2014). With encouragement
and support from administration and managers, formal or informal activities
such as happy hours, sharing circles, or language classes can also be done in workplaces
with colleagues from different backgrounds (Kim et al., 2019). Of course, individuals can
themselves initiate contact with L2 speakers or try to learn or use a less familiar language,
and so reduce anxiety or develop more positive attitudes about communication with L2
speakers. For research, the importance of attitudes towards L2 speech means that eliciting
measures of raters’ attitudes will add another dimension to the analysis and interpretation
of comprehensibility ratings. Moreover, social biases which a rater is exposed to before
rating can affect the rating itself. Researchers should carefully consider who is involved in
administering the rating session (e.g., a majority or minority language speaker in a given
context), what is said prior to the ratings, and how these factors could influence raters and
the scores they assign.
7 Future Directions
Comprehensibility is an appealing construct because it connects language learners and teachers,
who might be interested in improving L2 oral production, with researchers, whose goal
is to describe what linguistic, social, experiential, and behavioural dimensions underlie
people’s experience with speech. The breadth of theoretical questions and the versatility of
applied contexts relevant to comprehensibility make for exciting future research. For example,
researchers could intensify longitudinal research examining how learners with different
cognitive, motivational, experiential, and affective profiles develop comprehensible L2
speech across different contexts, both instructed and uninstructed. In keeping with a dynamic
view of comprehensibility, researchers might continue exploring interlocutors’ comprehensibility
in paired or group interaction. This work could clarify how interlocutors’ cumulative
shared experience impacts their comprehensibility ratings in tasks that increase versus decrease
in cognitive difficulty over time and examine how non-linguistic cues (e.g., gestures,
facial expressions, and displays of emotion) and interactional variables (e.g., backchannelling
and clarification requests) contribute to interlocutors’ mutual comprehensibility judgements.
Researchers might explore links between interaction-based comprehensibility ratings and
interlocutor awareness of what makes speech comprehensible for them, using different
combinations of interlocutors who vary in language proficiency, experience, and other
variables (e.g., personality characteristics). Similarly, it might be useful to explore long-term
effects of interlocutors’ extended conversational experience on their perception of comprehensibility,
focusing on speakers’ judgements of the same and new partners in another instance
of interaction, after a delay. In light of demonstrated alignment between both
partners’ comprehensibility scores in extended interaction, it could be fruitful to examine the
validity of a joint (rather than speaker-specific) measure of comprehensibility for both
partners in a conversational dyad. Given attitudinal influences on comprehensibility, researchers
might explore situations where interlocutors’ comprehensibility judgements are
influenced by one or both interlocutors’ sociopolitical views, stereotypical judgements, or
other attitudes towards the speaker or the topic of conversation. Finally, comprehensibility
ratings, as useful measures of listener understanding and listener processing fluency, could be
examined in relation to such conversational phenomena as speakers’ engagement in dialogue,
participation patterns, or affective responses to the task or their partner, to clarify the role of
processing effort in interlocutor experience in interaction.
8 Conclusion
Beyond a doubt, comprehensibility is a valuable construct, relevant to both speakers and
listeners and useful for both researchers and educational practitioners. People’s perceptions
of each other’s comprehensibility are subject to social influences, evolve dynamically over
time as communication unfolds, are tied to many linguistic (and some non-linguistic) features
of interaction, and affect other aspects of people’s judgements, such as how annoying or
intelligent a person is. These characteristics make comprehensibility a worthy conceptual and
practical target. By understanding how interlocutors perceive each other’s comprehensibility
(in terms of what comprehensibility means for them), it might be possible to empower
speakers to become more successful L2 communicators. Nevertheless, comprehensibility is
but one of several constructs relevant to L2 speech. Gaining a clearer understanding of the
teaching and learning of L2 speech would unquestionably require the use of multiple complementary
metrics of listeners’ understanding, including measures of comprehensibility,
intelligibility, and listening comprehension.
Further Reading
Derwing, T. M., & Munro, M. J. (2015). Pronunciation fundamentals: Evidence-based perspectives for L2
teaching and research. John Benjamins.
A state-of-the-art review of literature on various constructs relevant to L2 pronunciation, including
comprehensibility.
Isaacs, T., & Trofimovich, P. (Eds.). (2016). Second language pronunciation assessment: Interdisciplinary
perspectives. Multilingual Matters.
This open access resource features multiple research contributions relevant to pronunciation assessment,
including the assessment of comprehensibility.
Isaacs, T., Trofimovich, P., & Foote, J. A. (2018). Developing a user-oriented second language comprehensibility
scale for English-medium universities. Language Testing, 35, 193–216.
This study describes the development and validation of a comprehensibility-focused scale for English
for Academic Purposes teachers.
The first study investigating comprehensibility from a dynamic perspective.
References
Anderson-Hsieh, J., Johnson, R., & Koehler, K. (1992). The relationship between native speaker
judgments of nonnative pronunciation and deviance in segmentals, prosody, and syllable structure.
Bergeron, A., & Trofimovich, P. (2017). Linguistic dimensions of accentedness and comprehensibility:
Exploring task and listener effects in second language French. Foreign Language Annals, 50,
547–566.
British Council (2020). How IELTS is assessed. Retrieved 5 April 2020 from
https://takeielts.britishcouncil.org/teach-ielts/test-information/assessment
Caspers, J. (2010). The influence of erroneous stress position and segmental errors on intelligibility,
comprehensibility and foreign accent in Dutch as a second language. Linguistics in the Netherlands,
27, 17–29.
Crowther, D., Trofimovich, P., & Isaacs, T. (2016). Linguistic dimensions of second language accent
and comprehensibility: Nonnative listeners’ perspectives. Journal of Second Language Pronunciation,
2, 160–182.
Crowther, D., Trofimovich, P., Saito, K., & Isaacs, T. (2018). Linguistic dimensions of L2 accentedness
and comprehensibility vary across speaking tasks. Studies in Second Language Acquisition, 40,
443–457.
Derwing, T. M., & Munro, M. J. (1997). Accent, intelligibility, and comprehensibility: Evidence from
four L1s. Studies in Second Language Acquisition, 19, 1–16.
Derwing, T. M., & Munro, M. J. (2009). Comprehensibility as a factor in listener interaction preferences:
Implications for the workplace. Canadian Modern Language Review, 66, 181–202.
A 7‐year study. Language Learning, 63, 163–185.
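Derwing, T. M., & Munro, M. J. (2015). Pronunciation fundamentals: Evidence-based perspectives for L2
teaching and research. John Benjamins.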
Derwing, T. M., Munro, M. J., & Thomson, R. I. (2008). A longitudinal study of ESL learners’ fluency
and comprehensibility development. Applied Linguistics, 29, 359–380.
Derwing, T. M., Munro, M. J., & Wiebe, G. (1998). Evidence in favor of a broad framework for
pronunciation instruction. Language Learning, 48, 393–410.
Derwing, T. M., Munro, M. J., Foote, J. A., Waugh, E., & Fleming, J. (2014). Opening the window on
comprehensible pronunciation after 19 years: A workplace training study. Language Learning, 64,
526–548.
Derwing, T. M., Rossiter, M. J., & Munro, M. J. (2002). Teaching native speakers to listen to
foreign-accented speech. Journal of Multilingual and Multicultural Development, 23, 245–259.
Dragojevic, M., Giles, H., Beck, A.-C., & Tatum, N. T. (2017). The fluency principle: Why foreign
accent strength negatively biases language attitudes. Communication Monographs, 84, 385–405.
Fayer, J. M., & Krasinski, E. (1987). Native and nonnative judgments of intelligibility and irritation.
Flege, J., & Fletcher, K. (1992). Talker and listener effects on the perception of degree of foreign accent.
Journal of the Acoustical Society of America, 91, 370–389.
Foote, J. A., & McDonough, K. (2017). Using shadowing with mobile technology to improve L2
pronunciation. Journal of Second Language Pronunciation, 3, 34–56.
Foote, J., & Trofimovich, P. (2018). Is it because of my language background? A study of language
background influence on comprehensibility judgments. Canadian Modern Language Review, 74,
253–278.
Gass, S., & Varonis, E. M. (1984). The effect of familiarity on the comprehensibility of nonnative
speech. Language Learning, 34, 65–89.
Graf, L. K. M., Mayer, S., & Landwehr, J. R. (2018). Measuring processing fluency: One versus five
items. Journal of Consumer Psychology, 28, 393–411.
Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of supraseg-
Hansen, K., Rakic, T., & Steffens, M. C. (2014). When actions speak louder than words: Preventing
discrimination of nonstandard speakers. Journal of Language and Social Psychology, 33, 68–77.
Isaacs, T. (2008). Towards defining a valid assessment criterion of pronunciation proficiency in non-
native English-speaking graduate students. Canadian Modern Language Review, 64, 555–580.
Isaacs, T., & Thomson, R. I. (2013). Rater experience, rating scale length, and judgments of L2 pronunciation:
Revisiting research conventions. Language Assessment Quarterly, 10, 135–159.
Isaacs, T., & Trofimovich, P. (2012). Deconstructing comprehensibility: Identifying the linguistic influences
on listeners’ L2 comprehensibility ratings. Studies in Second Language Acquisition, 34,
475–505.
Isaacs, T., Trofimovich, P., & Foote, J. A. (2018). Developing a user-oriented second language comprehensibility
scale for English-medium universities. Language Testing, 35, 193–216.
Isaacs, T., Trofimovich, P., Yu, G., & Chereau, B. M. (2015). Examining the linguistic aspects of speech
that most efficiently discriminate between upper levels of the revised IELTS Pronunciation scale.
IELTS Research Report Series, 4, 1–48.
Isbell, D. R., Park, O. S., & Lee, K. (2019). Learning Korean pronunciation: Effects of instruction,
proficiency, and L1. Journal of Second Language Pronunciation, 5, 13–48.
Kang, O., Rubin, D., & Pickering, L. (2010). Suprasegmental measures of accentedness and judgments
of language learner proficiency in oral English. The Modern Language Journal, 94, 554–566.
Kang, O., Thomson, R. I., & Moran, M. (2018). Empirical approaches to measuring the intelligibility
of different varieties of English in predicting listener comprehension. Language Learning, 68,
115–146.
Kennedy, S. (2009). L2 proficiency: Measuring the intelligibility of words and extended speech. In A.
Benati (Ed.), Issues in second language proficiency (pp. 132–144). Continuum.
Kennedy, S., & Trofimovich, P. (2008). Intelligibility, comprehensibility, and accentedness of L2
speech: The role of listener experience and semantic context. Canadian Modern Language Review, 64,
459–490.
Kennedy, S., & Trofimovich, P. (2010). Language awareness and second language pronunciation: A
classroom study. Language Awareness, 19, 171–185.
Kennedy, S., & Trofimovich, P. (2013). First- and final-semester non-native students in an English-medium
university: Judgments of their speech by university peers. Learning in Higher Education, 3,
283–303.
Kennedy, S., Foote, J. A., & Buss, L. K. (2015). Second language speakers at university: Longitudinal
development and rater behaviour. TESOL Quarterly, 49, 199–209.
Kim, R., Roberson, L., Russo, M., & Briganti, P. (2019). Language diversity, nonnative accents, and
their consequences at the workplace: Recommendations for individuals, teams, and organizations.
The Journal of Applied Behavioral Science, 55, 73–95.
Krech Thomas, H. (2004). Training strategies for improving listeners’ comprehension of foreign‐accented
speech (Doctoral dissertation). Retrieved from
https://scholar.colorado.edu/concern/graduate_thesis_or_dissertations/j098zb31g
Levis, J. (2018). Intelligibility, oral communication, and the teaching of pronunciation. Cambridge
University Press.
Ludwig, A., & Mora, J. C. (2017). Processing time and comprehensibility judgments in non-native
listeners’ perception of L2 speech. Journal of Second Language Pronunciation, 3, 167–198.
Ludwig, J. (1982). Native-speaker judgments of second-language learners’ efforts at communication: A
review. The Modern Language Journal, 66, 274–283.
MacIntyre, P. D. (2012). The idiodynamic method: A closer look at the dynamics of communication
traits. Communication Research Reports, 29, 361–367.
Matsuura, H., Chiba, R., & Fujieda, M. (1999). Intelligibility and comprehensibility of American and
Irish Englishes in Japan. World Englishes, 18, 49–62.
Moskowitz, S., & Dewaele, J.-M. (2021). Is teacher happiness contagious? A study of the link between
perceptions of language teacher happiness and student attitudes. Innovation in Language Learning
and Teaching, 15, 117–130.
Munro, M. J. (2018). Dimensions of pronunciation. In O. Kang, R. I. Thomson, & J. M. Murphy
(Eds.), The Routledge handbook of contemporary English pronunciation (pp. 413–431). Routledge.
Munro, M. J., & Derwing, T. M. (1994). Evaluations of foreign accent in extemporaneous and read
material. Language Testing, 11, 253–266.
Munro, M. J., & Derwing, T. M. (1995a). Foreign accent, comprehensibility, and intelligibility in the
ception of native and foreign-accented speech. Language and Speech, 38, 289–306.
Munro, M. J., Derwing, T. M., & Morton, S. L. (2006). The mutual intelligibility of L2 speech. Studies
Nagle, C. (2018). Motivation, comprehensibility, and accentedness in L2 Spanish: Investigating motivation
as a time‐varying predictor of pronunciation development. The Modern Language Journal,
102, 199–217.
Nagle, C. (2019). Developing and validating a methodology for crowdsourcing L2 speech ratings in
Amazon Mechanical Turk. Journal of Second Language Pronunciation, 5, 294–323.
Newman, E. J., Sanson, M., Miller, E. K., Quigley-McBride, A., Foster, J. L., Bernstein, D. M., & Garry, M.
(2014). People with easier to pronounce names promote truthiness of claims. PLoS ONE, 9(2), e88671.
O’Brien, M. G. (2014). L2 learners’ assessments of accentedness, fluency, and comprehensibility of
native and nonnative German speech. Language Learning, 64, 715–748.
O’Brien, M. G. (2016). Methodological choices in rating speech samples. Studies in Second Language
Piazza, L. G. (1980). French tolerance for grammatical errors made by Americans. The Modern
Polyanskaya, L., & Ordin, M. (2019). The effect of speech rhythm and speaking rate on assessment of
pronunciation in a second language. Applied Psycholinguistics, 40, 795–819.
Reber, R., & Greifeneder, R. (2017). Processing fluency in education: How metacognitive feelings shape
learning, belief formation, and affect. Educational Psychologist, 52, 84–103.
Rubin, D. L., & Smith, K. A. (1990). Effects of accent, ethnicity, and lecture topic on undergraduates’
perceptions of nonnative English-speaking teaching assistants. International Journal of Intercultural
Relations, 14, 337–353.
Saito, K., & Akiyama, Y. (2017). Linguistic correlates of comprehensibility in second language
Japanese speech. Journal of Second Language Pronunciation, 3, 199–217.
Saito, K., & Plonsky, L. (2019). Effects of second language pronunciation teaching revisited: A
proposed measurement framework and meta‐analysis. Language Learning, 69, 652–708.
Saito, K., & Shintani, N. (2016). Do native speakers of North American and Singapore English differentially
perceive comprehensibility in second language speech? TESOL Quarterly, 50, 421–446.
Saito, K., Dewaele, J.-M., Abe, M., & In’nami, Y. (2018). Motivation, emotion, learning experience,
and second language comprehensibility development in classroom settings: A cross‐sectional and
longitudinal study. Language Learning, 68, 709–743.
Saito, K., Tran, M., Suzukida, Y., Sun, H., Magne, V., & Ilkan, M. (2019). How do L2 listeners
perceive the comprehensibility of foreign-accented speech? Roles of L1 profiles, L2 proficiency, age,
experience, familiarity and metacognition. Studies in Second Language Acquisition, 41, 1133–1149.
Saito, K., Trofimovich, P., & Isaacs, T. (2017a). Using listener judgements to investigate linguistic
influences on L2 comprehensibility and accentedness: A validation and generalization study. Applied
Saito, K., Trofimovich, P., Isaacs, T., & Webb, S. (2017b). Re-examining phonological and lexical
correlates of second language comprehensibility: The role of rater experience. In T. Isaacs & P.
Trofimovich (Eds.), Second language pronunciation assessment: Interdisciplinary perspectives
(pp. 141–156). Multilingual Matters.
Sanchez, C. A., & Jaeger, A. J. (2015). If it’s hard to read, it changes how long you do it: Reading time
as an explanation for perceptual fluency effects on judgment. Psychonomic Bulletin and Review, 22,
206–211.
Sanchez, C. A., & Khan, S. (2016). Instructor accents in online education and their effect on learning
and attitudes. Journal of Computer Assisted Learning, 32, 494–502.
Schwarz, N. (2018). Of fluency, beauty, and truth: Inferences from metacognitive experiences. In
J. Proust & M. Fortier (Eds.), Metacognitive diversity: An interdisciplinary approach (pp. 25–46).
Oxford University Press.
Sheppard, B. E., Elliott, N. C., & Baese-Berk, M. M. (2017). Comprehensibility and intelligibility of
international student speech: Comparing perceptions of university EAP instructors and content
faculty. Journal of English for Academic Purposes, 26, 42–51.
Song, H., & Schwarz, N. (2008). If it’s hard to read, it’s hard to do: Processing fluency affects effort
prediction and motivation. Psychological Science, 19, 986–988.
Strachan, L., Kennedy, S., & Trofimovich, P. (2019). Second language speakers’ awareness of their own
comprehensibility: Examining task repetition and self-assessment. Journal of Second Language
Pronunciation, 5, 347–373.
Suzuki, S., & Kormos, J. (2020). Linguistic dimensions of comprehensibility and perceived fluency: An
investigation of complexity, accuracy, and fluency in second language argumentative speech. Studies
Taylor Reid, K., O’Brien, M. G., Trofimovich, P., & Tsunemoto, A. (2020a). Exploring the stability of
second language speech ratings through task practice in bilinguals’ two languages. Journal of
Monolingual and Bilingual Speech, 2, 315–329.
Taylor Reid, K., O’Brien, M., Trofimovich, P., & Bajt, A. (2020b). Testing the malleability of teachers’
judgments of second language speech. Journal of Second Language Pronunciation, 6, 236–264.
Taylor Reid, K., Trofimovich, P., & O’Brien, M. G. (2019). Social attitudes and speech ratings: Effects
of positive and negative bias on multiage listeners’ judgments of second language speech. Studies in
Second Language Acquisition, 41, 419–442.
Taylor Reid, K., Trofimovich, P., O’Brien, M. G., & Tsunemoto, A. (2021). Using task practice to
reduce social influences on listener evaluations of second language accent and comprehensibility.
International Journal of Listening. Advanced Online Publication. https://doi.org/10.1080/10904018.2021.1904933
Thomson, R. I. (2013). Accent reduction. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics
(pp. 8–11). Wiley-Blackwell.
Thomson, R. I. (2018). Measurement of accentedness, intelligibility, and comprehensibility. In O. Kang
& A. Ginther (Eds.), Assessment in second language pronunciation (pp. 11–29). Routledge.
Trofimovich, P., Nagle, C. L., O’Brien, M. G., Kennedy, S., Taylor Reid, K., & Strachan, L. (2020).
Second language comprehensibility as a dynamic construct. Journal of Second Language
Tyler, A., & Bro, J. (1992). Discourse structure in nonnative English discourse: The effect of ordering
and interpretive cues on perceptions of comprehensibility. Studies in Second Language Acquisition,
14, 71–86.
Varonis, E., & Gass, S. (1982). The comprehensibility of nonnative speech. Studies in Second Language
Weyant, J. M. (2007). Perspective taking as a means of reducing negative stereotyping of individuals
who speak English as a second language. Journal of Applied Social Psychology, 37, 703–716.
13
FLUENCY
Jimin Kahng
While having a conversation in a second language (L2) in which you are less proficient, have you
ever missed a turn because you could not construct your message fast enough to keep up with the
pace of the conversation? Speaking is a skill performed under time pressure. Compared to their first language (L1),
people typically have less knowledge of their second language, and are also considerably less
fluent using the L2 knowledge they do have (Segalowitz, 2010). Consequently, fluency
constitutes a crucial aspect of understanding L2 performance and proficiency (e.g., Housen
et al., 2012; Iwashita et al., 2008).
The importance of fluency in second language acquisition and education has been widely
acknowledged by researchers and practitioners; however, defining the term has been a long-
standing issue in the field, mainly due to its polysemous nature (Schmidt, 1992). For ex-
ample, in “Emily speaks three languages fluently,” “fluently” can be casually substituted with
“well” and fluency relates to someone’s overall proficiency. This meaning is what Lennon
(1990, 2000) labelled as the “broad” sense of fluency. One of the broadest conceptualizations
of fluency was provided by Fillmore (1979). In the discussion of how well people speak their
L1, Fillmore identified four dimensions of fluency based on speed and smoothness, semantic
density and coherence, appropriateness, and creativity.
On the other hand, fluency can refer to a specific aspect of proficiency, complemented by
others, such as the accuracy and complexity of language use (Housen et al., 2012). This
relates to what Lennon (1990) called the “narrow” sense of fluency, defined as the “rapid,
smooth, accurate, lucid, and efficient translation of thought or communicative intention
under the temporal constraints of online processing” (2000, p. 26). The majority of L2 flu-
ency research has focused on this narrow sense, although there have been some discussions of
taking a more holistic approach to examining fluency (e.g., Wright & Tavakoli, 2016).
Segalowitz (2010) maintains that even the narrow sense of fluency itself is a multidimensional
construct, encompassing three distinct aspects – cognitive, utterance, and perceived fluency.
Cognitive fluency refers to a speaker’s capacity to utilize the underlying cognitive processes
responsible for fluent speech production (e.g., efficiency of lexical retrieval, grammatical/
phonological encoding). Utterance fluency refers to the temporal and repair characteristics of
speech (e.g., articulation rate, number of pauses or repairs). In contrast, perceived fluency
involves listeners’ inferences about the speaker’s cognitive fluency based on their speech.
The distinction among the three aspects of fluency is a useful way to conceptualize and situate
various L2 fluency studies and will be used throughout the chapter.

DOI: 10.4324/9781003022497-17

Disfluencies such as silent and filled pauses (e.g., uh and um), repetitions, and self-corrections
are common in spontaneous speech. However, historically they have not always been central
in language research. For instance, Chomsky (1965) considered disfluencies to be random or
characteristic errors, and argued that disfluencies should be excluded from linguistic theory.
In contrast, the psychologist Frieda Goldman-Eisler was one of the first researchers
to systematically examine temporal features and disfluencies in spontaneous L1 speech (e.g.,
Goldman-Eisler, 1951, 1968). Since her pioneering work, research on disfluencies has expanded
into related fields such as psycholinguistics, speech-language pathology, discourse analysis,
sociolinguistics, second language acquisition (SLA), and language assessment, each with a
different focus. Before discussing early work in SLA, a brief overview of different perspectives on
fluency research in these neighbouring fields will help us to contextualize the topic (see De
Jong, 2018 for a comprehensive review).
Psycholinguistic studies on disfluencies have mainly investigated when and why dis-
fluencies occur in L1 speech and how listeners process them. They showed that slower speech
and disfluencies occur when speech planning is challenging (e.g., describing complex things,
Goldman-Eisler, 1968; before low-frequency words, Beattie & Butterworth, 1979).
Disfluencies were initially hypothesized to be edited out by listeners (e.g., Levelt, 1989);
however, later studies showed that disfluencies serve functions such as leading listeners to
anticipate something complex or new (e.g., Brennan & Schober, 2001).
Fluency-related research in discourse analysis has focused on the social-interactional functions
of disfluencies in conversation. Some of the main findings suggest that disfluencies are
interactional devices for regulating turn-taking; different types of disfluencies, such as silent
and filled pauses, are used to hold the floor or to give the floor to interlocutors (e.g., Maclay &
Osgood, 1959). Sociolinguists have investigated how individuals differ in speaking style,
including fluency features, based on various factors such as age, gender, socioeconomic
status, and personality. In particular, speakers who use linguistic markers of powerlessness,
including hesitations, have been perceived as less competent, less attractive, and less
trustworthy (see Hosman, 2015, for an overview).
Research on L2 fluency started in the 1970s and 1980s (e.g., Dechert & Raupach, 1987).
From the 1990s, L2 fluency research saw considerable growth in the fields of SLA and
language testing. This increased interest led to a special issue in the International Review of
Applied Linguistics in Language Teaching and a few books on L2 fluency, including the first
edited volume on L2 fluency from multiple disciplines by Riggenbach (2000), Segalowitz’s
(2010) monograph on fluency from a cognitive science perspective, and two recent additions
by Lintunen et al. (2020) and Tavakoli and Wright (2020).
Among early works, Lennon (1990) is one of the most widely cited studies on L2 fluency.
This paper introduced the broad and narrow senses of fluency and identified temporal speech
features reflecting perceived L2 fluency. Four advanced EFL learners told a story based on
pictures at the beginning and end of a 6-month residence in the United Kingdom. Their
speech samples were rated by 10 EFL teachers for fluency and were also analyzed to obtain
12 objective measures of utterance fluency, for example, words per minute, repetitions,
self-corrections, and filled pauses per T-unit [a T-unit is “one main clause with all subordinate
clauses attached to it” (Hunt, 1965, p. 20)]. Most raters agreed that the participants improved
in overall fluency, and their speech improved in terms of speech rate, filled pauses per T-unit,
and percentage of T-units followed by pause, suggesting that these measures are good
indicators of perceived fluency.
Some early fluency research addressed issues related to cognitive fluency. Towell et al.
(1996) examined development of utterance fluency in relation to cognitive fluency based on
Anderson’s (1983) ACT theory, in which fluent performance is viewed as a result of pro-
ceduralization, or the conversion of declarative knowledge (knowledge that) into procedural
knowledge (knowledge how). They aimed to investigate the process of proceduralization and
what is proceduralized. Twelve British learners of French retold a film story at the beginning
and end of a year abroad in a French-speaking country. After the residence abroad, speaking
rate became faster and mean number of syllables between pauses increased. Qualitative
analysis showed that fluent participants were better able to use formulaic language with
fewer pauses, especially after the residence abroad, which the authors interpreted as evidence
for proceduralization. [Formulaic language, or a formulaic sequence, is a sequence of words
that appears to be prefabricated (see Wray, 2002; Chapter 21)]. The aforementioned and
other early studies on fluency set the conceptual and methodological stage for later studies on
different aspects of L2 fluency research.

The overarching purpose of previous studies on L2 fluency has been to identify speech
features that function as reliable indicators of L2 fluency. Speech features have often been
operationalized as shown in Table 13.1 (based on De Jong, 2018, p. 240, Table 1; Derwing,
2019, p. 247, Table 14.1; and Kormos, 2006, p. 163, Table 8.2). To achieve this purpose,
however, different approaches have been used according to the conceptualization of fluency.
As captured in the title of Freed (2000), “Is fluency in the eyes (and ears) of the beholder?”
the majority of previous studies treated fluency as a construct to be defined by listeners.
These studies typically investigated the relationship between objective measures of L2 ut-
terance fluency and subjective ratings of perceived fluency. On the other hand, there are
studies that viewed fluency from the speaker’s perspective and examined the question by
tracking development of utterance fluency over time or by relating utterance fluency to
proficiency. Finally, a few studies explored the underlying mechanism of fluent speech by
examining the relationship between measures of utterance fluency and those of cognitive
fluency. In doing so, L1 utterance fluency or speaking style has also been viewed as a major
factor of L2 utterance fluency. In the subsequent part, details about each of the issues and
topics will be discussed in depth.

Table 13.1 Frequently used measures of utterance fluency

Measure                               Formula
Speech rate                           Number of syllables / total time
Articulation rate                     Number of syllables / (total time – silent pausing time)
Pruned syllables per second           (Number of syllables – number of filled pauses and repairs) / total time in seconds
Mean length of run (MLR)              Mean number of syllables between silent pauses
Phonation time ratio                  Speaking time / total time
Mean length of silent pauses          Pausing time / number of silent pauses
Number of silent pauses (per minute)  Number of silent pauses / total time
Number of filled pauses (per minute)  Number of filled pauses / total time
Number of repetitions (per minute)    Number of repetitions / total time
Number of repairs (per minute)        Number of repairs and restarts / total time
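
The formulas in Table 13.1 can be illustrated with a minimal Python sketch. It is not drawn from any of the cited studies: the `SpeechSample` container, its field names, and the example values are hypothetical, and mean length of run is approximated as syllables divided by (number of silent pauses + 1), which assumes the sample neither begins nor ends with a pause.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpeechSample:
    """Hypothetical container for one annotated speech sample."""
    n_syllables: int              # all syllables, including fillers and repairs
    total_time_s: float           # total response time in seconds
    silent_pauses_s: List[float]  # duration of each silent pause in seconds
    n_filled_pauses: int
    n_repetitions: int
    n_repairs: int                # repairs and restarts


def utterance_fluency(s: SpeechSample) -> Dict[str, float]:
    """Compute the Table 13.1 measures for a single sample."""
    pause_time = sum(s.silent_pauses_s)
    n_pauses = len(s.silent_pauses_s)
    speaking_time = s.total_time_s - pause_time
    minutes = s.total_time_s / 60.0
    return {
        "speech_rate": s.n_syllables / s.total_time_s,
        "articulation_rate": s.n_syllables / speaking_time,
        "pruned_syllables_per_second":
            (s.n_syllables - s.n_filled_pauses - s.n_repairs) / s.total_time_s,
        # Assumption: runs = pauses + 1 (no pause at the very start or end).
        "mean_length_of_run": s.n_syllables / (n_pauses + 1),
        "phonation_time_ratio": speaking_time / s.total_time_s,
        "mean_silent_pause_length": pause_time / n_pauses if n_pauses else 0.0,
        "silent_pauses_per_minute": n_pauses / minutes,
        "filled_pauses_per_minute": s.n_filled_pauses / minutes,
        "repetitions_per_minute": s.n_repetitions / minutes,
        "repairs_per_minute": s.n_repairs / minutes,
    }


# Invented one-minute sample, for illustration only.
sample = SpeechSample(
    n_syllables=120,
    total_time_s=60.0,
    silent_pauses_s=[0.5, 1.0, 0.5, 1.0],
    n_filled_pauses=4,
    n_repetitions=2,
    n_repairs=2,
)
for name, value in utterance_fluency(sample).items():
    print(f"{name}: {value:.3f}")
```

Note that speech rate, pruned syllables per second, and phonation time ratio all depend on pausing time, whereas articulation rate does not; this is why the latter is often treated as a purer measure of speed.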

Recent research findings on L2 fluency are discussed, focusing on the relationship between
L2 utterance fluency and (1) perceived fluency, (2) proficiency, (3) cognitive fluency, and (4)
L1 utterance fluency.
Relationship Between Utterance Fluency and Perceived Fluency

Utterance fluency can be objectively measured with temporal variables in speech (see
Table 13.1), and it consists of different aspects such as speed, breakdown (pause and hesi-
tation phenomena), and repair fluency (Tavakoli & Skehan, 2005). To identify objective
measures of utterance fluency that influence the perception of fluency, several previous
studies related subjective fluency ratings to temporal characteristics of utterance fluency. For
instance, Cucchiarini et al. (2002) related fluency ratings of speech samples from 30 begin-
ning and 30 intermediate learners of Dutch to the utterance fluency measures. The ratings
were best predicted by the number of phonemes per second for beginners and by the mean
length of run for the intermediate learners. Derwing et al. (2004) found significant
correlations between fluency ratings and both pausing and pruned syllables per second.
Kormos and Dénes (2004) had native and non-native English teachers rate speech samples from
Hungarian learners of English and found that, for both groups of listeners, speech rate, the
mean length of utterance, phonation time ratio, and the number of stressed words produced per
minute were the best predictors of perceived fluency.
Overall, the previous findings on the relationship between utterance fluency and perceived
fluency suggest that measures of speed and pausing are strongly associated with perceived L2
fluency. However, as De Jong (2018) and Bosker et al. (2012) point out, some findings should
be interpreted with caution due to the problem of multicollinearity. As Table 13.1 shows,
there are numerous utterance fluency measures and selecting measures should be theoreti-
cally motivated. Depending on the purpose of a study, one could use a complex measure
involving multiple fluency features (e.g., pruned syllables and mean length of run) or use
multiple measures that are theoretically and mathematically distinct, thus not confounding.
For instance, a finding that pausing does not significantly add to the explained variance of
fluency ratings once mean length of run (or pruned syllables) is already in the model is not
meaningful, because mean length of run and pruned syllables already include information about
pausing. Another example of theoretically motivated selection of measures is the inclusion of
pause location measures in recent studies (e.g., mid-clause vs. clause-final pauses; Kahng,
2014, 2018, 2020; Saito et al., 2018), based on findings on the association of pause location
with perceived fluency, fluency development, and cognitive fluency
(also see Segalowitz et al., 2017, for more discussion).
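
The confound described above can be made concrete with a small sketch. The data are invented for illustration, and mean length of run is approximated as syllables divided by (number of silent pauses + 1), an assumption rather than the procedure of any cited study. Because the pause count enters the formula for mean length of run directly, the two measures are strongly related by construction.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical one-minute samples: similar syllable counts, varying pause counts.
n_syllables = [150, 160, 140, 155, 145, 150]
n_pauses = [5, 10, 15, 20, 25, 30]

# Assumed approximation: runs = pauses + 1.
mean_length_of_run = [syl / (p + 1) for syl, p in zip(n_syllables, n_pauses)]

r = pearson([float(p) for p in n_pauses], mean_length_of_run)
print(f"r(pause count, mean length of run) = {r:.2f}")
```

With predictors this strongly related, a regression model cannot cleanly attribute variance in fluency ratings to one measure or the other, which is the reason for choosing measures that are mathematically distinct.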
To circumvent the issue of multicollinearity, Bosker et al. (2012) selected variables that are
not highly interrelated. To this end, they purposefully excluded the most often used measures,
such as speech rate and mean length of run. Instead, they based their selection on the theo-
retical distinction between the three facets of utterance fluency (speed, breakdown, and repair
fluency) and chose the representative measures that do not confound the fluency aspects. With
four separate experiments, they examined the relative contributions of the three fluency aspects
to perceived fluency and whether perceptual sensitivity to each aspect can explain the relative
contributions. They found that each of the three fluency aspects explained a significant amount of
variance in perceived fluency (breakdown 60%, speed 54%, repair 16%, altogether 84%), although
repair measures made the least contribution. When listeners rated only one of the three fluency
aspects, they were sensitive to each aspect, but slightly more sensitive to pauses than to speed
or repairs. They concluded that the contributions of the three aspects of fluency cannot be
explained by perceptual sensitivity alone and that listeners seem to weight the importance of
the perceived aspects of fluency for an overall judgement.
More recently, Saito et al. (2018) investigated temporal correlates of four different levels
of perceived fluency. They found that the number of clause-final pauses distinguished be-
tween low- and mid-levels of perceived fluency; the number of mid-clause pauses further
distinguished between mid- and high-levels of perceived fluency; and finally, articulation rate
further distinguished between high- and native-levels of perceived fluency, suggesting the
importance of articulation rate and mid-clause pauses in perceived fluency.
Studies on perceived fluency utilizing correlation analyses, however, cannot demonstrate
whether those utterance features actually cause different levels of perceived fluency.
Therefore, some studies utilized phonetic manipulations to examine causal relationships.
Munro and Derwing (1998) used L1 and advanced L2 speech samples together with three additional
sets of the samples whose speed had been synthetically manipulated (to a mean L1 rate, a
mean L2 rate, and a reduced rate). Native English listeners rated how appropriate each
speaker’s rate was using a 9-point scale (1 = too slowly, 5 = just right, 9 = too quickly). [This
scale differs from the typical scales of perceived fluency, in which the anchors are “very
disfluent” and “very fluent.” It is noteworthy that typical fluency rating scales do not have a
reference point for “too quickly” or “too fluent.”] The results showed that speeding up slow
L2 speech samples had a positive effect on the ratings, although listeners in general preferred
L2 speech presented at a slightly slower rate than that of L1 speech. More recently, Bosker
et al. (2014) examined the effects of speed, number and length of silent pauses on perceived
fluency of L1 and L2 speech. They manipulated L1 and L2 speech samples in terms of
pauses, by creating no-, short- and long-pause conditions, and speed, by speeding up L2
speech and slowing down L1 speech. They found that, for both L1 and L2 speech samples, the
number and the length of silent pauses negatively affected fluency ratings. In terms of speed,
in line with their predictions, speeding up non-native speech increased fluency ratings and
slowing down native speech decreased fluency ratings. Kahng (2018) further examined
whether the location of silent pauses has an impact on perceived fluency by manipulating
pause location in L1 and L2 speech. She constructed no-pause, pauses-within-clauses, and
pauses-between-clauses conditions and compared their fluency ratings. The results showed
that the no-pause condition was rated more fluent than the two conditions with pauses.
Crucially, for both L1 and L2 speech, the pauses-between-clauses condition was rated more
fluent than the pauses-within-clauses condition (see Wennerstrom, 2001, for a similar finding
with non-manipulated L2 speech). The findings of Bosker et al. (2014) and Kahng (2018)
suggest that although L1 and L2 speech may be different in terms of speed and disfluencies
and L1 speech is usually perceived to be more fluent than L2 speech, the factors of perceived
fluency seem to operate in a similar fashion in L1 and L2 speech.
Relationship Between Utterance Fluency and Proficiency

L2 researchers and test developers have also had a keen interest in identifying utterance
fluency measures that reliably indicate one’s L2 speaking proficiency. A few different
methodologies have been used – comparing L2 with L1 speech, relating utterance fluency to
different speaking proficiency levels cross-sectionally, and tracking learners’ gains in utter-
ance fluency longitudinally.
When compared to L1 speech, L2 speech tends to be slower and have more pauses and
repairs (e.g., Kahng, 2014; Riazantseva, 2001). Importantly, there is also a difference in the
distribution of pauses. Compared to L1 speech, L2 speech has more pauses within clauses
(Kahng, 2014; Tavakoli, 2011) and within Analysis of Speech (AS) units (De Jong, 2016;
Skehan & Foster, 2007). [Foster et al. (2000, p. 365) define an AS unit as “a single speaker’s
utterance consisting of an independent clause, or sub-clausal unit, together with any sub-
ordinate clause(s) associated with either.”] De Jong (2016) further examined the effects of
word frequency on pause occurrences and found that both L1 and L2 speech are more likely
to have pauses before lower than higher frequency words. She concluded that the findings
suggest that in both L1 and L2 speech production, pauses at utterance boundaries are mainly
connected with conceptual planning, whereas pauses within utterances involve difficulties in
formulating the linguistic message including lexical retrieval.
Studies that related utterance fluency measures to speaking proficiency found moderate to
strong correlations (Ginther et al., 2010; Iwashita et al., 2008; Kahng, 2014; Kang et al.,
2010; Révész et al., 2016). For instance, Iwashita et al. (2008) analyzed spoken test perfor-
mances using a range of measures of grammatical accuracy and complexity, vocabulary,
pronunciation, and fluency and investigated which features best distinguish overall levels of
performance. They found that measures of fluency, especially speech rate, along with those
of vocabulary, had the strongest impact on distinguishing L2 speaking proficiency levels. De
Jong (2018) points out that the rubrics of speaking proficiency in Iwashita et al. (2008)
included aspects of fluency, which might have led raters to pay more attention to fluency
features. However, two studies on this topic whose raters were not instructed to focus on
fluency (Kahng, 2014; Révész et al., 2016) still reported significant correlations between
utterance fluency and speaking proficiency. In Kahng (2014), the rubric did not contain any
aspects of fluency, yet she still found that overall speaking scores were significantly correlated
with mean syllable duration (the inverse of articulation rate) and the number of silent pauses
within AS units.
Some studies have also tracked learners’ gains in utterance fluency over time. Since
Lennon’s (1990) and Towell et al.’s (1996) early work on the development of utterance fluency,
research on study abroad and related areas has pursued this line of research with various
L1–L2 speakers as participants. For example, O’Brien et al. (2007) found that after a semester
of study abroad, the English-speaking learners of Spanish improved in speech rate and mean
length of run without fillers. In Mora and Valls-Ferrer (2012), after a 3-month stay abroad,
Catalan-Spanish bilingual learners of English showed gains in speech rate, mean length of run,
and pause frequency and duration. In their 2-year-long study with English-speaking learners of
Spanish, Huensch and Tracy-Ventura (2017b) reported that gains in speed appeared quickly and
were maintained after return from study abroad, whereas gains in pausing appeared later and
were sensitive to attrition after return home. Derwing and Munro (2013) followed two groups
of L2 immigrants to English-speaking Canada over 7 years and determined that fluency
continued to develop over time in the group that reported more interaction and exposure to
English, whereas the other group showed no significant fluency progress; most members of this
group reported limited exposure to English. The authors interpreted these findings through a
Willingness to Communicate framework (MacIntyre, 2007).
Taken together, studies on the relationship between utterance fluency and proficiency
suggest that pure speed measures (e.g., articulation rate, mean syllable duration) and the
frequency of silent pauses, especially those within clauses or AS units, seem to be reliable
indicators of speaking proficiency. On the other hand, these studies do not tell us which
cognitive processes are responsible for such fluent L2 speech. We turn to this issue in the
next part.
Relationship Between Utterance Fluency and Cognitive Fluency

An underexamined research area in L2 fluency is cognitive fluency – what enables speakers to
produce fluent speech, or the cognitive processes responsible for fluent production in L2.
According to Levelt (1989), speech production consists of three main stages (see De Bot, this
volume). During conceptualization, a preverbal message is created using world knowledge.
This message is converted into words through lexical, grammatical, morphophonological,
and phonetic encoding during the formulation stage. Finally, during the articulation stage,
the generated utterance is physically articulated. Throughout the process, one’s own speech is
self-perceived and monitored. These stages are claimed to operate simultaneously; however,
for less fluent speakers, processes particularly involved in formulation, including lexical re-
trieval and grammatical encoding, may not be fully automatized, thus resulting in a
breakdown in parallel processing and the slowing down of speech (Kormos, 2006;
Segalowitz, 2010).
Only a few studies have thus far measured cognitive fluency in relation to utterance fluency.
Segalowitz and Freed (2004) was one of the first studies to measure subprocesses of cognitive
fluency and relate them to utterance fluency. For cognitive fluency, they used
L2-specific measures from a semantic classification task and an attention control test by
partialling out L1 measures and relating the results to gains in utterance fluency. They found
correlations between speed and efficiency of lexical access and mean length of run. In their
large-scale, comprehensive study, De Jong et al. (2013) identified utterance fluency measures
indicative of L2 knowledge and cognitive skills. They obtained a range of measures of linguistic
knowledge (e.g., vocabulary, grammar, and pronunciation) and processing skills (e.g., speed of
morphosyntactic processing and lexical selection) and related them to utterance fluency
measures. Results showed that mean syllable duration was most strongly related to linguistic
knowledge and skills, explaining 50% of the variance, whereas pause duration was the weakest,
explaining only 5% of the variance (number of silent pauses, 22%; filled pauses, 18%; cor-
rections, 25%; and repetitions, 12%). The findings suggest that mean syllable duration is a
strong indicator of L2 knowledge and cognitive fluency but pause duration is not.
De Jong and her colleagues recently examined the connections between utterance fluency
and less explored subprocesses of L2 speech production, namely articulation and con-
ceptualization. Focusing on the articulation stage, De Jong and Mora (2019) explored to
what extent L1 and L2 utterance fluency can be predicted by an individual’s articulatory
skills. Articulatory skills were measured using delayed picture naming tasks in the L1 and L2
and a diadochokinetic production task (DDK; i.e., saying /pa/, /ta/, /ka/, /pa.ta/, and
/pa.ta.ka/ as fast as possible for 5 seconds). They found that articulatory skills explained only
10% and 7% of the variance in silent pause rate and silent pause duration, respectively, in the
L1, and 19% and 27%, respectively, in the L2, but were
not related to articulation rate.
Employing a novel experimental paradigm, Felker et al. (2019) examined the effects of
conceptualizing difficulty on L1 and L2 fluency. In two experiments, participants described
paths in networks of pictures in which the target paths appeared one step at a time, requiring
them to keep generating new speech plans (experiment 1), and in which the paths sometimes
changed at a certain point, forcing participants to revise their speech plans (experiment 2).
Online changes in the networks were triggered by participants’ gaze using eye-tracking
technology. They found that abandoning and regenerating a speech plan is cognitively
demanding for both L1 and L2 speakers and leads to disfluencies; however, the additional time
L2 speakers needed to regenerate a speech plan was greater than for L1 speakers.
In summary, these few studies on the relationship between L2 utterance fluency and
cognitive fluency suggest that not all utterance fluency measures reflect L2 cognitive fluency
and some objective measures are more connected with certain subprocesses of speech
production. This line of research is exciting and inspiring, yet more research is needed for a
clear understanding of the underlying mechanisms of fluent L2 speech.
L1–L2 Relationship in Fluency

De Jong et al. (2015) examined to what extent L2 utterance fluency measures are reliable
indicators of L2 proficiency, given that fluency is also influenced by personality or L1
speaking style. They found that the measures of speed, pause, and repair phenomena in L1 and
L2 were moderately to strongly correlated, except for mean syllable duration. In addition,
for most measures of utterance fluency (silent pause duration being the exception), both the
corrected (for L1 measures) and the uncorrected utterance fluency measures significantly
predicted L2 proficiency. For mean syllable duration, the corrected measure had stronger
predictive power for L2 proficiency than the uncorrected measure did.
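
One way to obtain an L2 measure "corrected" for its L1 counterpart is to regress the L2 measure on the L1 measure and keep the residuals, so that what remains is the part of L2 behaviour not predictable from L1 speaking style. The sketch below illustrates this idea only; the data are invented, the function name is hypothetical, and whether this matches the exact procedure of De Jong et al. (2015) is an assumption.

```python
from typing import List


def l1_corrected(l2: List[float], l1: List[float]) -> List[float]:
    """Residuals of a simple linear regression of the L2 measure on the L1 measure."""
    n = len(l1)
    mean_l1 = sum(l1) / n
    mean_l2 = sum(l2) / n
    sxx = sum((x - mean_l1) ** 2 for x in l1)
    sxy = sum((x - mean_l1) * (y - mean_l2) for x, y in zip(l1, l2))
    slope = sxy / sxx
    intercept = mean_l2 - slope * mean_l1
    # Residual = observed L2 value minus the value predicted from L1.
    return [y - (intercept + slope * x) for x, y in zip(l1, l2)]


# Hypothetical mean silent pause durations (seconds) for five speakers.
l1_pause = [0.50, 0.60, 0.70, 0.80, 0.90]
l2_pause = [0.70, 0.75, 0.95, 0.90, 1.10]

corrected = l1_corrected(l2_pause, l1_pause)
for speaker, value in enumerate(corrected, start=1):
    print(f"speaker {speaker}: corrected L2 pause duration = {value:+.3f}")
```

By construction the residuals are uncorrelated with the L1 measure, so a speaker who pauses at length in both languages is no longer treated as less proficient in the L2 on that basis alone.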
Not all studies of L1–L2 utterance fluency, however, report high correlations across
participants or measures. In their longitudinal study, Derwing et al. (2009) investigated
whether cross-linguistic differences affect the L1–L2 relationship in utterance fluency with
Slavic and Mandarin learners of English. They found significant L1–L2 associations with
respect to speech rate, number of pauses, and pruned syllables per second. However, the
correlations were higher for Slavic speakers than for Mandarin speakers, especially after 2
years of residence abroad, suggesting an influence of cross-linguistic differences and profi-
ciency development on L1–L2 relationship in utterance fluency. Huensch and Tracy-Ventura
(2017a) also examined L1–L2 fluency relationship with English L1 Spanish and French
majors. Only two of the seven measures (i.e., mean syllable duration and the number of silent
pauses per second) for both groups showed a significant L1–L2 correlation before and after 5
months residing abroad. Furthermore, the L1–L2 relationship changed over time and was
modulated by cross-linguistic differences and proficiency, highlighting the non-
straightforward, complex relationship between L1 and L2 fluency.
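The utterance fluency measures named in these studies can be illustrated with a small sketch that derives them from one annotated speech sample and then correlates an L1 measure with its L2 counterpart across speakers. The definitions below are common operationalizations and are assumptions here: studies differ in the pause threshold and in whether total time or speaking time is the denominator. The `Sample` class and all data values are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt

@dataclass
class Sample:
    syllables: int         # all syllables produced
    pruned_syllables: int  # syllables left after removing fillers and repairs
    total_time: float      # seconds, including silent pauses
    silent_pauses: list    # duration in seconds of each silent pause

def fluency_measures(s):
    """Compute four utterance fluency measures for one speech sample."""
    pause_time = sum(s.silent_pauses)
    phonation_time = s.total_time - pause_time
    return {
        "speech_rate": s.syllables / s.total_time,
        "pruned_syllables_per_second": s.pruned_syllables / s.total_time,
        "mean_syllable_duration": phonation_time / s.syllables,
        "silent_pauses_per_second": len(s.silent_pauses) / s.total_time,
    }

def pearson_r(xs, ys):
    """Pearson correlation between paired L1 and L2 values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# One hypothetical 60-second sample with three silent pauses.
sample = Sample(syllables=120, pruned_syllables=110,
                total_time=60.0, silent_pauses=[0.5, 1.0, 0.5])
measures = fluency_measures(sample)

# Hypothetical speech rates for four speakers in their L1 and L2.
l1_rates = [3.9, 4.2, 4.6, 4.0]
l2_rates = [2.8, 3.1, 3.3, 3.0]
r = pearson_r(l1_rates, l2_rates)
```

Running the same correlation separately for each measure, group, and testing time is what allows comparisons such as "two of the seven measures" or "higher for Slavic than for Mandarin speakers".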
Most studies of the L1–L2 relationship in fluency have focused on utterance fluency whereas
Kahng (2020) examined the L1–L2 relationship in both utterance and cognitive fluency to
identify L2 utterance fluency features that are more indicative of L1 utterance fluency or
L2-specific cognitive fluency. As was the case in De Jong et al. (2015), she also found silent pause
duration in L1 and L2 were most strongly correlated with each other. On the other hand,
Kahng’s Chinese learners of English overall exhibited lower L1–L2 correlations in utterance
fluency and the results further revealed higher L1–L2 associations for filled pauses, compared to
silent pauses. For cognitive fluency measures (e.g., speed of syntactic encoding and lexical re-
trieval), she found moderate L1–L2 correlations. Regarding the main research question, the
results suggested the number of mid-clause silent pauses and mean syllable duration are in-
dicative of L2-specific cognitive measures, whereas silent pause duration and the number of filled
pauses are indicative of L1 utterance fluency, or individual speaking style. The findings suggest
an even more complex picture of the L1–L2 relationship in fluency by demonstrating differential
L1 influence on L2 fluency measures.
In addition to the studies discussed earlier, fluency has also been researched in relation to
other cognitive, social, and linguistic factors. Fluency is a criterion of L2 performance within
the framework of CALF (complexity, accuracy, lexis, and fluency) research (e.g., Housen
et al., 2012; Skehan & Foster, 2007; Tavakoli & Skehan, 2005). Some studies have explored
the cognitive and affective factors of L2 fluency development, including phonological short-
term memory (O’Brien et al., 2007), personality (Dewaele & Furnham, 2000), and affective
variables (Kormos & Préfontaine, 2017). In their 7-year study, Derwing and Munro (2013)
demonstrated how learners’ L1, age, L2 use, and their willingness to communicate influenced
the development of fluency, comprehensibility, and accentedness. There is a paucity of re-
search exploring the effects of learning contexts (e.g., Mora & Valls-Ferrer, 2012; Segalowitz
& Freed, 2004) and instruction types on fluency development (e.g., Boers et al., 2006; De
Jong & Perfetti, 2011; Galante & Thomson, 2017). There is also a growing interest in L2
fluency in dialogue (e.g., McCarthy, 2010; Sato, 2014; van Os et al., 2020).

Studies on L2 fluency development (e.g., De Jong & Perfetti, 2011; Mora & Valls-Ferrer,
2012) suggest that learners need ample repeated opportunities to speak the target language in
real-world (or real-world-like) communication to develop oral fluency. What follows describes some
practical ways to achieve that goal.
Automatization in Communicative Contexts of Essential Speech Segments (ACCESS) is a
methodology designed to promote automatization within a Communicative Language
Teaching framework (Gatbonton & Segalowitz, 2005). It comprises three phases: (1) creative
automatization (communicative activities including role-plays and problem-solving, with
essential lexical items/phrases provided), (2) language consolidation (fluency, accuracy, and
grammatical tasks to improve learners’ control of problematic utterances), and (3) free
communication (free communication activities with similar topics as in creative auto-
matization to test practiced utterances in context). Three main criteria should be considered
in designing activities: (1) genuinely communicative with an information gap, (2) inherently
repetitive by having students repeat essential speech segments to each other multiple times,
and (3) functionally formulaic by having learners produce authentic, useful utterances that
they can re-use in real-world communication.
Similarly, Nation and Newton (2009) identified three conditions for fluency development:
(1) tasks should be easy in terms of the language, ideas, and discourse requirements so that
learners can focus on skill development; (2) they should be meaning-focused; and (3) they
should encourage learners to reach a higher than usual level of performance using time
pressure, planning, and/or repetition. Examples of fluency activities that meet the conditions
include the best recording, rehearsed talks, and 4/3/2 technique. The best recording is an
activity in which learners record their own speech on a topic (e.g., describing pictures or
previous experiences) multiple times after self-evaluation of the recording until they are
satisfied. In rehearsed talks, learners first prepare a talk individually, then rehearse it with a
partner, further practice it in a small group, and finally present it in front of the whole class.
In the 4/3/2 technique, learners speak three times on the same topic under increasing time
pressure, for 4, 3, and 2 minutes, respectively. The technique can be communicative by
having a new partner as a listener for each repetition. Repeating the same topic three times,
as opposed to speaking about three different topics, has been shown to lead to generalized
fluency improvements on new tasks by supporting proceduralization of linguistic knowledge
(De Jong & Perfetti, 2011). A potential issue of fluency activities involving repetition is
whether learners repeat their errors (e.g., Thai & Boers, 2016). To circumvent the issue,
teachers could monitor learners’ speech and provide feedback throughout the process or
include brief form-focused instruction as suggested in the language consolidation phase
mentioned earlier.
Rossiter et al. (2010) and Derwing (2019) further suggest several practical classroom
activities. Raising learners’ awareness of various markers of fluency (e.g., buying time
without sounding disfluent, pausing at phrase/clause boundaries, using discourse markers)
can potentially yield long-lasting effects on fluency development. Learners can transcribe
short video- or audio-recordings, analyze fluency markers, and practice with role play and
shadowing activities, in which they read along with the recording. In addition, explicit in-
struction of high frequency formulaic language, discourse markers such as fillers (e.g., “well”
and “you know”), sequential markers (e.g., “first” and “finally”), and conventions for
opening and closing conversations will provide learners with immediate tools to improve
their fluency.
6 Future Directions
The complex nature of L2 fluency development provides a wide range of avenues for future
investigation (see also Foster, 2020; Segalowitz, 2016; Thomson, 2015). As De Jong (2018)
and Segalowitz (2016) point out, L2 fluency researchers have typically investigated temporal
and hesitation phenomena descriptively by listing objective measures that reflect perceived
fluency or L2 proficiency from a listener’s perspective. To go beyond description and to
explain L2 fluency phenomena, however, more research is needed that focuses
on the cognitive processes responsible for L2 utterance fluency.
Moreover, findings regarding cognitive-utterance fluency associations should be situated
within a larger theoretical framework of fluency development (e.g., Segalowitz, 2010, 2016)
because cognitive processes underlying utterance fluency themselves develop through re-
peated fluency-related experiences shaped in positive or negative communicative social
contexts for fluency development. To construct a comprehensive understanding of L2 fluency
development, research on each of the components and their dynamic interrelationships
should be conducted with a theoretical framework in mind.
Finally, the multidimensional construct of fluency can be better understood by expanding
interdisciplinary work. For instance, in related fields including psycholinguistics, discourse
analysis, and sociolinguistics, temporal features and disfluencies in speech have been in-
vestigated from a positive, functional perspective. In line with research on processing dis-
fluencies in psycholinguistics, learning more about how L2 disfluencies are processed by L1
and L2 listeners will have implications for L2 learning and testing. Research on turn-taking
and interaction in the field of discourse analysis and the construct of interactive-alignment
(i.e., various types of mimicry in conversation) in sociolinguistics will be especially valuable
in exploring L2 interactional fluency, which is in its infancy in SLA.
Further Reading
De Jong, N. H. (2018). Fluency in second language testing: Insights from different disciplines. Language
Assessment Quarterly, 15, 237–254.
An overview of L2 fluency research in applied linguistics, psycholinguistics, discourse analysis, and
sociolinguistics, focusing on the differences in its conceptualization with implications for language
research and testing.
Derwing, T. M. (2019). L2 fluency development. In S. Loewen & M. Sato (Eds.), The Routledge
handbook of instructed second language acquisition (pp. 246–259). New York: Routledge.
A thorough and comprehensive review on the historical background, current issues, empirical evidence,
pedagogical implications and teaching tips for L2 fluency development.
A comprehensive and interdisciplinary synthesis of fluency research from a cognitive science perspective
covering a wide range of cognitive, social, motivational factors of second language fluency.
References
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Beattie, G. W. & Butterworth, B. L. (1979). Contextual probability and word frequency as determi-
nants of pauses and errors in spontaneous speech. Language and Speech, 22, 201–211.
Boers, F., Eyckmans, J., Kappel, J., Stengers, H., & Demecheleer, H. (2006). Formulaic sequences and
perceived oral proficiency: Putting a lexical approach to the test. Language Teaching Research, 10,
245–261.
Bosker, H. R., Pinget, A., Quené, H., Sanders, T., & De Jong, N. H. (2012). What makes speech sound
fluent? The contributions of pauses, speed and repairs. Language Testing, 30, 159–175.
Bosker, H. R., Quené, H., Sanders, T., & De Jong, N. H. (2014). The perception of fluency in native and
nonnative speech. Language Learning, 64, 579–614.
Brennan, S. E., & Schober, M. F. (2001). How listeners compensate for disfluencies in spontaneous
speech. Journal of Memory and Language, 44, 274–296.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Cucchiarini, C., Strik, H., & Boves, L. (2002). Quantitative assessment of second language learners’
fluency: Comparisons between read and spontaneous speech. The Journal of the Acoustical Society
of America, 111, 2862–2873.
De Jong, N. H., & Perfetti, C. A. (2011). Fluency training in the ESL classroom: An experimental study
of fluency development and proceduralization. Language Learning, 61, 533–568.
De Jong, N. H. (2016). Predicting pauses in L1 and L2 speech: The effects of utterance boundaries and
word frequency. International Review of Applied Linguistics in Language Teaching, 54, 113–132.
De Jong, N. H. (2018). Fluency in second language testing: Insights from different disciplines. Language
Assessment Quarterly, 15, 237–254.
De Jong, N. H., Groenhout, R., Schoonen, R., & Hulstijn, J. H. (2015). Second language fluency:
Speaking style or proficiency? Correcting measures of second language fluency for first language
behavior. Applied Psycholinguistics, 36, 223–243.
De Jong, N. H., & Mora, J. C. (2019). Does having good articulatory skills lead to more fluent speech in
first and second languages? Studies in Second Language Acquisition, 41, 227–239.
De Jong, N. H., Steinel, M. P., Florijn, A., Schoonen, R., & Hulstijn, J. H. (2013). Linguistic skills and
speaking fluency in a second language. Applied Psycholinguistics, 34, 893–916.
Dechert, H. W., & Raupach, M. (Eds.). (1987). Psycholinguistic models of production. Norwood, NJ:
Ablex Publishing Corporation.
Derwing, T. M. (2019). L2 fluency development. In S. Loewen & M. Sato (Eds.), The Routledge
handbook of instructed second language acquisition (pp. 246–259). New York: Routledge.
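Derwing, T. M., & Munro, M. J. (2013). The development of L2 oral language skills in two L1 groups:
A 7-year study. Language Learning, 63, 163–185.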
Derwing, T. M., Munro, M. J., Thomson, R. I., & Rossiter, M. J. (2009). The relationship between L1
fluency and L2 fluency development. Studies in Second Language Acquisition, 31, 533–557.
Derwing, T. M., Rossiter, M., Munro, M., & Thomson, R. (2004). Second language fluency: Judgments
on different tasks. Language Learning, 54, 655–679.
Dewaele, J.-M., & Furnham, A. (2000). Personality and speech production: A pilot study of second
language learners. Personality and Individual Differences, 28, 355–365.
Felker, E. R., Klockmann, H. E., & de Jong, N. H. (2019). How conceptualizing influences fluency in
first and second language speech production. Applied Psycholinguistics, 40, 111–136.
Fillmore, C. (1979). On fluency. In C. Fillmore, D. Kempler, & W. S.-Y. Wang (Eds.), Individual
differences in language ability and language behavior (pp. 85–101). New York: Academic Press.
Foster, P. (2020). Oral fluency in a second language: A research agenda for the next ten years. Language
Teaching. doi: 10.1017/S026144482000018X
Foster, P., Tonkyn, A., & Wigglesworth, G. (2000). Measuring spoken language: A unit for all reasons.
Applied Linguistics, 21, 354–375.
Freed, B. F. (2000). Is fluency, like beauty, in the eyes (and ears) of the beholder? In H. Riggenbach
(Ed.), Perspectives on fluency (pp. 243–265). Ann Arbor, MI: University of Michigan Press.
Gatbonton, E., & Segalowitz, N. (2005). Rethinking communicative language teaching: A focus on
access to fluency. Canadian Modern Language Review, 61, 325–353.
Galante, A., & Thomson, R. I. (2017). The effectiveness of drama as an instructional approach for the
development of second language oral fluency, comprehensibility, and accentedness. TESOL
Quarterly, 51, 115–142.
Ginther, A., Dimova, S., & Yang, R. (2010). Conceptual and empirical relationships between temporal
measures of fluency and oral English proficiency with implications for automated scoring. Language
Testing, 27, 379–399.
Goldman-Eisler, F. (1951). The measurement of time sequences in conversational behaviour. British
Journal of Psychology, 42, 355–362.
Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in spontaneous speech. London: Academic
Press.
Hosman, L. (2015). Powerful and powerless speech styles and their relationship to perceived dominance
and control. In R. Schulze, & H. Pishwa (Eds.), The exercise of power in communication
(pp. 221–232). London: Palgrave Macmillan.
Housen, A., Kuiken, F., & Vedder, I. (Eds.). (2012). Dimensions of L2 performance and proficiency:
Complexity, accuracy and fluency in SLA. Amsterdam: John Benjamins.
Huensch, A., & Tracy-Ventura, N. (2017a). Understanding L2 fluency behavior: The effects of in-
dividual differences in L1 fluency, cross-linguistic differences, and proficiency over time. Applied
Psycholinguistics, 38, 755–785.
Huensch, A., & Tracy-Ventura, N. (2017b). L2 utterance fluency development before, during, and after
residence abroad: A multidimensional investigation. The Modern Language Journal, 101, 275–293.
Hunt, K. W. (1965). Grammatical structures written at three grade levels. National Council of Teachers
of English Research No. 3. Urbana Champaign, IL: National Council of Teachers of English.
Iwashita, N., Brown, A., McNamara, T., & O’Hagan, S. (2008). Assessed levels of second language
speaking proficiency: How distinct? Applied Linguistics, 29, 24–49.
Kahng, J. (2014). Exploring utterance and cognitive fluency of L1 and L2 English speakers: Temporal
measures and stimulated recall. Language Learning, 64, 809–854.
Kahng, J. (2018). The effect of pause location on perceived fluency. Applied Psycholinguistics, 39,
569–591.
Kahng, J. (2020). Explaining second language utterance fluency: Contribution of cognitive fluency and
first language utterance fluency. Applied Psycholinguistics, 41, 457–480.
Kang, O., Rubin, D., & Pickering, L. (2010). Suprasegmental measures of accentedness and judgments
of language learner proficiency in oral English. Modern Language Journal, 94, 554–566.
Kormos, J. (2006). Speech production and second language acquisition. Mahwah, NJ: Erlbaum.
Kormos, J., & Dénes, M. (2004). Exploring measures and perceptions of fluency in the speech of second
language learners. System, 32, 145–164.
Kormos, J., & Préfontaine, Y. (2017). Affective factors influencing fluent performance: French learners’
appraisals of second language speech tasks. Language Teaching Research, 21, 699–716.
Lennon, P. (1990). Investigating fluency in EFL: A quantitative approach. Language Learning, 40,
387–417.
Lennon, P. (2000). The lexical element in spoken second language fluency. In H. Riggenbach (Ed.),
Perspectives on fluency (pp. 25–42). Ann Arbor, Michigan: University of Michigan Press.
Levelt, W. (1989). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Lintunen, P., Mutta, M., & Peltonen, P. (Eds.). (2020). Fluency in L2 learning and use. Bristol:
Multilingual Matters.
MacIntyre, P. D. (2007). Willingness to communicate in a second language: Understanding the decision
to speak as a volitional process. The Modern Language Journal, 91, 564–576.
Maclay, H., & Osgood, C. E. (1959). Hesitation phenomena in spontaneous English speech. Word,
15, 19–44.
McCarthy, M. (2010). Spoken fluency revisited. English Profile Journal, 1, E4.
Mora, J. C., & Valls-Ferrer, M. (2012). Oral fluency, accuracy, and complexity in formal instruction
and study abroad learning contexts. TESOL Quarterly, 46, 610–641.
Munro, M. J., & Derwing, T. M. (1998). The effects of speaking rate on listener evaluations of native
and foreign-accented speech. Language Learning, 48, 159–182.
Nation, I. S. P., & Newton, J. (2009). Teaching ESL/ EFL listening and speaking. New York, NY:
Routledge.
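O’Brien, I., Segalowitz, N., Freed, B., & Collentine, J. (2007). Phonological memory predicts second
language oral fluency gains in adults. Studies in Second Language Acquisition, 29, 557–582.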
Révész, A., Ekiert, M., & Torgersen, E. N. (2016). The effects of complexity, accuracy, and fluency on
communicative adequacy in oral task performance. Applied Linguistics, 37, 828–848.
Riazantseva, A. (2001). Second language proficiency and pausing. Studies in Second Language
Acquisition, 23, 497–526.
Riggenbach, H. (Ed.). (2000). Perspectives on fluency. Ann Arbor, MI: University of Michigan Press.
Rossiter, M. J., Derwing, T. M., Manimtim, L. G., & Thomson, R. I. (2010). Oral fluency: The ne-
glected component in the communicative language classroom. The Canadian Modern Language
Review, 66, 583–606.
Saito, K., Ilkan, M., Magne, V., Tran, M. N., & Suzuki, S. (2018). Acoustic characteristics and learner
profiles of low-, mid- and high-level second language fluency. Applied Psycholinguistics, 39, 593–617.
Sato, M. (2014). Exploring the construct of interactional oral fluency: Second language acquisition and
language testing approaches. System, 45, 79–91.
Schmidt, R. (1992). Psychological mechanisms underlying second language fluency. Studies in Second
Language Acquisition, 14, 357–385.
Segalowitz, N. (2010). Cognitive bases of second language fluency. New York, NY: Routledge.
Segalowitz, N. (2016). Second language fluency and its underlying cognitive and social determinants.
International Review of Applied Linguistics in Language Teaching, 54, 79–95.
Segalowitz, N., & Freed, B. F. (2004). Context, contact, and cognition in oral fluency acquisition:
Learning Spanish in at home and study abroad contexts. Studies in Second Language Acquisition, 26,
173–200.
Segalowitz, N., French, L. & Guay, J. (2017). What features best characterize adult second language
utterance fluency and what do they reveal about fluency gains in short-term immersion? Canadian
Journal of Applied Linguistics, 20, 90–116.
Skehan, P., & Foster, P. (2007). Complexity, accuracy, fluency and lexis in task-based performance:
A meta-analysis of the Ealing Research. In S. Van Daele, A. Housen, F. Kuiken, M. Pierrard, &
I. Vedder (Eds.), Complexity, accuracy, and fluency in second language use, learning, and teaching
(pp. 207–226). Brussels, Belgium: University of Brussels Press.
Tavakoli, P. (2011). Pausing patterns: Differences between L2 learners and native speakers. ELT
Journal, 65, 71–79.
Tavakoli, P., & Skehan, P. (2005). Strategic planning, task structure, and performance testing. In
R. Ellis (Ed.), Planning and task performance in a second language (pp. 239–276). Amsterdam: John
Benjamins.
Tavakoli, P., & Wright, C. (2020). Second language speech fluency: From research to practice.
Cambridge: Cambridge University Press.
Thai, C., & Boers, F. (2016). Repeating a monologue under increasing time pressure: Effects on fluency,
complexity, and accuracy. TESOL Quarterly, 50, 369–393.
Thomson, R. I. (2015). Fluency. In M. Reed, & J. M. Levis (Eds.), The handbook of English pro-
nunciation (pp. 209–226). Malden, MA: John Wiley & Sons.
Towell, R., Hawkins, R., & Bazergui, N. (1996). The development of fluency in advanced learners of
French. Applied Linguistics, 17, 84–119.
Van Os, M., De Jong, N. H., & Bosker, H. R. (2020). Fluency in dialogue: The effect of turn-taking
behavior on perceived fluency in native and non-native speech. Language Learning. Advance online
publication. doi: 10.1111/lang.12416
Wennerstrom, A. (2001). The music of everyday speech: Prosody and discourse analysis. New York:
Oxford University Press.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.
Wright, C., & Tavakoli, P. (2016). New directions and developments in defining, analyzing and mea-
suring L2 speech fluency. International Review of Applied Linguistics in Language Teaching,
54, 73–77.
14
THE ROLE OF PROSODY ACROSS
LANGUAGES
Prosody is conceptualized as covering a spectrum of speech phenomena including pitch,
intonation, tone, loudness, rhythm and stress. Pitch is the height of a tone perceived by the
listener, which reflects the number of times the vocal folds vibrate per second (the funda-
mental frequency, F0). Pitch functions at both the lexical and post-lexical (i.e., phrasal,
sentential, and discourse) level, and is signified by tone and intonation. Tone refers to the
pitch of a syllable when it is used for lexical contrast. In tone languages, the contour and the level of
the pitch decide the meaning of a syllable. Intonation, also known as tunes, contours or
melodies, denotes the movement of pitch in an utterance and is used to perform a wide range
of post-lexical functions. For instance, it can indicate whether an utterance is a question or a
statement, divide chunks of speech, express emotions like enthusiasm, irony, incredulity, and
show relationships between utterances in conversational turns. Loudness refers to the
strength of the sound perceived by the listeners, which is associated with the acoustic in-
tensity (relative amplitude) of the sound. Rhythm is understood as the perceived regular
occurrence of units, with “stress-timed” referring to perceived regularity of stress beats and
“syllable-timed” to syllables. Hindi, Finnish, and many Romance languages are regarded as
syllable-timed languages, while English, Russian, Arabic, and all Germanic languages are
classified as stress-timed. Although this classification appears to capture a perceivable
rhythmic difference, the notion of truly equal inter-stress or inter-syllable intervals
(isochrony) has been shown to be false (Dauer, 1983). Rhythm is not a single prosodic property
but is intertwined with syllable structure, syllable duration, vowel quality, speech rate, flu-
ency, pausing, and connected speech processes (Dauer, 1983). Stress can be defined as the
perceived prominence of a syllable in relation to other syllables. A stressed syllable is usually
longer in duration, higher in pitch, and louder in amplitude, but languages vary in the re-
lative importance of these cues (Cutler, 2007). Stress placement also shows cross-linguistic
variation. In free-stress languages like English and Spanish, stress is lexically specified,
whereas in fixed-stress languages like Polish and Czech, it is realized on one fixed position in
words or phrases.
DOI: 10.4324/9781003022497-18
In L2 acquisition, prosody is worth paying attention to because it can affect the accentedness
and intelligibility of L2 speech (De Mareüil & Vieru-Dimulescu, 2006). Accentedness
denotes the degree of foreign accent in an L2 utterance perceived by a listener, while
intelligibility refers to how much of the message is actually understood by the listener. A
related concept is comprehensibility, denoting listeners’ subjective estimation of the difficulty
they experience in understanding an utterance (Munro & Derwing, 1999). In the literature,
off-target production and perception of L2 prosody are evidenced in speech rhythm (Grenon
& White, 2008), lexical tone (Wu et al., 2014), pitch contour (Graham & Post, 2018), and stress
(McGory, 1997). A variety of prosodic features are also known to influence the intelligibility
of L2 speech (Anderson-Hsieh et al., 1992; Munro & Derwing, 1998; Trofimovich & Baker,
2006). Therefore, to facilitate communication in real life, measures should be taken to improve
L2 prosody.
In the past, research often focused on the differences between native speakers and L2 learners
in the production of intonation, stress, rhythm, and so forth. Through auditory and acoustic
analyses of tonal events, L2 learners were commonly found to produce non-nativelike
prosodic features. For example, the native judges in Backman (1979) identified Venezuelan
Spanish learners of English to have narrower pitch range and higher pitch level on unstressed
syllables than native American English speakers. Willems (1982) demonstrated that Dutch
learners of L2 English had a narrower pitch range and lower F0 at the beginning of an
utterance than native English speakers. Adams and Munro (1978) found that L2 English
speakers of various L1 backgrounds differed from native speakers in placement and fre-
quency of stress.
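Pitch range comparisons of this kind are often expressed in semitones rather than hertz, so that speakers with different average F0 can be compared on a common scale. The sketch below is illustrative only: it assumes an F0 track in hertz with unvoiced frames coded as 0, the function name and data are hypothetical, and it is not the procedure of the studies cited above.

```python
from math import log2

def pitch_range_semitones(f0_hz):
    """Return the pitch range of an utterance in semitones.

    Assumptions: f0_hz holds F0 values in Hz, with 0 marking unvoiced
    frames. The raw minimum and maximum are used here; percentile-based
    floors and ceilings are a common, more outlier-robust alternative.
    """
    voiced = [f for f in f0_hz if f > 0]
    # Twelve semitones correspond to a doubling of frequency.
    return 12 * log2(max(voiced) / min(voiced))

# Hypothetical F0 tracks (Hz) for two speakers reading the same sentence.
speaker_a = [110, 0, 150, 180, 220, 0, 130]
speaker_b = [120, 0, 135, 150, 160, 0, 125]
range_a = pitch_range_semitones(speaker_a)
range_b = pitch_range_semitones(speaker_b)
```

On these made-up values speaker B spans fewer semitones than speaker A, which is the sense in which a pitch range is described as "narrower".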
Early studies further examined the relationship between non-native prosody and foreign
accent, showing that non-nativelike temporal features in L2 speech resulted in the perception
of foreign accent by native listeners. Hutchinson (1973) found that L1 Spanish learners of L2
English who made a smaller durational contrast between stressed and unstressed syllables in
English were rated lower in pronunciation by native English listeners. Jonasson and
McAllister (1972) demonstrated that manipulation of vowel and consonant duration can
influence native listeners’ perception of foreign accent in American-accented Swedish, and
Flege (1993) found that larger durational contrasts between long and short vowel minimal
pairs led to weaker foreign accent in Mandarin-accented English.
However, with the progression of research in second language acquisition, most re-
searchers now believe that it is more important for L2 speakers to be understood and to be
able to communicate (Moyer, 2004), which can be achieved even when the L2 speakers retain
some foreign accent (Munro & Derwing, 1999). Therefore, contemporary studies also ex-
amine the effect of prosody on the intelligibility and comprehensibility of L2 speech. In
addition, recent attempts have been made to explore ways of improving L2 prosody with
these concepts in mind.

The acquisition of L2 prosody can be studied from various perspectives. A brief overview of
the main research areas will be presented here.
L2 Intonation
Intonation contributes substantially to successful communication as it affects the listeners’
understanding of speakers’ intentions. Gumperz (1982) illustrated this with an example in
which British Airline employees felt that they were treated badly because cafeteria servers of
Indian and Pakistani descent spoke with a falling intonation. Gumperz’ explanation was that
politeness is expressed with rising intonation in English, but by falling intonation in the
servers’ L1s. Direct evidence also illustrates the effect of intonation on speakers’ compre-
hensibility and intelligibility. Wennerstrom (1998) showed that teaching assistants who more
often used intonation to signal topic shifts scored higher on an oral intelligibility examina-
tion. Similarly, the evaluation of native and L2 teaching assistants’ presentations in Pickering
(2001) revealed that L2 speakers’ use of rising intonation affected the social connection they
made with students and hence the comprehensibility of their speech.
However, intonation poses strong obstacles for L2 speech acquisition. It is not easy for
learners to realize native-like pitch features in their speech production. Mennen et al. (2014)
showed that German learners of L2 English expanded pitch range in earlier portions and
compressed pitch range in later portions of intonational phrases, and Mennen (2004) found
that advanced Dutch speakers of L2 Greek had much earlier peak alignment (the timing of
the F0 peaks) than native Greek speakers due to L1 transfer. Moreover, L2 speakers have
difficulty learning to use intonation to achieve attitudinal, discoursal, grammatical, and
focusing functions. Pytlyk (2008) suggested that L2 Mandarin speakers with L1 English used
a falling pitch contour for questions with the bu particle and a rising pitch contour for
questions with the ma particle preceded by Tone 1, which was the opposite of native Mandarin
patterns.
L2 Lexical Tone
As introduced earlier, in tone languages, lexical tone can differentiate one word from an-
other. To be understood by others, learners of tone languages need to produce and process
tones correctly. However, evidence indicates that speakers of non-tone languages face dif-
ficulties when they perceive tones with contextual variation. Lee et al. (2010) studied the
identification of acoustically modified Mandarin tones by L2 learners of various L1 back-
grounds and proficiencies. They observed that L2 listeners were faster in recognizing tones
containing F0 continuity information of the preceding context than tones whose F0 con-
tinuity information was cut out, while native Mandarin listeners did not show such an F0
continuity effect. This observation suggested that L2 listeners relied on canonical F0 contour
and could not compensate for contextual variability.
Due to the complexities of acquisition, research in L2 tone perception is very vibrant with
a growing number of studies using more complex designs, frameworks and technologies, as
discussed later.
L2 Stress
Stress is critical for spoken word recognition (Cutler et al., 1997). In some languages, stress
placement can contrast lexical meaning, as in the English word CONtract (a noun meaning a
written or spoken agreement) and conTRACT (a verb meaning to decrease in size or number). In
addition, the degree of stress plays a role in lexical retrieval. For example, the syllable oc-
receives primary stress in octopus, secondary stress in October, and no stress in occur. To
identify the three words, listeners need to be able to tell the difference between degrees of
stress in an activated cohort of words starting with oc- (Levis, 2018). As native listeners use
stress information to access words, mis-stressing in L2 can be problematic. Field (2005)
showed that L2 speech was rated as more intelligible when the primary stress was placed
correctly, and Zielinski (2008) suggested that syllable stress pattern was a more reliable cue
than segmental information for intelligibility.
L2 learners often have problems with perceiving L2 stress and using stress information to
access the L2 mental lexicon. Dupoux et al. (2008) reported stress “deafness” in L1 French
learners of L2 Spanish; French makes no lexically contrastive use of duration, pitch, or
stress. These learners had problems encoding stress in short-term memory
and using stress to access the lexicon. Context-sensitive “stress-deafness” was reported in
Ortega-Llebaria et al. (2013), who found that L1 English speakers had difficulty perceiving
L2 Spanish stress because English uses context-sensitive phonetic details differently from
Spanish in stress-marking. Therefore, “stress-deafness” hampers communication when L2
listeners fail to use stress to recognize words uttered by interlocutors.
In addition to perception, learners can have difficulty in stress production. Fokes and
Bond (1989) found that in L2 speech, stressed vowels were too short and unstressed vowels
were too long compared to native norms. McGory (1997) showed that L1 Korean learners of
English had difficulty acquiring reduction in unstressed syllables since Korean is a nonstress
language. In addition, Zhang et al. (2008) revealed that Mandarin learners of English were
unable to produce F0 and vowel reduction in a native-like manner and that L2 stress pro-
duction was judged by native listeners as less acceptable than native production.
L2 Speech Rhythm
Rhythm plays an important role in speech communication because it helps listeners to segment
the continuous speech flow into identifiable words. According to Cutler (2012), alternations
between strong and weak syllables are used by listeners to locate the beginnings of words. In
stress-timed languages like English, a strong syllable is likely to be the beginning of a word.
English listeners apply this feature to segment speech, whereas French listeners, whose L1 is
syllable-timed, do not. Therefore, L2 learners of a language with a different rhythmic pattern
need to learn the appropriate rhythm to help understand sentences spoken in the target language.
Numerous studies have demonstrated that rhythm can influence L2 speakers’ accentedness
and intelligibility. Huang and Jun (2011) found that the perception of Mandarin-accented
English was strongly influenced by speech rate and articulation rate. Trofimovich and Baker
(2006) suggested that the degree of accent in L2 English speech produced by Korean learners
was most affected by pause duration and speech rate. Munro and Derwing (1998) found that
native listeners preferred a higher speech rate to a lower one in foreign accent ratings.
Even when speech rate was controlled, Quené and Delft (2010) found that non-native
durational patterns alone can influence listeners’ perception of foreign accent. As for in-
telligibility, Tajima et al. (1997) found that the intelligibility of L2 English speech improved
by 15%–25% when phonemic durations were manipulated to match native durational pat-
terns, whereas the intelligibility of L1 English dropped by 15% when its phonemic durations
were warped to the non-native temporal patterns.
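The logic of such duration manipulations can be illustrated with a minimal sketch. It assumes that the L2 and native renditions of a sentence have already been segmented into the same sequence of phonemes, and it computes the scaling factor needed to bring each L2 segment to the native duration. The function name and the durations are hypothetical, and the sketch covers only the arithmetic, not the resynthesis procedure used in the studies cited above.

```python
def warp_factors(l2_durations, native_durations):
    """Return one scaling factor per segment (native duration / L2 duration)."""
    if len(l2_durations) != len(native_durations):
        raise ValueError("both renditions must contain the same number of segments")
    return [native / l2 for l2, native in zip(l2_durations, native_durations)]


# Hypothetical segment durations in milliseconds for the same sentence.
l2_ms = [80.0, 120.0, 60.0, 150.0]
native_ms = [60.0, 150.0, 30.0, 180.0]

factors = warp_factors(l2_ms, native_ms)

# Applying each factor to the L2 segment reproduces the native timing pattern.
warped_ms = [duration * factor for duration, factor in zip(l2_ms, factors)]
```

A factor below 1 shortens the L2 segment and a factor above 1 lengthens it; swapping the two arguments gives the factors for warping native speech to the non-native pattern.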
L2 speakers learning a language with a different rhythmic structure usually struggle with
acquiring speech rhythm. Mok and Dellwo (2008) found that English spoken by L1
Mandarin and Cantonese speakers sounded more syllable-timed than native English, as
reflected in rhythm metrics. Setter (2003) measured syllable duration of English spoken by
Cantonese speakers and found that L2 English had insufficient vowel reduction, making it
sound more syllable-timed. Gut (2003) found that various types of L2 German had less
vowel reduction or deletion than native German, again because of more syllable-timed
speech. Nevertheless, learners were able to learn speech rhythm through more L2 exposure.
Tortel & Hirst (2010), for example, demonstrated an increase of durational variability and
thus an approximation of stress-timing in French learners’ L2 English production during the
course of L2 acquisition.
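One widely used index of durational variability is the normalised Pairwise Variability Index (nPVI; Grabe & Low, 2002). The sketch below is an illustration under that assumption and is not necessarily the measure used in the studies above. It takes a list of successive interval durations (for example, vocalic intervals) and averages the normalised difference between each adjacent pair; the example values are invented.

```python
def npvi(durations):
    """Normalised Pairwise Variability Index for successive interval durations."""
    if len(durations) < 2:
        raise ValueError("at least two intervals are required")
    # For each adjacent pair, divide the absolute difference by the pair mean.
    terms = [
        abs(first - second) / ((first + second) / 2)
        for first, second in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)


# Hypothetical vocalic durations in milliseconds.
even_timing = [100.0, 100.0, 100.0, 100.0]      # no variability
alternating_timing = [150.0, 50.0, 150.0, 50.0]  # strong-weak alternation

even_score = npvi(even_timing)
alternating_score = npvi(alternating_timing)
```

Higher values indicate greater variability between neighbouring intervals, which is the pattern usually associated with stress-timing.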
Factors Contributing to the Acquisition of L2 Prosody

A range of factors probably influence success in the acquisition of L2 prosody. First is the L2
speakers’ age of arrival (AoA) in the L2-speaking country. In a sentence repetition task,
Guion et al. (2000) found that AoA was positively related to sentence duration, meaning that
the earlier the first exposure was, the faster the L2 speech rate would be. Their interpretation
was that the earlier AoA entailed a less established L1 system, which made it easier to inhibit
L1 interference in L2 production. Huang and Jun (2011) explored the English intonation of
L1 Mandarin speakers varying in AoA in the United States and found that adult arrivals
read more slowly, divided the passage into too many phrases, inserted syntactically in-
appropriate prosodic boundaries, had excessive pitch accents, produced atypical pitch accent
alignment, and were rated as more accented than child arrivals and native speakers.
Related to AoA are L2 exposure and L2 experience, which also affect the acquisition of L2
prosody. In the study of L2 stress acquisition, Chen (2013) noticed that due to the different
quantity and quality of L2 English input, ESL or EFL learners faced more difficulties
compared with bilingual speakers living in an English-speaking environment. Trofimovich
and Baker (2006) investigated the effects of language experience on the acquisition of L2
English intonation by Korean speakers. Comparison of L2 learners with different language
learning experience showed that learners’ ability to produce nativelike speech rate and
pausing was associated with the time of first intensive exposure, while the ability to produce
L2 stress-timing was more related to the amount of L2 exposure.
L2 learners’ language backgrounds shape L2 prosody acquisition as well. Arslan and
Hansen (1997) analysed the continuative intonation of isolated words in L1 Mandarin,
German and Turkish learners’ English production and found that the groups differed in
continuative intonation slopes, with German speakers having more positive, Mandarin
speakers having more negative, and Turkish speakers having less varied values than native
American English speakers did. L1 background effects were also demonstrated in Wang’s
(2006) examination of the perceptual accuracy of L2 Mandarin tones by speakers of Hmong
(a tone language), and Japanese and English (non-tone languages). The Hmong learners
performed worse than the other two groups. Braun and Galts (2014) tested the learning of L2
Mandarin lexical tones by French, German and Japanese speakers. German has rich in-
tonational pitch contrasts while French and Japanese have limited intonational pitch var-
iation. Their findings showed that L1 German speakers displayed higher sensitivity to L2
tonal contrasts.
Developmental constraints also play a role in prosody acquisition, as different groups of
L2 learners show certain default L2 prosodic features. Mennen et al. (2010) found that L2
English spoken by L1 Punjabi and L1 Italian learners all started out having more intonation
falls than rises. Santiago and Delais-Roussarie (2015) investigated L2 interrogative intona-
tion and found that L1 Spanish learners of L2 French almost always used a rising tune for L2
French interrogatives, even though their L1 Spanish used a variety of final tunes to achieve
interrogative function. Kainada and Lengeris (2015) suggested that L2 learners of English
with L1 Greek show language-independent features such as slower speech rate, narrow pitch
span, and low pitch level.
Furthermore, the inherent properties of the target structure condition L2 prosody ac-
quisition. For example, in lexical tone acquisition, Lee et al. (2010) suggested that Mandarin
Tone 2 was the most challenging and Tone 4 the least challenging in perception for all their
L2 participants, regardless of proficiencies and L1 backgrounds. Hao (2012) studied the
acquisition of Mandarin tones by English and Cantonese speakers and found a prevailing
Tone 2–Tone 3 confusion among the learners regardless of L1.
L2 Prosody Production
Unlike earlier studies, which usually described global prosodic features in L2 production,
recent studies have enquired into the L2 production of specific prosodic features in
the realization of linguistic and pragmatic functions. Intonational marking of polarity
contrasts was studied in Turco et al. (2015), focusing on L2 Italian spoken by highly pro-
ficient German and Dutch speakers. A strong L1 effect was observed, as the speakers en-
coded the polarity contrast either by producing a nuclear accent on the finite verb as in
German or using lexical markers as in Dutch. Tremblay et al. (2018) examined the impact of
intonation on segmentation in the perception of L2 French by Dutch and English listeners.
An F0 rise signals word final boundaries in French, but signals word initial boundaries in
English and Dutch, with a stronger weight in Dutch than in English. Results suggested that
Dutch-speaking learners were at an advantage over English-speaking learners when learning
to use an F0 rise to cue word boundaries in French, showing the influence of L1. Post-focus
compression (PFC) realization for focus marking was systematically examined in Hong
Kong English (HKE), a typical variety of Cantonese-accented English spoken in Hong
Kong. PFC refers to the phenomenon of F0 narrowing and lowering after focus, which is
typically found in English but not in Cantonese. Fung and Mok (2014) studied the prosodic
realization of narrow focus in L2 English by Cantonese speakers. Production results sug-
gested that narrow focus was realized by the L2 speakers as on-focus F0 range expansion but
not PFC, which could be partially explained by L1 Cantonese influence. The lack of PFC in
L2 English was likewise observed in Gananathan et al. (2015), who investigated the pro-
duction and perception of narrow focus by HKE speakers. The study suggested that HKE
speakers were nativelike in the perception of narrow focus, but far from nativelike in pro-
duction. The less proficient speakers did not mark narrow focus at all, and the more pro-
ficient speakers did not realize PFC or rarely exhibited on-focus F0 changes.
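As an illustration of how PFC might be quantified, the sketch below compares the F0 of the post-focus region in a neutral-focus rendition with the same region in a narrow-focus rendition. It assumes that F0 tracks in Hz have already been extracted for that region; the function names, the 100 Hz reference, and the F0 values are hypothetical and do not reproduce the procedure of any study cited here.

```python
import math
from statistics import mean


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert an F0 value in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def summarize_f0(f0_track_hz):
    """Return the F0 range and mean of a track, both in semitones."""
    semitones = [hz_to_semitones(f0) for f0 in f0_track_hz]
    return {"range_st": max(semitones) - min(semitones), "mean_st": mean(semitones)}


def pfc_indices(neutral_post_hz, focus_post_hz):
    """Positive values mean a narrower range / lower level after focus."""
    neutral = summarize_f0(neutral_post_hz)
    focused = summarize_f0(focus_post_hz)
    return {
        "range_compression_st": neutral["range_st"] - focused["range_st"],
        "mean_lowering_st": neutral["mean_st"] - focused["mean_st"],
    }


# Hypothetical F0 samples (Hz) from the words following the focused item.
neutral_post = [200.0, 180.0, 160.0, 150.0]
focus_post = [150.0, 145.0, 140.0, 135.0]

indices = pfc_indices(neutral_post, focus_post)
```

Under these assumptions, both indices are positive when PFC is present and close to zero when the post-focus region is left unchanged, as reported for the L2 speakers above.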
L2 Prosody Perception
Given the deviations in L2 intonation production, researchers have also examined L2 lear-
ners’ knowledge of the relevant intonation patterns in the target language. For instance, He
et al. (2012) investigated Chinese learners’ knowledge of L2 Dutch intonation contours. In a
forced-choice task, L2 learners saw sentences on a screen, listened to the same sentences
spoken with correct or incorrect intonation contours, and selected the most appropriate
spoken version. Low proficiency learners tended to choose rising contours for sentences
ending with a question mark and falling contours in other situations. Using a similar design,
the mapping between intonation and its functions was investigated in Mok et al. (2016),
showing that the English speakers with L1 Cantonese were quite native-like for some sen-
tence types, but were less accurate in tag questions and wh questions.
In addition to intonation, stress has received attention in current perception studies del-
ving into the interaction between L1 and L2 in stress presentation. As introduced before,
languages vary in their use of duration, pitch, and intensity cues for stress identification
(Cutler, 2007), but learners’ cue-weighting strategies for stress perception in L1 can be
transferred to L2. Qin et al. (2017) investigated the processing of word stress by L1 Standard
Mandarin and L1 Taiwan Mandarin learners of English. F0 signals lexical meaning in both
varieties, but only Standard Mandarin uses duration to distinguish stressed from unstressed
syllables. The study showed that when English stress was signalled only by duration, the
Taiwan Mandarin speakers performed worse than Standard Mandarin speakers, indicating
that cue properties in L1 determine the weight of these cues in L2. Lin et al. (2020) examined
the processing of L2 English word stress by L1 Korean and L1 Mandarin speakers. The
study revisited the concept of “stress-deafness” (Dupoux et al., 2008) and suggested different
degrees of learning difficulty varying with L1. Mandarin and English have contrastive word
stress while Korean does not, and it was found that L1 Korean learners of L2 English
performed worse than their Mandarin-speaking peers in stress perception.
Training L2 Prosody
Although many studies demonstrate deviations in L2 prosody in comparison to native
speech, current work suggests that training can improve learners’ ability to produce and
perceive prosodic features so as to enhance comprehensibility.
One method known to be effective is high-variability phonetic training (HVPT), which
provides trainees with speech produced by multiple speakers and/or in multiple phonetic
contexts, along with corrective feedback. This training method helps listeners to form robust
phonetic categories while ignoring context- and talker-specific information. Studies on L2
prosody have shown that HVPT can facilitate the acquisition of tones. For instance,
Silpachai (2020) found that HVPT with multiple talkers could improve the perception of
Mandarin tones by English listeners. As for speech production, Wiener et al. (2020) trained
L1 English speakers to produce rising and falling tones in a Mandarin-like artificial language
and suggested that HVPT in combination with explicit instruction led to significant
improvement.
Metaphoric bodily actions have also been found to facilitate the acquisition
of L2 prosody. In Eng et al. (2013), English-speaking learners of L2 Mandarin showed
improvement in tonal identification after a training session in which they were provided with
hand gestures mimicking pitch contours. In Morett and Chang (2015), English-speaking
learners of Mandarin benefited from gestures representing Mandarin pitch contours in
identifying the meaning of minimal lexical tone pairs. In Burnham et al. (2006), rigid head
motion was demonstrated as a perceptual cue for L2 Cantonese tone perception. In addition
to tone perception, Zheng et al. (2018) showed that hand gestures and head nods can
modestly improve the production of L2 Mandarin tones.
Furthermore, quite a few studies have exemplified effective uses of modern computer-
assisted visualization technologies in L2 prosodic training. Hirata (2004) trained English
learners of L2 Japanese with Japanese words contrasting in pitch and duration using
computer-assisted methods that provided prosody graphs as visual feedback. Participants in
the 3.5-week training programme and a control group without training took pre- and post-
tests on production and perception of words including pitch and duration contrasts. Results
showed that the trained participants improved more than the control group in production
and perception of Japanese pitch and duration contrasts. Computer-assisted prosody
training with visual feedback was likewise facilitative in the learning of L2 French prosody
by English speakers (Hardison, 2004). Moreover, computer-aided instruction on various
aspects of prosody was shown in Lima (2015) to significantly improve learners’ intelligibility.
Other studies suggested some visualization methods outperformed others in prosodic
training. Niebuhr et al. (2017) compared the usability of different means of prosody visua-
lization techniques for Danish learners of L2 German. Both auditory phonological analysis
on prosodic accuracy and learners’ ratings on usability of visualization techniques suggested
that iconic techniques performed better than symbolic techniques. Godfroid et al. (2017)
designed a 3-week online course for L2 Mandarin tone perception using different multimodal
methods. The study suggested that colour-mediated methods resulted in 10%–20% im-
provement of accuracy in tone perception, but colour needed to be presented with a concrete
object to optimize perceptual learning.
Main Research Methods

Production in L2 prosody can be measured in multiple ways which may involve comparisons
between L2 prosodic features and native features to reveal difficulties in L2 prosody pro-
duction. Traditionally, researchers made recordings of L2 speech and calculated parameters
related to different aspects of prosody. For instance, stress can be quantified by means of
pitch, duration, and vowel quality (Fokes & Bond, 1989; McGory, 1997). Recent studies also
analyze the kinematic correlates of stress using the Electromagnetic Articulograph (EMA)
(Huang & Erickson, 2019; Smith et al., 2019). Rhythm can be measured by the so-called
“rhythm metrics” which capture variability in the durations of consonantal and vocalic in-
tervals (Dellwo, 2006; Grabe & Low, 2002; Ramus et al., 1999). Intonation features can be
analysed in terms of pitch range, peak alignment and rising or falling patterns (Mennen,
2004; Mennen et al., 2014).
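As a rough illustration of how interval-based rhythm metrics are computed, the sketch below derives %V and ΔC (Ramus et al., 1999) and the rate-normalised VarcoC (Dellwo, 2006) from a sequence of labelled intervals. It assumes that the utterance has already been segmented into consonantal ("C") and vocalic ("V") intervals and that ΔC is taken as the population standard deviation; the function name and the durations are hypothetical.

```python
from statistics import mean, pstdev


def rhythm_metrics(intervals):
    """Compute %V, deltaC and VarcoC from (label, duration) pairs.

    Labels are "C" for consonantal and "V" for vocalic intervals;
    intervals with any other label are ignored.
    """
    consonantal = [duration for label, duration in intervals if label == "C"]
    vocalic = [duration for label, duration in intervals if label == "V"]
    if not consonantal or not vocalic:
        raise ValueError("need at least one consonantal and one vocalic interval")

    # %V: proportion of the utterance made up of vocalic intervals.
    percent_v = 100 * sum(vocalic) / (sum(vocalic) + sum(consonantal))
    # deltaC: standard deviation of consonantal interval durations.
    delta_c = pstdev(consonantal)
    # VarcoC: deltaC normalised by the mean consonantal duration.
    varco_c = 100 * delta_c / mean(consonantal)
    return {"percent_v": percent_v, "delta_c": delta_c, "varco_c": varco_c}


# Hypothetical interval durations in milliseconds.
utterance = [("C", 100.0), ("V", 100.0), ("C", 200.0), ("V", 100.0)]
metrics = rhythm_metrics(utterance)
```

Because VarcoC divides by the mean interval duration, it is less sensitive to overall speech rate than ΔC, which is useful when comparing slower L2 speech with native speech.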
The perception of prosody in L2 is analysed through a variety of perceptual tasks.
Shadowing and lexical decision tasks reflect participants’ processing of prosody in the mental
lexicon (Slowiaczek, 1990). Forced-choice tasks requiring the association between sentences
and prosodic features allow for an examination of participants’ knowledge of various pro-
sodic functions (He et al., 2012). Sequence recall tasks measure the encoding of prosodic
features in memory (Dupoux et al., 2008). Tonal identification tasks, involving reaction time,
directly assess speakers’ perception of L2 tones (Lee et al., 2010). Finally, in recent years,
brain imaging techniques have also been applied to assess L2 tone perception (Pelzl
et al., 2019).
The influence of prosody on foreign accent, intelligibility, and comprehensibility is ex-
amined with rating and transcribing tasks (Hahn, 2004). To tease apart the contributions of
different prosodic features, many studies have manipulated speech samples, including manipulations
of duration (Quené & Delft, 2010; Tajima et al., 1997), speech rate (Munro &
Derwing, 1998), and intonation (De Mareüil & Vieru-Dimulescu, 2006; Holm, 2009), and the
removal of segmental information (Huang & Jun, 2011). Such manipulations have helped
researchers to understand the relative importance of each prosodic feature to listeners’
perception of foreign accent and intelligibility.
Training studies typically last three to four weeks. The process includes a pre-test, several
sessions of training, and a post-test. For example, Hardison (2004) assessed the effectiveness
of computer-assisted prosody training with visual feedback on L2 acquisition. After English-
speaking learners of French received 3 weeks of training on French prosody, they produced
new sentences, which were rated for accuracy. The study found that L2 learners benefited
from prosodic training and their improvement was generalized to novel sentences.
Recommendations for Practice

It is evident from the above that L2 speakers are often inaccurate in the production and
perception of L2 prosodic features such as intonation, stress, and rhythm. These deviations
might severely reduce their speech intelligibility and increase accentedness. However, pro-
sody training has been demonstrated to be fruitful in improving L2 learners’ ability to
communicate. For example, prosody-based teaching was found effective in improving
French EFL learners’ English perceptual skills (Capliez, 2016), and teaching of contrastive
stress also significantly improved ESL learners’ comprehensibility and fluency (Levis &
Levis, 2018).
Therefore, teachers should be encouraged to draw learners’ attention to prosodic phe-
nomena. Hirano-Cook (2011) showed that raising learners’ awareness and improving their
self-monitoring skills could improve L2 learners’ production and perception of Japanese
pitch accents. In Saito and Wu (2014) a group who received form-focused instruction de-
monstrated significant gains in the perception of L2 Mandarin tones. This led the authors to
suggest that a communicative focus on form could facilitate L2 prosody acquisition.
In addition, teachers are advised to give explicit instruction on prosody. The importance
of explicit instruction was underscored in past studies. Gordon and Darcy (2016) examined
the effects of explicit and non-explicit instruction on the development of comprehensible
speech by ESL learners. The participants received explicit and non-explicit pronunciation
instruction on prosody and vowels. The training results underscored the importance of ex-
plicit training on prosody. Comprehensibility of L2 speech improved after explicit instruc-
tion, and this improvement was most salient for prosody training. In an investigation of L2
stress acquisition, Chen (2013) called for explicit training on English lexical categories which
could facilitate the acquisition of English lexical stress rules by Mandarin ESL learners.
Finally, teachers can use multiple innovative teaching methods involving actions, mo-
tions, and computerized visualization to help students perceive and produce speech prosody
in class. For example, haptic pronunciation teaching techniques such as tapping and clapping have been
recommended to improve learning of L2 English rhythm grouping and stress (Burri
& Baker, 2016). Teaching speech rhythm with physical beat gestures was also shown to
improve Catalan learners’ production in their L2 English (Gluhareva & Prieto, 2017). Kaiser
(2013) offered practical teaching advice such as teaching only two levels of word stress, using
physicalization (e.g., rubber bands) to match strong actions with stresses, and adopting both
bottom–up prediction skills and top–down skills to facilitate the acquisition of L2 speech
rhythm. Frost and Picavet (2014) introduced a project incorporating a series of teacher
training workshops and blended learning platforms with a variety of tools to facilitate L2
prosody teaching. Modern speech visualization technology may also be used to enhance
learners’ practice of discourse level intonation (Levis & Pickering, 2004).
Future Directions
A number of new directions can be applied to future work on L2 prosody. Most previous
studies have been cross-sectional. Longitudinal investigations have been far fewer given the
logistical complexities involved, but they can provide much more useful insight into how L2
prosody develops over time, as it has been suggested that the improvement in L2 pro-
nunciation is concentrated in the first year following arrival in the L2 environment (Flege,
1988; Munro & Derwing, 2008). So far, only a handful of longitudinal studies of L2 English
rhythm have been published (Quené & Orr, 2014; White & Mok, 2018, 2019). Further work
can assess whether L2 prosody can be improved beyond initial improvement in the first year.
Today, there are even more multilingual speakers than bilingual speakers, and the vast
multilingual population provides more varied language combinations that allow researchers
to study more interesting intonation phenomena in L3. Only a handful of L3 studies have
scratched the surface of the new area of L3 prosody. Gabriel et al. (2015) found that mul-
tilingual Chinese–German speaking learners of L3 English and L3 French benefited from all
of their previous languages in producing L3 speech rhythm. Studies of Cantonese–English
bilingual learners of L3 German have also shown that various aspects of L3 German prosody
were influenced by both the L1 and L2 (Zhu et al., 2019; Zhu & Mok, 2016). Many
interesting topics, such as the influence of prosody on L3 speakers’ intelligibility, factors affecting
the acquisition of L3 prosody, and the prosodic training of L3 speech, still await
further investigation.
Further Reading
Trouvain, J., & Gut, U. (2007). Non-native prosody: Phonetic description and teaching practice. Berlin:
De Gruyter.
A description of how L2 learners differ from native speakers in various aspects of prosody.
teaching and research. Amsterdam/Philadelphia: John Benjamins.
A general overview of the current state of L2 pronunciation research.
Levis, J. (2018). Intelligibility, oral communication, and the teaching of pronunciation. Cambridge, UK:
Cambridge University Press.
A review of research on the impact of prosody on intelligibility.
References
Adams, C., & Munro, R. R. (1978). In search of the acoustic correlates of stress: Fundamental fre-
quency, amplitude, and duration in the connected utterance of some native and non-native speakers
of English. Phonetica, 35, 125–156.
Anderson‐Hsieh, J., Johnson, R., & Koehler, K. (1992). The relationship between native speaker
judgments of nonnative pronunciation and deviance in segmentals, prosody and syllable structure.
Arslan, L. M., & Hansen, J. H. L. (1997). A study of temporal features and frequency characteristics in
American English foreign accent. The Journal of the Acoustical Society of America, 102, 28–40.
Backman, N. (1979). Intonation errors in second-language pronunciation of eight Spanish-speaking
adults learning English. Interlanguage Studies Bulletin, 4, 239–265.
Braun, B., & Galts, T. (2014). Lexical encoding of L2 tones: The role of L1 stress, pitch accent and
intonation. Second Language Research, 30, 323–350.
Burnham, D., Reynolds, J., Vatikiotis-Bateson, E., Yehia, H., Ciocca, V., Morris, R. H., Hill, H.,
Vignali, G., Bollwerk, S., Tam, H., & Jones, C. (2006). The perception and production of phones
and tones: The role of rigid and non-rigid face and head motion. In ISSP 2006 – proceedings of the
7th international seminar on speech production (pp. 185–192).
Burri, M., & Baker, A. A. (2016). Teaching rhythm and rhythm grouping: The butterfly technique.
English Australia Journal: The Australian Journal of English Language Teaching, 31, 72–77.
Capliez, M. (2016). Prosody- vs. segment-based teaching: Impact on the perceptual skills of French
learners of English. Language, Interaction and Acquisition, 7, 212–237.
Chen, H. C. (2013). Chinese learners’ acquisition of English word stress and factors affecting stress
assignment. Linguistics and Education, 24, 545–555.
Cutler, A. (2007). Lexical stress. In D. B. Pisoni & R. E. Remez (Eds.), The handbook of speech per-
ception. New York, NY: Blackwell.
Cutler, A. (2012). Native listening. Cambridge, MA: MIT Press.
Cutler, A., Dahan, D., & Van Donselaar, W. (1997). Prosody in the comprehension of spoken lan-
guage: A literature review. Language and Speech, 40, 141–201.
Dauer, R. M. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51–62.
De Mareüil, P. B., & Vieru-Dimulescu, B. (2006). The contribution of prosody to the perception of
foreign accent. Phonetica, 63, 247–267.
Delais-Roussarie, E., Avanzi, M., & Herment, S. (Eds.). (2015). Prosody and language in contact: L2
acquisition, attrition and languages in multilingual situations. Berlin/Heidelberg: Springer-Verlag.
doi: 10.1007/978-3-662-45168-7
Dellwo, V. (2006). Rhythm and speech rate: A variation coefficient for deltaC. In Language and
language processing (pp. 231–241). Frankfurt: Peter Lang.
Dupoux, E., Sebastián-Gallés, N., Navarrete, E., & Peperkamp, S. (2008). Persistent stress “deafness”:
The case of French learners of Spanish. Cognition, 106, 682–706.
Eng, K., Hannah, B., Leung, L., & Wang, Y. (2013). Can co-speech hand gestures facilitate learning of
non-native tones? The Journal of the Acoustical Society of America, 133, 3572–3572.
Field, J. (2005). Intelligibility and the listener: The role of lexical stress. TESOL Quarterly, 39, 399–423.
Flege, J. (1988). Factors affecting degree of perceived foreign accent in English sentences. Journal of the
Acoustical Society of America, 84, 70–79.
Flege, J. E. (1993). Production and perception of a novel, second-language phonetic contrast. Journal of
the Acoustical Society of America, 93, 1589–1608.
Fokes, J., & Bond, Z. S. (1989). The vowels of stressed and unstressed syllables in nonnative English.
Frost, D., & Picavet, F. (2014). Putting prosody first – Some practical solutions to a perennial problem:
The innovalangues project. Research in Language, 12, 233–244.
Fung, H. S. H., & Mok, P. P. K. (2014). Realization of narrow focus in Hong Kong English
declaratives – a pilot study. Proceedings of the International Conference on Speech Prosody, 7,
964–968.
Gabriel, C., Stahnke, J., & Thulke, J. (2015). Assessing foreign language speech rhythm in multilingual
learners: An interdisciplinary approach. In H. Peukert (Ed.), Transfer effects in multilingual language
development (pp. 191–220). Amsterdam: John Benjamins Publishing Company.
Gananathan, R. Y., Yin, Y., & Mok, P. K. P. (2015). Interlanguage influence in cues of narrow focus:
A study of Hong Kong English. In Proceedings of the 18th international congress of phonetic sciences
(ICPhS 2015).
Gluhareva, D., & Prieto, P. (2017). Training with rhythmic beat gestures benefits L2 pronunciation in
discourse-demanding situations. Language Teaching Research, 21, 609–631.
Godfroid, A., Lin, C., & Ryu, C. (2017). Hearing and seeing tone through color: An efficacy study of
web-based, multimodal Chinese tone perception training. Language Learning, 819–857. doi: 10.1111/
lang.12246
Gordon, J., & Darcy, I. (2016). The development of comprehensible speech in L2 learners: A classroom
study on the effects of short-term pronunciation instruction, Journal of Second Language
Grabe, E., & Low, L. (2002). Durational variability in speech and the rhythm class hypothesis. In C.
Gussenhoven & N. Warner (Eds.), Laboratory phonology (Vol. 7, pp. 515–546). New York: Mouton
de Gruyter.
Graham, C., & Post, B. (2018). Second language acquisition of intonation: Peak alignment in American
English. Journal of Phonetics, 66, 1–14.
Grenon, I., & White, L. (2008). Acquiring rhythm: A comparison of L1 and L2 speakers of Canadian
English and Japanese. In Proceedings of the 32nd Boston university conference on language devel-
opment (pp. 155–166).
Guion, S. G., Flege, J. E., Liu, S. H., & Yeni-Komshian, G. H. (2000). Age of learning effects on the
duration of sentences produced in a second language. Applied Psycholinguistics, 21, 205–228.
Gut, U. (2003). Non‐native speech rhythm in German. In Proceedings of the ICPhS conference,
(pp. 2437–2440).
Hahn, L. D. (2004). Primary stress and intelligibility: Research to motivate the teaching of supraseg-
Hao, Y. C. (2012). Second language acquisition of Mandarin Chinese tones by tonal and non-tonal
language speakers. Journal of Phonetics, 40, 269–279.
Hardison, D. M. (2004). Generalization of computer-assisted prosody training: Quantitative and
qualitative findings. Language Learning and Technology, 8, 34–52.
He, X., Van Heuven, V. J., & Gussenhoven, C. (2012). The selection of intonation contours by Chinese
L2 speakers of Dutch: Orthographic closure vs. prosodic knowledge. Second Language Research
(Vol. 28). doi: 10.1177/0267658312439668
Hirano-Cook, E. (2011). Japanese pitch accent acquisition by learners of Japanese: Effects of training
on Japanese accent instruction, perception, and production. Japanese Language & Literature, 45,
363–364.
Hirata, Y. (2004). Computer assisted pronunciation training for native English speakers learning
Japanese pitch and durational contrasts. Computer Assisted Language Learning, 17, 357–376.
Holm, S. (2009). Intonational and durational contributions to the perception of foreign-accented
Norwegian. Retrieved from http://www.fonkonsult.com/PhD_thesis_Snefrid_Holm.pdf
Huang, B. H., & Jun, S. A. (2011). The effect of age on the acquisition of second language prosody.
Language and Speech, 54, 387–414.
Huang, T., & Erickson, D. (2019). Articulation of English “prominence” by L1 (English) and L2
(French) speakers. In ICPhS 2019, 1, 1–5.
Hutchinson, S. P. (1973). An objective index of the English-Spanish pronunciation dimension.
Unpublished Master’s thesis, University of Texas, Austin, TX.
Jonasson, J., & McAllister, R. (1972). Foreign accent and timing: An instrumental study. PILUS,
14, 11–40.
Kainada, E., & Lengeris, A. (2015). Native language influences on the production of second-language
prosody. Journal of the International Phonetic Association, 45, 269–287.
Kaiser, D. J. (2013). Practical approaches and strategies for teaching stress-timed English rhythm. In
The conference proceedings of MIDTESOL: Cultivating best practices in ESL: 2013–2014, 71–90.
Lee, C. Y., Tao, L., & Bond, Z. S. (2010). Identification of acoustically modified Mandarin tones by
non-native listeners. Language and Speech, 53, 217–243.
Levis, J. M. (2018). Intelligibility, oral communication, and the teaching of pronunciation. Cambridge,
UK: Cambridge University Press.
Levis, J. M., & Levis, G. M. (2018). Teaching high-value pronunciation features: Contrastive stress for
intermediate learners. CATESOL Journal, 30, 139–160.
Levis, J., & Pickering, L. (2004). Teaching intonation in discourse using speech visualization tech-
nology. System, 32, 505–524.
Lima, E. D. (2015). Development and evaluation of online pronunciation instruction for international
teaching assistants’ comprehensibility. Unpublished PhD Dissertation, Iowa State University,
Ames, Iowa.
Lin, C. Y., Wang, M., Idsardi, W. J., & Xu, Y. (2020). Stress processing in Mandarin and Korean
second language learners of English. Bilingualism: Language and Cognition, 17, 316–346.
McGory, J. T. (1997). Acquisition of intonational prominence in English by Seoul Korean and Mandarin
Chinese speakers. Unpublished PhD Dissertation, Ohio State University, Columbus, Ohio.
Mennen, I. (2004). Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of
Phonetics, 32, 543–563.
Mennen, I., Chen, A., & Karlsson, F. (2010). Characterising the internal structure of learner intonation
and its development over time. In Proceedings of the 6th international symposium on the acquisition of
second language speech, new sounds, 319–324.
Mennen, I., Schaeffler, F., & Dickie, C. (2014). Second language acquisition of pitch range in German
learners of English. Studies in Second Language Acquisition, 36, 303–329.
Mok, P. K. P., & Dellwo, V. (2008). Comparing native and non-native speech rhythm using acoustic
rhythmic measures: Cantonese, Beijing Mandarin and English. Speech Prosody 200, 423–426,
Campinas/Brazil.
Mok, P. K. P., Yin, Y., Setter, J., & Nayan, N. M. (2016). Assessing knowledge of English intonation
patterns by L2 speakers. In Speech Prosody 2016, 543-547. Boston, MA.
Morett, L. M., & Chang, L. Y. (2015). Emphasising sound and meaning: Pitch gestures enhance
Mandarin lexical tone acquisition. Language, Cognition and Neuroscience, 30, 347–353.
Moyer, A. (2004). Age, accent and experience in second language acquisition. Clevedon: Multilingual
Matters.
Munro, M. J., & Derwing, T. M. (1998). The effects of speaking rate on listener evaluations of native
and foreign-accented speech. Language Learning, 42, 159–182.
Munro, M. J., & Derwing, T. M. (1999). Foreign accent, comprehensibility, and intelligibility in the
Munro, M. J., & Derwing, T. M. (2008). Segmental acquisition in adult ESL learners: A longitudinal
study of vowel production. Language Learning, 58, 479–502.
Niebuhr, O., Alm, M., Schümchen, N., & Fischer, K. (2017). Comparing visualization techniques for
learning second language prosody. International Journal of Learner Corpus Research, 3, 250–277.
Ortega-Llebaria, M., Gu, H., & Fan, J. (2013). English speakers’ perception of Spanish lexical stress:
Context-driven L2 stress perception. Journal of Phonetics, 41, 186–197.
Pelzl, E., Lau, E. F., Guo, T., & DeKeyser, R. (2019). Advanced second language learners’ perception
of lexical tone contrasts. Studies in Second Language Acquisition, 41, 59–86.
Pickering, L. (2001). The role of tone choice in improving ITA communication in the classroom.
TESOL Quarterly, 35, 233–253.
212
Pytlyk, C. (2008). Interlanguage prosody: Native English speakers’ production of Mandarin yes-no
questions. In Proceedings of the 2008 annual conference of the Canadian linguistic association.
Qin, Z., Chien, Y. F., & Tremblay, A. (2017). Processing of word-level stress by Mandarin-speaking
second language learners of English. Applied Psycholinguistics, 38(3), 541–570.
Quené, H., & Delft, L. E. Van. (2010). Non-native durational patterns decrease speech intelligibility.
Speech Communication, 52, 911–918.
Quené, H., & Orr, R. (2014). Long-term convergence of speech rhythm in L1 and L2 English. In 7th
international conference on speech prosody 2014, 342–345.
Ramus, F., Nespor, M., & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal.
Cognition, 73, 265–292.
Santiago, F., & Delais‐Roussarie, E. (2015). The acquisition of question intonation by Mexican
Spanish learners of French. InProsody and language in contact (pp. 243–270). Springer, Berlin,
Heidelberg.
Saito, K., & Wu, X. (2014). Communicative focus on form and second language suprasegmental
learning: Teaching cantonese learners to perceive mandarin tones. Studies in Second Language
Setter, J. (2003). A comparison of speech rhythm in British and Hong Kong English.In Proceedings of
the 15th International Congress of Phonetic Sciences, (pp. 467–470).
Setter, J., & Deterding, D. (2003). Extra final consonants in the English of Hong Kong and Singapore.
In Proceedings of the 15th international congress of phonetic sciences, Barcelona, 12–14.
Silpachai, A. (2020). The role of talker variability in the perceptual learning of Mandarin tones by
American English listeners. Journal of Second Language Pronunciation, 6, 209–235.
Slowiaczek, L. M. (1990). Effects of lexical stress in auditory word recognition. Language and Speech,
33, 47–68.
Smith, C. L., Erickson, D., & Savariaux, C (2019). Articulatory and acoustic correlates of prominence
in French: Comparing L1 and L2 speakers. Journal of Phonetics, 77, 1–29.
Tajima, K., Port, R., & Dalby, J. (1997). Effects of temporal correction on intelligibility of foreign-
accented English. Journal of Phonetics, 25, 1–24.
Tortel, A., & Hirst, D. (2010). Rhythm metrics and the production of English L1 / L2. In Proceedings of
speech prosody 2010-5th International Conference.
Tremblay, A., Broersma, M., & Coughlin, C. E. (2018). The functional weight of a prosodic cue in the native
language predicts the learning of speech segmentation in a second language. Bilingualism, 21, 640–652.
Trofimovich, P., & Baker, W. (2006). Learning second language suprasegmentals: Effect of L2 ex-
Turco, G., Dimroth, C., & Braun, B. (2015). Prosodic and lexical marking of contrast in L2 Italian.
Second Language Research, 31, 465–491.
Wang, X. (2006). Perception of L2 tones: L1 lexical tone experience may not help. In Proceedings of
speech prosody, Dresden, Germany.
Wennerstrom, A. (1998). Intonation as cohesion in academic discourse: A study of Chinese speakers of
English. Studies in Second Language Acquisition, 20, 1–25.
White, D., & Mok, P. (2018). L2 speech rhythm development in new immigrants. In Proceedings of
speech prosody 2018 (pp. 838–842). Poznán.
White, D., & Mok, P. (2019). L2 speech rhythm and language experience in new immigrants. In
Proceedings of 19th international congress of phonetic sciences, Melbourne, Australia 2019
(pp. 334–338).
Wiener, S., Chan, M. K. M., & Ito, K. (2020). Do explicit instruction and high variability phonetic
training improve nonnative speakers’ Mandarin tone productions? The Modern Language Journal,
104, 152–168.
Willems, N. (1982). English intonation from a Dutch point of view. In Proceedings of the tenth inter-
national congress of phonetic sciences, 1–6 August 1983, Utrecht, The Netherlands, 706–709.
Wu, X., Munro, M. J., & Wang, Y. (2014). Tone assimilation by Mandarin and Thai listeners with and
without L2 experience. Journal of Phonetics, 46, 86–100.
Zhang, Y., Nissen, S. L., & Francis, A. L. (2008). Acoustic characteristics of English lexical stress
produced by native Mandarin speakers. The Journal of the Acoustical Society of America, 123,
4498–4513.
213
Zheng, A., Hirata, Y., & Kelly, S. D. (2018). Exploring the effects of imitating hand gestures and head
nods on L1 and L2 Mandarin tone production, Journal of Speech, Language and Hearing Research,
61, 1–18.
Zhu, Y., Chen, A., Sudhoff, S., & Mok, P. (2019). Third language prosody: Evidence from Cantonese-
English-German trilinguals. In ICPhS 2019, 1–5.
Zhu, Y., & Mok, P. K. P. (2016). Intonational phrasing in a third language: The production of German
by Cantonese-English bilingual learners. In Proceedings of the 8th international conference on speech
prosody (SP2016) (pp. 751–755).
Zielinski, B. W. (2008). The listener: No longer the silent partner in reduced intelligibility. System,
36, 69–84.
214
15
GRAMMAR FOR SPEAKING
Grammatical knowledge is defined as knowledge of the structures and patterns that govern a
language, and its acquisition as the ability to use this information in communication
(Nassaji, 2019). We consider both syntax and morphology as part of this definition, in-
asmuch as they constitute linguistic patterns and are treated as grammar in pedagogical
practice. We also include lexicogrammar, the grammatical behaviour of lexical items
(Halliday & Matthiessen, 2013).
In second language acquisition (SLA), grammar learning has commonly been associated
with the learning of rules, most of which are derived from written language. However, as
spoken corpora have become more numerous and accessible, it is now well established that
the rules we commonly see in language teaching and pedagogical materials do not fully
capture the unique grammatical properties of speech. There are competing views on whether
these differences are attributable to a distinct grammar of speech, as espoused by Brazil
(1995), or whether speech and writing are governed by the same grammar but with structures
occurring with different distributions (Biber et al., 1999). However, most studies, and the
pedagogical recommendations resulting from them, support the latter view. This view is also
consistent with the functionalist approach, which proposes that language is primarily a tool
for communication. From this perspective, grammar is best examined in terms of how it is
used to express meaning and how contextual factors influence speakers’ language choices.
One observation that supports a functionalist approach to the grammar of speech – in
particular, that grammar fulfils different functions in writing and speaking – is that the
grammatical “rules” derived from writing are consistently flouted in speech, particularly in
conversation. Examples (1) to (3) below illustrate some deviations.
1. Incomplete sentences
• Team meeting today!
• What time?
• Two o’clock
2. Subject ellipsis
• [I] Had to get my bike fixed
• How come?
• [It] Was making this funny noise
3. There is + plural
• There’s too many assignments in this class
• Yeah? Last term there was five papers to write
• Oh, there’s only three this time
DOI: 10.4324/9781003022497-19
The incomplete sentences in (1) indicate that conversations are co-constructed across turns;
rather than formulating well-formed sentences, the speakers build on each other’s utterances
to complete a thought. In (2), shared knowledge between speakers eliminates the need to
specify the subject: it is clear whose bike needs fixing and what was making the noise.
Example (3) shows how a non-standard form that facilitates speaking can be con-
ventionalized: generalizing there’s to singular and plural subjects not only conveys the
message more quickly, but also bypasses the task of ensuring subject–verb agreement.
Spoken grammar features appear to arise from the social and cognitive demands of speaking.
Learners wishing to communicate effectively in their second language (L2) must therefore
learn to use grammar to meet these demands.
Differentiating Speech Events

That grammatical needs can vary broadly between spoken and written production is attested
in the corpus work of Biber et al. (1999), Brazil (1995), and Carter and McCarthy (1995),
who provide evidence that some grammatical features are more typical of speech than
writing. However, grammatical properties also change across spoken registers, what Biber
(1995) refers to as situationally defined varieties. Speech events may be classified, for example,
based on to whom one is speaking and for what purposes. The discourse can be monologic;
that is, controlled by one person with occasional input from listeners (e.g., plenary speeches).
It can also be dialogic or interactive, as is typical of most conversations, involving two or more
speakers, each contributing substantially to the exchange. In terms of purpose, speech can be
conversational, involving exchange between speakers for a shared goal such as understanding
or consensus; transactional, involving the exchange of services or information, as with service
encounters (e.g., in a store); or informational, involving one-way transmission of information
from speaker to listener. Speaking can also be rehearsed, as with presentations or public
speeches; planned, as with structured interviews; or spontaneous. Here, we focus on con-
versational grammar (Rühlemann, 2006), or the grammar of interactive, conversational
registers, as this is where the grammar of speech is most distinct (Leech, 2000) and the issues
it raises with SLA are most relevant. Rühlemann (2007) summarizes the functions of con-
versation into five “situational factors” that give rise to its unique grammatical properties.
These are shared context, co-construction, real-time processing, discourse management, and
relation management. We illustrate these properties in the following parts.
The notion that speech is grammatically distinct from writing is not recent. For example, during
the Grammarians’ War of 1519–1521, prescriptive grammarians took issue with spoken pas-
sages in the Latin grammar book Vulgaria, which reflected the colloquial language of the vulgus
or common people (Carlson, 1992). Colloquial varieties were seen as degrading the language,
characterized as “abusions of speech” (Jonson, 1640, cited in Carter & McCarthy, 2017) and
“adventitious sounds…rather than voices of art” (Harris, 1773, cited in Carter & McCarthy,
2017). The debate was also pedagogical, with some scholars recognizing the primacy of spoken
language over written (Emerson, 1896; Sweet, 1899). Sweet (1899), for example, noted that
because speech predates writing developmentally, native speakers can comfortably maintain
their speech patterns despite learning “formal” grammar later on; and so learners must also
develop an equally strong association with the spoken register (p. 211). Linn (2013) offers an
overview of similar divides in languages where written forms are historically associated with
scholarship, education, and careful style. In contemporary usage, examples include the French
passé simple (il parla, “he spoke”) and the imperfect subjunctive (qu’il parlât, “that he have
spoken”), which occur almost exclusively in literary works and are considered inappropriate, or
even comical, in conversation.
Today, attempts to standardize language continue to uphold the notion of speech as
inferior to writing, albeit to a lesser degree. Until 2018, for example, the American Heritage
Dictionary employed a “usage panel” of linguists, authors, journalists, and other language
experts who voted on the acceptability of certain constructions. Many were popular gram-
matical devices that were not considered acceptable until they had been in use for many
years. An example is hopefully as a sentence modifier (Hopefully we’ll be on time), which was
rated acceptable by only 34% of the panel members in 1999 but was accepted by 63% in 2012
(American Heritage Dictionary, 2020).
Pedagogical approaches have also shifted over the years. The Reform Movement of 19th-century
Britain and Europe proposed greater attention to spoken language than did the
grammar-translation approaches that characterized foreign language teaching, but the instructional
texts still tended to follow principles of written language in terms of vocabulary
and grammar (see Thornbury & Slade, 2006, p. 248). The audio-lingual method, popular in
North America through the 1960s and early 1970s, assumed that grammar was best learned
through oral practice. It favoured mechanical drilling of grammatical constructions in dia-
logues, without making the patterns explicit. The choice of patterns and their forms, how-
ever, was based on the conventions of written grammar (e.g., the use of complete sentences),
resulting in unnatural dialogues. With the introduction of the notion of communicative
competence (Canale & Swain, 1980) and communicative language teaching (CLT) came a
greater focus on authentic interaction in classrooms. Central to CLT is the idea that effective
communication requires not only the ability to construct grammatical sentences, but also the
ability to deploy them in real-life situations. Fluency (Chapter 13) became an important goal
alongside accuracy (which often meant grammatical accuracy), and it is still common to see
pedagogical activities described as targeting one or the other. Contemporary methodologies
favour teaching grammar in context, with the teacher drawing attention to the grammar used
in a given situation and providing opportunities for controlled or free practice. However,
there remains limited attention in textbooks to how learners can use grammar to speak more
efficiently.
Multidimensionality and Language Change

Biber (1995) showed that both spoken and written language are multidimensional; that is,
they vary in such attributes as speaker involvement, narrative intent, situation-dependence,
persuasiveness, and (im)personal style. With linguistic change precipitated by new modalities
such as email and instant messaging, these dimensions are also in flux. For example, much
written communication now resembles informal speech (e.g., online chat), and many contexts
commonly considered formal, such as university lectures, are in fact closer to a conversa-
tional register (Biber et al., 2002). Therefore, a critical issue underlying any research on the
grammar of speech is the particular register, variety, or speech event under study within these
dimensions, as well as the larger social context in which it is used.
Authenticity and the Native-Speaker Standard

A core issue in teaching L2 speaking is authenticity. While many textbooks now claim to
teach authentic language (Carter & McCarthy, 1995), pedagogical practice and materials
continue to be based on writing-centric models of grammar (e.g., Cullen & Kuo, 2007 –
English; Etienne & Sax, 2009 – French; Fernández, 2011 – Spanish), such that learners are
not taught grammatical structures that native speakers use to communicate effectively and
appropriately (McCarthy & Carter, 2001). This has begun to change, at least for English,
with the introduction of corpus-based grammars such as the Longman Grammar of Spoken
and Written English (Biber et al., 1999) and pedagogical materials such as Exploring
Grammar in Context (Carter et al., 2000). This shift has given rise to new debates about where
authentic examples might come from, in particular whether native-speaker data are
an appropriate model for L2 users. Proponents of English as a Lingua Franca argue that a
native-speaker norm sets an unrealistic goal, given that non-native speakers constitute the
majority of English users (Llurda, this volume). Others believe that native-speaker data serve
as an appropriate starting point for discussing language patterns (e.g., Kuo, 2006; Mumford,
2009). Recent scholarship has shifted the focus towards notions of competence that are not
tied to nativeness. For example, Jenkins (2015) proposes retheorizing the lingua franca as
being constructed from the multilingual repertoires of speakers, whether native or non-
native. Within this framework, lingua franca grammar might result from “jointly generating
shared resources” (Mauranen, 2018, p. 15) that facilitate communication.
Spoken Grammar as Non-standard

The link between grammatical prescription and social class which underpinned the
Grammarians’ War persists today, in that adherence to grammatical norms is associated with
scholarship and education, and spoken or colloquial forms are associated with lower social
class. This becomes problematic when we consider regional variation, as regional varieties
are characterized by features that are both non-prescribed and primarily spoken. Examples
include double negation in African American Vernacular English (ain’t got none), the -autres
object pronouns in Quebec French (nous-autres, vous-autres, “us, you”), and the second-
person address form voseo in Latin American Spanish. Labov (1969) notes that these vari-
eties are as systematic as their more prestigious, usually colonial counterparts; in fact, their
use may carry covert prestige (Labov, 1972) by signalling solidarity with the speech com-
munity. The issue, then, is whether and how L2 speakers should be taught spoken grammar,
especially non-standard forms that are widely used in their communities. It is also worth
considering learners’ motivations to identify with certain communities or express identities in
their choice of grammatical forms (Ruivivar, 2020).
Grammar, Speaking, and Assessment

The issues described earlier are of particular importance in spoken language assessment,
where grammatical accuracy is often a key criterion. For example, the maximum speaking
score in IELTS, a well-known standardized English test, requires speakers to accurately use
“a full range of structures,” with occasional “‘slips’ characteristic of native speaker speech”
considered acceptable (IELTS, 2020). The TOEFL iBT only specifies an “appropriate range
of grammatical structures” and tolerates minor errors that do not impede comprehension
(Educational Testing Service, 2019). These descriptors raise the question of what forms
might be considered “errors” or “slips” when they are in fact serving the purposes of speech.
There is evidence that L2 speakers are held to a higher standard of grammatical accuracy
than native speakers, especially when they use features of spoken grammar. For example,
Ruivivar and Collins (2018) found that topic fronting (This book, have you read it?), which
aids both speaker and listener by specifying the topic of a sentence, is judged as less gram-
matical when produced by even moderately accented speakers.
The Grammar of Speech Is Variable, and Variably Acquired

Variation pervades all areas of language use, but it has been most extensively studied in
spoken language. This research has revealed that morphosyntactic variation is influenced by
linguistic and extralinguistic factors, and that L2 speakers can acquire the variation patterns
of L1 speakers, to an extent. For example, Gudmestad (2012) found that learners’ use of the
Spanish subjunctive and indicative moods was conditioned by an increasing range of lin-
guistic factors over time, beginning with verb semantics and eventually expanding to include
time reference, hypotheticality, and form regularity. Other instances of grammatical varia-
tion present a greater challenge. Fafulas’s (2015) study of variable present tense forms in L2
Spanish showed that learners across all levels favoured two well-known constructions, simple
present (ella canta, “she sings”) and be + present participle (ella está cantando, “she is
singing”), but that less frequent constructions such as go + present participle (ellos andan
estudiando, “they go studying”) were acquired much later.
Variation also allows speakers to convey information about their identities, positioning, in-
tentions, and other aspects relevant to the conversation, in addition to expressing apparent
meanings. This skill is one component of sociolinguistic competence, the ability to adjust one’s
language to suit different social situations. Speakers exercise sociolinguistic competence by de-
ciding among multiple forms, or variants, each of which might perform a different social role. In
the following examples, different aspects of the interactional context are conveyed grammatically:
(4a) Es-tu prêt? (French)
Are you ready?
(4b) T’es prêt?
You are ready?
(4c) T’es-tu prêt?
You are-TU ready?
(5a) Did you take some pictures?
(5b) Did you take any pictures?
(6a) Good afternoon. May I please have a large cup of coffee and a piece of chocolate
cake? Thank you.
(6b) Hi, one large coffee and chocolate cake please. Thanks.
The questions in 4a to 4c offer different types of social and contextual information. A
speaker wishing to appear well-spoken might use subject–verb inversion (4a), or she might
use declarative word order (4b) to convey familiarity. Example 4c, a colloquial variant
common in Quebec French, would only be appropriate among close acquaintances.
Grammar is therefore used here for what Rühlemann (2007) calls relation management.
Examples 5a and 5b illustrate discourse management: the quantifier can communicate a
speaker’s stance or attitude, with some indicating that the speaker expects a positive re-
sponse, and any indicating an expected negative response (Larsen-Freeman, 2001). L2 users
appear to struggle with this type of variation (e.g., Mougeon et al., 2004), although near-
native speakers have been shown to closely approximate L1 speakers’ patterns
(Donaldson, 2016).
Grammatical choices can also mark contextual appropriacy. While 6a is, by textbook
standards, more grammatical than 6b, the latter may be more appropriate in a typical coffee
shop, where encounters are expected to be brief. L2 users might benefit from knowing that in
such situations, speaking in full sentences (as they may have been trained to do) is not only
unnecessary, but perhaps dispreferred. Larsen-Freeman (2001) argues that such grammatical
options represent choices that are available to speakers and that allow them to express politeness,
authority, and other social attributes.
Grammatical Features Distinguish Between Spoken Registers

Descriptive accounts of spoken grammar have increased exponentially with the availability
of spoken corpora and with the work of Biber and colleagues (Biber, 1986, 1988; Biber et al.,
1999) in differentiating dimensions of spoken language through their linguistic features.
Grammatical differences appear to be most distinct in interactive (versus edited) and situated
(versus abstract) contexts, which align with descriptions of conversational grammar.
Interactive spoken texts, for example, have high frequencies of that-clauses and turn- or
utterance-final prepositions, reflecting the co-construction of messages across turns. Other
features reflect indexicality, the reliance on information that is contextually defined or shared
between interlocutors, consistent with Rühlemann’s notion of shared context. These include
greater use of questions, the present tense, and the singular personal pronouns I, you, and it
(Biber, 1986; Biber et al., 1999). These findings have been instrumental in understanding the
grammar of speech from a functional perspective, as they show how the demands of specific
communicative contexts dictate the grammatical structures used.
Grammar Supports Fluency

The ability to “keep speaking at normal rates in real time” (Skehan, 1998, p. 3) is a persistent
challenge for L2 speakers. In interaction, speakers must understand their interlocutor and for-
mulate a response based on this understanding, often in a split second. Proficient speakers use
grammatical knowledge to cope with real-time processing, usually by reducing lexical and
morphological content while keeping the message intact (Leech, 1983). The forms provided in
the introduction (e.g., there is + plural) perform this function, as do contractions (it’s, gonna) and
multi-functional units (e.g., be like to report speech, emotion, and gesture; Rühlemann, 2007).
Yet many of these would be considered ungrammatical by prescriptive standards. For example,
declarative-order questions (You’re sure you want to go?), which are often corrected in L2 speech,
actually aid fluency by eliminating the need to craft well-formed sentences in real time
(Mumford, 2009). Hilton (2009) provides evidence of this strategy from a spoken corpus of L2
French and English. She reported strong correlations between speakers’ grammatical knowledge
and several measures of fluency, including number, mean length, and total length of hesitations,
and rate of repetitions and reformulations. In other words, the greater a speaker’s grammatical
knowledge, the less likely his or her speech is to be interrupted.
What types of grammatical knowledge are most beneficial for L2 fluency? Researchers
have answered this question indirectly by identifying features that support fluent usage
among L1 users (e.g., Biber et al., 1999). This research suggests that speakers draw upon
mental representations of lexicogrammatical chunks, or constructions – ordered combina-
tions of words that function as one unit with a specific semantic or discoursal function (see
Peters, this volume). Examples of constructions include determiner + noun (a bicycle, your
brother) and subject pronoun + verb + object pronoun (I found it, she called them). The use of
constructions to produce stretches of language while limiting attention to grammatical form
has been noted by several scholars (e.g., Pawley & Syder, 1983) and is consistent with a
usage-based theory of language use (Tomasello, 2003). In L2 speech, it has been shown that
learners fulfil pragmatic functions using unanalyzed chunks of language, or “holophrases”
(Corder, 1973). More recently, Gilquin (2018) found that advanced learners primarily use the
same set of simple, two-word constructions as native speakers. While native speakers ex-
hibited longer and more complex constructions, some constructions were more frequent in
the learner corpus (e.g., subordinating conjunction + pronoun, because you), suggesting that
at least in advanced stages, learners can use language chunks to support speaking.
Grammar Interacts With Phonology

Spoken grammar is influenced by the phonological features of speech. One way this re-
lationship manifests is perceptual salience, or the ease with which a linguistic item is per-
ceived. Grammatical items can be rendered more or less salient by speech properties. More
salient morphemes contain more phones or individual sounds, contain vowels, and have high
sonority (Goldschneider & DeKeyser, 2001). For example, of the three allomorphs of the
English past tense morpheme -ed (/əd/ - waited, /d/ - swooned, and /t/ - flossed), the syllabic
ending /əd/ is the most perceptually salient as it consists of multiple phones, and is thus more
readily produced (Solt et al., 2003). Salience can also be moderated by frequency; that is, a
less salient form may be more readily acquired if it is frequent in the input, such as reduced
articles (e.g., French le commonly reduced to l’).
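The allomorph pattern described above can be sketched in a few lines of Python. This is an illustration only: the phoneme sets are deliberately incomplete, the function names (past_tense_allomorph, salience_cues) are hypothetical, and the two cues computed (phone count and vowel presence) are a simplified stand-in for the salience measures used in the cited studies, which also consider sonority.

```python
# Illustrative sketch: the phoneme sets and function names below are
# simplifying assumptions, not taken from the studies cited in the text.

VOICELESS_NON_T = {"p", "k", "f", "s", "ʃ", "tʃ", "θ"}  # voiceless consonants other than /t/
VOWELS = {"ə"}  # only the vowel needed for this toy example


def past_tense_allomorph(stem_final_phoneme: str) -> list[str]:
    """Return the regular past tense ending as a list of phones."""
    if stem_final_phoneme in {"t", "d"}:
        return ["ə", "d"]  # syllabic ending, as in "waited"
    if stem_final_phoneme in VOICELESS_NON_T:
        return ["t"]  # as in "flossed"
    return ["d"]  # as in "swooned"


def salience_cues(phones: list[str]) -> dict:
    """Two of the cues named in the text: phone count and vowel presence."""
    return {"phones": len(phones), "has_vowel": any(p in VOWELS for p in phones)}


# Stem-final phonemes for the three example verbs: wait /t/, swoon /n/, floss /s/
for word, final in [("waited", "t"), ("swooned", "n"), ("flossed", "s")]:
    ending = past_tense_allomorph(final)
    print(word, "/" + "".join(ending) + "/", salience_cues(ending))
```

On these two cues, only the syllabic ending /əd/ has more than one phone and contains a vowel, which is consistent with its being described as the most perceptually salient of the three.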
Researchers have also investigated the interaction between grammaticality and accent-
edness in perceptions of L2 speech, with accentedness defined as the degree of perceived
deviation from an L1 speaker (Munro & Derwing, 1995). One of the earliest investigations
into this relationship found that speech containing grammatical errors may be judged as
accented, and that accented speech can be judged as ungrammatical (Varonis & Gass, 1982).
Since then, only a handful of studies have revisited the issue, generally confirming the same
two-way relationship: grammatical incorrectness leads people to perceive an accent (Asano &
Weber, 2016), and accentedness leads people to hear errors in otherwise grammatical sen-
tences (Hanulikova et al., 2012; Ruivivar & Collins, 2019). Trofimovich and Isaacs (2012)
have also found grammatical accuracy (along with word stress and total unique words
produced) to be a significant predictor of comprehensibility; that is, the ease with which a
listener can understand what is being said.
Descriptive Corpus Analyses

Corpus-based descriptions form the basis of most research on the grammar of speech. Biber
and colleagues (e.g., Biber, 1986, 1988; Biber et al., 2002) have used multivariate statistical
analyses to reveal clusters of linguistic features associated with different text types. This has
allowed them to identify the distinguishing grammatical and lexical properties of different
spoken and written registers. One important outcome is a probabilistic view of grammar,
which proposes that rather than following separate grammars, speech and writing (and their
various registers) use grammatical structures at different frequencies, such that the nature of
texts and speech events can be identified by their linguistic properties and vice versa.
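As a minimal illustration of this probabilistic view, the sketch below compares how often one set of features occurs in two short texts after normalizing for length. It is not the multivariate method used by Biber and colleagues, which analyses the co-occurrence of many features across large corpora. The two invented texts, the choice of first- and second-person pronouns as the feature, and the function name rate_per_hundred are all assumptions made for the example.

```python
# Toy comparison of a feature's normalized frequency in two registers.
# Texts, feature set, and names are invented for illustration only.

PRONOUNS = {"i", "you", "we"}


def rate_per_hundred(text: str, features: set[str]) -> float:
    """Return occurrences of `features` per 100 whitespace-separated tokens."""
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text contains no tokens")
    hits = sum(1 for token in tokens if token in features)
    return 100 * hits / len(tokens)


SPOKEN = "well I think you know we could go if you want I mean I do not mind"
WRITTEN = ("the results indicate that the proposed measure is reliable "
           "across the samples we examined")

spoken_rate = rate_per_hundred(SPOKEN, PRONOUNS)
written_rate = rate_per_hundred(WRITTEN, PRONOUNS)
print(f"spoken: {spoken_rate:.1f} per 100 words")
print(f"written: {written_rate:.1f} per 100 words")
```

Both texts draw on the same grammar; what differs is how often the feature is used, which is the sense in which register differences are treated as matters of frequency.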
Corpora have also been used to study the relationship between grammar and discourse. In
addition to frequency analyses, discourse studies are concerned with the social and
pragmatic meanings conveyed by grammatical choices. They consider contextual factors such
as the relationship between interlocutors and the intent of the interaction. For example,
Adolphs and Carter (2003) analyzed the frequency and uses of like in spoken English, finding
that it occurs largely in intimate and social contexts (as opposed to professional and
transactional) and serves a variety of discourse functions, such as reporting speech (We were
like, watch out!), analogizing (House work? Like what? Like dusting the shelves), and hedging
(I ran at like the speed of light). Drawing from the same corpus, Carter and McCarthy (1999)
reported that the get-passive was overwhelmingly favoured in the reporting of negative
events, with focus on the subject rather than the agent (e.g., We got stranded).
Research on the grammar of discourse can also focus on processing and information structure;
that is, how grammatical forms define the (non)prominence of different elements of a turn
or utterance. Speakers often decide on word order depending on what they wish to emphasize
(e.g., Your teacher called vs. I got a call from your teacher), which in most languages means
putting prominent elements at the end (Halliday, 1985). This can also be achieved through
non-standard word order, such as cleft constructions in French and English, as illustrated in (7a) and (7b).
(7a) It’s Gary who’s driving

(7b) C’est Gary qui conduit (French)
Discourse-based studies typically focus on grammatical features selected for their functions
in narrow contextual categories or specific varieties, and so cannot easily be generalized
across contexts (Dontcheva-Navratilova, 2012). Rather, this research identifies the linguistic
markers of different interactional situations, which can tell us where certain forms may be
more or less appropriate, and what information will be most useful for L2 learners in a
variety of contexts.
Perception and Judgement Studies

Recent studies have also investigated the role of grammar in the perception, judgement, and
assessment of L2 speech. This research examines listeners’ quantitative and qualitative
responses to speech samples on different dimensions. The most common measures in relation
to grammar are accentedness and comprehensibility (defined briefly in the previous sections and
described in greater detail in Chapter 12). Unlike other measures such as fluency and
intelligibility, accentedness and comprehensibility are often treated as subjective; researchers
are interested in how accented or comprehensible speech is perceived to be by listeners,
usually through measures such as Likert scales. To ensure reliability, the measures must therefore
be defined clearly and consistently across raters.
Grammaticality can be measured both objectively and subjectively. Common objective
measures include the number of morphosyntactic errors, and the proportion of words with
errors versus total number of words (e.g., Trofimovich & Isaacs, 2012). As for subjective
measures, studies have also considered the perceived gravity of grammatical errors, which
may range from barely noticeable to egregious or causing a breakdown in comprehension
(Derwing et al., 2002). Ruivivar and Collins (2019) have used a holistic measure of
grammaticality, in which raters judge how correct a sentence sounds overall, following a training
session that discourages attention to specific grammatical features.
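The two objective measures mentioned above can be computed directly once a transcript has been annotated for errors. The following sketch assumes a hypothetical annotation format in which each error is recorded as a word position plus a label. The class name, field names, and example utterance are invented for illustration and do not reproduce the coding scheme of any cited study.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSample:
    """A transcribed utterance with hand-coded morphosyntactic errors."""
    words: list[str]
    # Each error is (index of the affected word, free-text label).
    errors: list[tuple[int, str]] = field(default_factory=list)

    def error_count(self) -> int:
        """Total number of annotated errors."""
        return len(self.errors)

    def error_word_proportion(self) -> float:
        """Share of words that carry at least one annotated error."""
        if not self.words:
            raise ValueError("sample contains no words")
        words_with_errors = {index for index, _ in self.errors}
        return len(words_with_errors) / len(self.words)


# Invented example: three errors in a ten-word utterance.
sample = AnnotatedSample(
    words="yesterday she go to the store and buyed two apple".split(),
    errors=[(2, "past tense"), (7, "overregularized past"), (9, "plural -s")],
)
print(sample.error_count())
print(sample.error_word_proportion())
```

Keeping the raw error list separate from the set of affected words lets the two measures diverge when a single word carries more than one error.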
Perception and judgement studies are usually carried out in laboratories, with raters listening to
pre-written speech samples. The limitation of this approach is that the speech samples might lack
authenticity compared to natural language. However, it is challenging to obtain consistent
numbers and types of grammatical errors, or sufficient samples of specific grammatical features, from
naturalistic data, and so laboratory studies remain prevalent. Authenticity can be maximized by
instructing speakers to speak as naturally as possible, giving them time to practice, and recording
multiple versions of each item from which the most natural is selected for the study.
Selecting Features That Support Speaking Goals

An ideal method is to focus on grammatical items that reflect the interactional nature of
speech and which help L2 users become effective communicators. Specific pedagogical goals
might be defined by identifying linguistic targets based on learners’ communicative needs.
For example, features that support fluency and appropriacy are tied to shared goals such as
reducing cognitive demands, emphasizing some words over others, completing a transaction,
or expressing familiarity. Such an approach might help learners better understand the
challenges of speaking and consciously apply grammatical knowledge to address them.
Because speaking is often targeted alongside reading and writing, instructional time
dedicated solely to grammar for speaking may be limited. Timmis’s (2005) consciousness-raising
approach may help to address this issue. Here, students examine written texts, reflect on how
they might sound different in spoken form, and manipulate the text to make it appropriate for
conversation. This activity could be adapted for a focus on a particular grammatical item,
whose frequency patterns could be examined (perhaps using corpus data) and compared across
modalities and registers. Fluency practice may also be integrated into functional topics; for
example, learners can practice making requests using the 4–3–2 minute activity (Nation, 1989),
where learners repeat the same service encounter in decreasing amounts of time to show how
ellipsis can be deployed for effective communication (see example 6b).
Frequent and Useful Constructions

Following the tenets of usage-based learning, L2 learners would benefit from learning
lexicogrammatical constructions that are frequent and useful in their target speaking contexts.
To this end, Folse (2015) suggests using corpora to identify “grammar vocabulary” – words
that frequently occur with a given grammatical pattern – in contrast with content vocabulary,
which is chosen to match the unit theme or topic (p. 124). He showed that the present
progressive in English typically occurs with a small handful of verbs, including do, go, try,
talk, and get. Teaching spoken grammatical forms within their usual lexical environments
can reinforce form-meaning connections and facilitate retrieval during communication.
Teachers can also do the reverse; that is, use corpora to determine the grammatical behaviour
of a given word. For example, a quick search in the Corpus of Contemporary
American English (COCA) reveals that guess appears most frequently in speech with I and with the
complementizer that omitted (I guess [that] you’re right), and occasionally at the end of a clause
(That’s fine, I guess). Learners may thus want to learn this word alongside these grammatical
features, as part of a discourse marker, or in association with pragmatic functions such as
hedging or hesitation. At the same time, expressions such as wild guess or
educated guess, though not frequent overall, may be useful as they are conventionalized ways
of expressing certain notions (Bybee, 2008), and often only work in limited grammatical
contexts. Corpus searches can help learners evaluate the appropriacy or acceptability of
grammatical utterances they may create (e.g., wildly guessing or guessing educatedly).
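The kind of corpus query described in this section can be approximated with a simple pattern search. The sketch below counts the -ing forms that follow a present-tense form of be in a handful of invented utterances. The utterances, the regular expression, and the function name are assumptions made for illustration, and the pattern is a rough heuristic (it would wrongly match a sequence such as is nothing). A real analysis would need a large corpus and, ideally, part-of-speech tagging.

```python
import re
from collections import Counter

# Present-tense be (full or contracted) followed by a word ending in -ing.
# Heuristic only: no part-of-speech information is used.
PROGRESSIVE = re.compile(
    r"(?:\b(?:am|is|are)|'(?:m|s|re))\s+(\w+ing)\b",
    re.IGNORECASE,
)

# Invented utterances standing in for a spoken corpus.
TOY_UTTERANCES = [
    "I'm going to the store later",
    "She is trying to call you",
    "We're going home now",
    "They are talking about the game",
    "He's getting a new phone",
    "I am trying my best",
    "What I'm doing is none of your business",
]


def progressive_form_counts(utterances: list[str]) -> Counter:
    """Count the -ing forms found in present progressive patterns."""
    counts: Counter = Counter()
    for utterance in utterances:
        for form in PROGRESSIVE.findall(utterance):
            counts[form.lower()] += 1
    return counts


counts = progressive_form_counts(TOY_UTTERANCES)
for form, n in counts.most_common():
    print(form, n)
```

The same approach can be reversed to explore the behaviour of a single word such as guess, by fixing the word in the pattern and counting what precedes or follows it.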
Exploring Grammatical Choices in Context

The selection of realistic language samples has been a core tenet of language teaching since the
adoption of CLT. In line with the first two recommendations, grammatical structures are best
presented within the social and pragmatic functions they perform. In examples 4a–4c, the word
order of yes–no questions in French is conditioned by socio-stylistic factors. When teaching
such features, it would be wise to provide examples in plausible contexts; for example, the
Quebec vernacular -tu, a highly informal feature, should ideally be encountered in situations
where it would be expected – in casual conversations, or between close acquaintances.
In the past, scholars have noted the difficulty of finding spoken texts that meet these
criteria, while also being authentic and interesting (McCarthy, 1998). Today, however,
teachers have access to authentic speech from videos, movies, and TV, for which scripts are
often available for use in grammatical analysis. Large corpora such as COCA and the British
National Corpus also contain sizeable spoken components, with COCA also representing
different registers and genres. Although the authenticity of scripted speech can be
questioned, the same could be said of any corpus where the speakers are aware of being recorded.
Further, Timmis (2013) suggests that scripted texts might be sufficient for pedagogical
purposes, so long as they are “plausible as natural interaction” (p. 87).
Different Types of Oral Practice

Context-appropriate practice is key to enabling learners to use grammar for effective
communication. The few studies that have explored the teaching of spoken grammar have included
picture description and role-play tasks, which follow explicit teaching of the target forms (Jones
& Carter, 2014). These provide valuable opportunities for putting recently learned structures to
use. To use these forms in a variety of communicative tasks, however, learners may need a wider
range of oral practice opportunities, particularly ones that draw from functional and
usage-based perspectives. This may be especially useful for aspects of form rendered difficult by their
phonological features, such as reduced syllables in complex conditional forms (e.g., If I’d known
I wouldn’t have gone). Both oral and aural practice can promote the noticing of these features.
Larsen-Freeman (2012) has advocated iterative practice, in which learners either use
language in different contexts or use various structures to perform a single function. Though
often considered dry or inauthentic, communicative drills may be useful in this regard, as
they help reinforce associations between spoken grammatical forms and their communicative
functions. An example is Nation’s (1989) 4–3–2 timed activity. Besides promoting fluency, it
reinforces associations between form and use by providing iterative practice with different
interlocutors in the same fast-paced context. Furthermore, in line with usage-based
pedagogy, repetition supports the associative learning processes that link forms to specific
linguistic or contextual cues. For example, the subtle phonological difference between the
imperfect and past participle forms in French (parlais /parlɛ/ vs. parlé /parle/), and the social
cues that signal the presence of quotative like in informal narratives, may be made more
salient through repeated, guided exposure to the contexts in which they occur, consolidating
knowledge of such forms that can subsequently be accessed during oral production.
7 Future Directions
Grammar and Sociolinguistic Competence

A substantial aspect of speaking involves choosing between grammatical forms that fulfil
various social and pragmatic functions. A prolific line of research has explored L2 speakers’
sociolinguistic competence, usually operationalized as their ability to navigate these choices,
often in comparison with native speakers. These studies typically examine learners’
production, and to a lesser extent, metalinguistic understanding, of sociolinguistically variable
features. Future research might complement current findings by taking a functional view of
development, investigating (1) whether and how L2 speakers come to understand discoursal
functions in the L2; (2) what factors might support such knowledge, such as type of
interaction, instruction, and L1–L2 functional similarities; and (3) whether and how these support
mastery of grammatical choice in interaction. We expect increased research interest in these
areas, especially with the availability of spoken learner corpora such as the Louvain
International Database of Spoken English Interlanguage (LINDSEI) and French Learner
Language Oral Corpora (FLLOC). Such research will contribute to an understanding of
grammar as a resource for speaking, in general, and its role in promoting sociolinguistic
competence and communicative skills, in particular.
Grammar for Speaking Across Languages

Much of what we know about the grammar of speech derives from studies on English, with
some insights from Western European languages such as French and Spanish. We thus know
a great deal about grammar for speaking in specific linguistic contexts, but we know little
about how grammar aids L2 speaking in other languages. Some functions of spoken
grammar appear to be shared cross-linguistically; for example, English and French both use
subject ellipsis in shared-knowledge contexts and cleft constructions for emphasis (see
examples 2, 7a, and 7b). Some cross-linguistic research has been done on specific grammatical
constructions, notably in an edited volume on subordination by Laury and Suzuki (2011).
Future research might build on this work by revealing universal ways in which grammar
changes according to the contextual and cognitive demands of a speaking situation. While we
can expect some languages to have unique spoken grammar features, a better understanding
of the commonalities across languages can help us better understand spoken grammar as a
phenomenon. From a pedagogical standpoint, it can also help identify which aspects of
speech might facilitate (or hinder) the learning of grammar for speaking, and what
grammatical features are most beneficial for promoting oral communication skills.
Pedagogical Approaches
Given the continued pedagogical focus on communicative competence, one promising direction
for research is to examine the effect of grammar instruction on different aspects of L2
speech, and which aspects of L2 speech benefit the most from instruction. This can build on
research linking grammar to fluency (Hilton, 2009) and classroom studies showing that explicit
instruction can promote conceptual understanding and productive use of sociolinguistically
variable grammar (e.g., van Compernolle, 2013). Further studies might explore how fluency,
appropriacy, and other speech measures might be targeted through grammar instruction.
Teaching approaches could address specific grammatical features such as conversational discourse
markers (e.g., right, okay) and variable question forms (see examples 4a to 4c), or they
may take discourse or pragmatic concepts (such as distancing and politeness) as units of instruction,
with the relevant grammar identified to achieve the desired communicative intent.
Finally, research should take a more learner-centric approach to exploring pedagogical
approaches to grammar for speaking. The literature has generally centred on two issues:
what features to teach and how, and whether native speech is an appropriate model for
pedagogy. Most pedagogical methods have focused on either features of informal discourse
(e.g., Jones & Carter, 2014) or non-standard structures such as left-dislocation (e.g., Timmis,
2005), using various forms of inductive teaching and speaking practice. The native-speaker
question has not been resolved, though the selection of features is generally informed by
native-speaker corpus data. Missing from this discussion is learners’ perspectives on their
own language use. There is some evidence that learner-centric issues of identity and
alignment with the target language community can influence learning and production; in
particular, some learners are reluctant to use features associated with native-speaker usage
(Ruivivar, 2020; Soruç & Griffiths, 2015). Future research might explore these perceptions
and how teaching can incorporate learners’ stance on learning conversational grammar and
what aspects of spoken language they find useful or want to learn.
Further Reading
Cook, V. (2016). Where is the native speaker now? TESOL Quarterly, 50(1), 186–189.
Hughes, R. (2010). What can a corpus tell us about grammar teaching materials? In A. O’Keeffe & M. McCarthy
(Eds.), The Routledge handbook of corpus linguistics (1st edn, pp. 401–412). London: Routledge.
Rühlemann, C. (2012). Conversational grammar. In C. Chapelle (Ed.), The encyclopedia of applied
linguistics. Oxford: Wiley Blackwell.
Thornbury, S., & Slade, D. (2006). Conversation: From description to pedagogy. Cambridge: Cambridge
University Press.
References
Adolphs, S., & Carter, R. (2003). And she’s like it’s terrible, like: Spoken discourse, grammar and
corpus analysis. International Journal of English Studies, 3(1), 45–56.
American Heritage Dictionary of the English Language. (2020). Hopefully [dictionary entry]. Retrieved
from https://ahdictionary.com/word/search.html?q=hopefully
Asano, Y., & Weber, A. (2016). Listener sensitivity to foreign-accented speech with grammatical errors.
In A. Papafragou, D. Grodner, D. Mirman, & J. C. Trueswell (Eds.), Proceedings of the 38th annual
conference of the cognitive science society (pp. 1775–1780). Austin, TX: Cognitive Science Society.
Biber, D. (1986). Spoken and written textual dimensions in English: Resolving the contradictory
findings. Language, 62, 384–414.
Biber, D. (1988). Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, D. (1995). Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge
University Press.
Biber, D., Conrad, S., Reppen, R., Byrd, P., & Helt, M. (2002). Speaking and writing in the university:
A multidimensional comparison. TESOL Quarterly, 36(1), 9–48.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and
written English. London: Longman.
Brazil, D. (1995). A grammar of speech. Oxford: Oxford University Press.
Bybee, J. (2008). Usage-based grammar and second language acquisition. In P. Robinson and N. Ellis (Eds.),
Handbook of cognitive linguistics and second language acquisition (pp. 216–236). New York: Routledge.
Canale, M., & Swain, M. (1980). Theoretical bases of communicative approaches to second language
teaching and testing. Applied Linguistics, 1, 1–47.
Carter, R., & McCarthy, M. (1995). Grammar and the spoken language. Applied Linguistics, 16(2), 141–158.
Carlson, D. R. (1992). The “Grammarians’ War” 1519–21: Humanist careerism in early Tudor England
and printing. Medievalia et Humanistica, 18, 157–181.
Carter, R., & McCarthy, M. (1999). The English get-passive in spoken discourse: Description and
implications for an interpersonal grammar. English Language and Linguistics, 3(1), 41–58.
Carter, R., & McCarthy, M. (2017). Spoken grammar: Where are we and where are we going? Applied
Linguistics, 38(1), 1–20.
Carter, R., Hughes, R., & McCarthy, M. (2000). Exploring grammar in context: Upper-intermediate and
advanced. Cambridge: Cambridge University Press.
Cook, V. (2002). Background to the L2 user. In V. Cook (Ed.), Portraits of the L2 user (pp. 1–28).
Clevedon: Multilingual Matters.
Corder, S. P. (1973). Introducing applied linguistics. New York: Penguin.
Cullen, R., & Kuo, I. (2007). Spoken grammar and ELT course materials: A missing link? TESOL
Quarterly, 41(2), 361–386.
Derwing, T. M., Rossiter, M. J., & Ehrensberger-Dow, M. (2002). “They speaked and wrote real
good”: Judgements of non-native and native grammar. Language Awareness, 11(2), 84–99.
Donaldson, B. (2016). Aspects of interrogative use in near-native French: Form, function, and register.
Linguistic Approaches to Bilingualism, 6(4), 467–503.
Dontcheva-Navratilova, O. (2012). The grammar of discourse. In C. Chapelle (Ed.), The encyclopedia
of applied linguistics. Oxford: Wiley Blackwell.
Educational Testing Service (2019). Performance descriptors for the TOEFL iBT® Test. Retrieved from
https://www.ets.org/s/toefl/pdf/pd-toefl-ibt.pdf
Emerson, O. F. (1896). The teaching of English grammar. The School Review, 5.
Etienne, C., & Sax, K. (2009). Stylistic variation in French: Bridging the gap between research and
textbooks. The Modern Language Journal, 93(4), 584–606.
Fafulas, S. (2015). Progressive constructions in native-speaker and adult-acquired Spanish. Studies in
Hispanic and Lusophone Linguistics, 8(1), 85–133.
Fernandez, C. (2011). Approaches to grammar instruction in teaching materials: A study in current L2
beginning-level Spanish textbooks. Hispania, 94(1), 155–170.
Folse, K. (2015). Creating corpus-based vocabulary lists for two verb tenses: A lexicogrammar
approach. In M. A. Christison, D. Christian, P. A. Duff, & N. Spada (Eds.), Teaching and learning
English grammar: Research findings and future directions (pp. 119–135). New York: Routledge.
Gilquin, G. (2018). Exploring the spoken learner English constructicon. In R. Alonso Alonso (Ed.),
Speaking in a second language (pp. 128–152). Amsterdam: John Benjamins.
Goldschneider, J. M., & DeKeyser, R. M. (2001). Explaining the “natural order of morpheme
acquisition” in English: A meta-analysis of multiple determinants. Language Learning, 51(1), 1–50.
Gudmestad, A. (2012). Acquiring a variable structure: An interlanguage analysis of second language
mood use in Spanish. Language Learning, 62(2), 373–402.
Halliday, M. A. K. (1985). An introduction to functional grammar. London, England: Edward Arnold.
Halliday, M. A. K., & Matthiessen, C. (2013). Halliday’s introduction to functional grammar (4th edn).
London: Routledge.
Hanulikova, A., Van Alphen, P. M., Van Goch, M. M., & Weber, A. (2012). When one person’s
mistake is another’s standard usage: The effect of foreign accent on syntactic processing. Journal of
Cognitive Neuroscience, 24(4), 878–887.
Hilton, H. (2009). The link between vocabulary knowledge and spoken L2 fluency. The Language
Learning Journal, 36(2), 153–166.
IELTS (2020). Speaking: Band descriptors. Retrieved from https://www.ielts.org/-/media/pdfs/speaking-
band-descriptors.ashx?la=en
Jenkins, J. (2015). Repositioning English and multilingualism in English as a Lingua Franca. Englishes
in Practice, 2(3), 49–85.
Jones, C., & Carter, R. (2014). Teaching spoken discourse markers explicitly: A comparison of III and
PPP. International Journal of English Studies, 13(1), 37–54.
Kuo, I. (2006). Addressing the issue of teaching English as a lingua franca. ELT Journal, 60(3), 213–221.
Labov, W. (1969). Contraction, deletion, and inherent variability of the English copula. Language, 45,
715–762.
Labov, W. (1972). The logic of nonstandard English. Philadelphia: University of Pennsylvania Press.
Larsen-Freeman, D. (2001). The grammar of choice. In E. Hinkel & S. Fotos (Eds.), New perspectives on
grammar teaching in second language classrooms (pp. 104–118). New York: Routledge.
Larsen-Freeman, D. (2012). On the roles of repetition in language teaching and learning. Applied
Linguistics Review, 3(2), 195–210.
Laury, R., & Suzuki, R. (2011). Subordination in conversation: A cross-linguistic perspective.

Leech, G. N. (1983). Principles of pragmatics. London: Longman.
Leech, G. (2000). Grammars of spoken English. Language Learning, 50, 675–724.
Mauranen, A. (2018). Conceptualizing ELF. In J. Jenkins, W. Baker, & M. Dewey (Eds.), The
Routledge handbook of English as a Lingua Franca (pp. 7–24). New York: Routledge.
Linn, A. (2013). Vernaculars and the idea of a standard language. In K. Allan (Ed.), The Oxford
handbook of the history of linguistics (pp. 359–374). Oxford: Oxford University Press.
McCarthy, M. (1998). The spoken language and applied linguistics. Cambridge: Cambridge University Press.
McCarthy, M., & Carter, R. (1995). Spoken grammar: What is it and how can we teach it? ELT
Journal, 49(3), 207–218.
McCarthy, M., & Carter, R. (2001). Ten criteria for a spoken grammar. In E. Hinkel & S. Fotos (Eds.),
New perspectives on grammar teaching in second language classrooms (pp. 51–75). Mahwah, NJ:
Lawrence Erlbaum.
Mougeon, R., Rehner, K., & Nadasdi, T. (2004). The learning of spoken French variation by
immersion students from Toronto, Canada. Journal of Sociolinguistics, 8(3), 408–432.
Mumford, S. (2009). An analysis of spoken grammar: The case for production. ELT Journal, 63(2), 137–144.
Munro, M., & Derwing, T. (1995). Foreign accent, comprehensibility, and intelligibility in the speech of
second language learners. Language Learning, 45(1), 73–97.
Nassaji, H. (2019). Grammar acquisition. In S. Loewen and M. Sato (Eds.), The Routledge handbook of
instructed second language acquisition (pp. 205–223). New York: Routledge.
Nation, P. (1989). Improving speaking fluency. System, 17, 377–384.
Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection and nativelike
fluency. In J. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 191–226).
London: Longman.
Rühlemann, C. (2006). Coming to terms with conversational grammar: ‘Dislocation’ and ‘dysfluency’.
International Journal of Corpus Linguistics, 11(4), 385–409.
Rühlemann, C. (2007). Conversation in context: A corpus-driven approach. London: Bloomsbury.
Ruivivar, J. (2020). Engagement, social networks, and the sociolinguistic performance of Quebec
French learners. The Canadian Modern Language Review, 76(3), 243–264.
Ruivivar, J., & Collins, L. (2018). The effect of foreign accent on perceptions of nonstandard grammar:
A pilot study. TESOL Quarterly, 52(1), 187–198.
Ruivivar, J., & Collins, L. (2019). Nonnative accent and the perceived grammaticality of spoken
grammar forms. Journal of Second Language Pronunciation, 5(2), 269–293.
Skehan, P. (1998). A cognitive approach to language learning. Oxford: Oxford University Press.
Solt, S., Pugach, Y., Klein, E. C., Adams, K., Stoyneshka, I., & Rose, T. (2003). L2 perception and
production of the English regular past: Evidence of phonological effects. In A. Brugos, L. Micciulla,
& C. Smith (Eds.), Proceedings of the 36th annual Boston university conference on language
development (pp. 553–564). Somerville, MA: Cascadilla Press.
Soruç, A., & Griffiths, C. (2015). Identity and the spoken grammar dilemma. System, 50, 32–42.
Sweet, H. (1899). The practical study of languages: A guide for teachers and learners. London: Oxford
University Press.
Thornbury, S., & Slade, D. (2006). Conversation: From description to pedagogy. Cambridge: Cambridge
University Press.
Timmis, I. (2005). Towards a framework for teaching spoken grammar. ELT Journal, 59(2), 117–125.
Timmis, I. (2013). Spoken language research: The applied linguistic challenge. In B. Tomlinson (Ed.),
Applied linguistics and materials development (pp. 79–94). London: Bloomsbury.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition.
Cambridge, MA: Harvard University Press.
Trofimovich, P., & Isaacs, T. (2012). Disentangling accent from comprehensibility. Bilingualism:
Language and Cognition, 15(4), 905–916.
van Compernolle, R. A. (2013). Concept appropriation and the emergence of L2 sociostylistic
variation. Language Teaching Research, 17(3), 343–362.
Varonis, E., & Gass, S. (1982). The comprehensibility of non-native speech. Studies in Second Language
16
CONVERSATIONAL INTERACTION STUDIES
Jaemyung Goo
Interaction researchers claim that conversational interaction provides crucial opportunities
for learners to refine and restructure their interlanguage by drawing their attention to linguistic
code features during negotiation for meaning (see Gass, 1997, 2003; Gass & Mackey,
2015; Goo, 2019; Long, 1985, 1996, 2007; Mackey, 2012; Mackey, Abbuhl, & Gass, 2012;
Pica, 1994, 1996 for reviews). Negotiation for meaning brings into play several interactional features
and activates cognitive processes optimally attuned to L2 development. Empirical research
has yielded sufficient evidence that interaction precipitates L2 learning. A number of interactional
features and processes, and learner-internal and -external factors believed to mediate
the extent to which L2 learners benefit from conversational interaction (e.g., modified output
opportunities, noticing, CF type, task type/complexity, working memory, and language
aptitude) have been investigated in various learning contexts and from diverse perspectives.
Overall, research results have furthered our understanding of the role of interaction in L2
learning. However, given that the field of SLA has expanded its theoretical and experimental
boundaries by adopting varied research methods and theoretical frameworks, the current
state of affairs, with many questions left unanswered or partially answered, bespeaks the
complex and multi-faceted nature of interaction-based learning involving a multitude of
potential mediating factors interacting with each other. I will first illustrate the Interaction
Hypothesis, along with relevant interactional features, then discuss variables that may
mediate, or are believed to mediate, the extent of interaction effects. Finally, I will provide a
brief summary and future directions.
Key concepts in interaction studies
Input: The language L2 learners are exposed to.
Output: The language produced by L2 learners.
Interaction: Conversation L2 learners engage in with native speakers (NSs) or peers.
Negotiation for meaning: A portion of conversation where meaning is negotiated to reach an
acceptable level of understanding (see Long, 1996).
DOI: 10.4324/9781003022497-20
Corrective feedback (CF): Oral or written responses to L2 learners’ erroneous output that
indicate its non-target-likeness in some way.
Modified output: Output modified by a learner following CF on their non-target-like original
utterance.
Uptake: A learner’s immediate response of any kind to CF, including modified output.
Noticing: Registering L2 input/CF with some level of awareness.
Attention to the role of linguistic and conversational adjustments in L2 learning within the
interaction framework was triggered by Long’s (1981) call for experimental studies to test the
hypothesis that “participation in conversation with NS, made possible through the modification
of interaction, is the necessary and sufficient condition for SLA” (p. 275). The early
version of Long’s Interaction Hypothesis drew on Krashen’s (1982) emphasis on comprehensible
input. That is, if linguistic/interactional adjustments contribute to making input
more comprehensible, and if comprehensible input leads to L2 acquisition, as Krashen has
claimed, we can assume that linguistic/interactional adjustments result in L2 development.
Accordingly, early interaction research was conducted to investigate and describe various
linguistic and conversational adjustments, discourse patterns, and negotiation sequences in
the process of dealing with incomplete understandings.
The pivotal value of negotiated interaction in L2 learning is further illustrated in Long’s
(1996) revised Interaction Hypothesis, which includes cognitive aspects. He proposed that
environmental contributions to acquisition are mediated by selective attention and by
the learner’s developing L2 processing capacity, and that these resources are brought
together most usefully, although not exclusively, during negotiation for meaning.
Negative feedback obtained during negotiation work or elsewhere may be facilitative
of L2 development, at least for vocabulary, morphology, and language-specific
syntax, and essential for learning certain specifiable L1–L2 contrasts (p. 414).
That is, negotiation for meaning, which involves “provoking adjustments to linguistic form,
conversational structure, message content, or all three, until an acceptable level of understanding
is achieved” (p. 418), creates invaluable breeding grounds for L2 acquisition by
providing nontrivial learning conditions and opportunities for attentional and psycholinguistic
processes optimized for the development of various L2 aspects. As Long suggested,
negotiation for meaning involving interactional adjustments likely “connects input, internal
learner capacities, particularly selective attention, and output in productive ways” (p. 452),
functioning as a critical interaction-learning bridge.
Findings of early interaction research indicate that modified input through interactional
adjustments is more effective at precipitating L2 comprehension (and later production) than
premodified or unmodified input without interaction opportunities, and that the amount of
interaction may depend on task type and grouping (see Gass, 1997, 2003). However, early
research also suggested that comprehension via interaction does not necessarily lead to L2
acquisition, that is, “posing a linear relationship between comprehension of input and intake
of the structures contained therein may be untenable” (Loschky, 1994, p. 320). Accordingly,
interaction researchers began to investigate a direct relationship between interaction and L2
learning. For instance, Mackey (1999) found that active participation in negotiated interaction
was more facilitative of her learners’ production of developmentally more advanced question
forms than was simple observation or passive participation in interaction without negotiation.
Slightly better performance was observed for the developmentally ready group compared to
the performance of the developmentally unready group. Since then, an overwhelming
number of empirical studies have provided evidence for interaction effects (see Gass &
Mackey, 2015; Goo, 2019; Loewen & Sato, 2018) to the extent that the question is no longer
whether interaction promotes L2 development, but how it does so. Further, do cognitive and
affective individual differences influence the level of interaction effects, and if so, how?
Input, Negotiated Interaction, and Output

Echoing the significant role of negotiated interaction in L2 learning, Pica (1994) indicated that
“the twofold potential of negotiation – to assist L2 comprehension and draw attention to L2 form
– affords it a more powerful role in L2 learning than has been claimed so far” (p. 508). To ensure
that a message is sufficiently comprehensible for a smooth flow of communication, negotiated
interaction often takes one of three forms: clarification requests (e.g., What do you mean? What?
Pardon me?), confirmation checks (e.g., Do you mean X? X, right?), or comprehension checks (e.g.,
Do you understand? OK?). These negotiation moves are employed to prevent and resolve com-
munication breakdowns; negotiation for meaning via these moves likely enables input to become
“uniquely tailored to individuals’ strengths, weaknesses, and communicative needs, providing
language that suits their distinct developmental levels” (Mackey, 2012, p. 12), an important source
for L2 learning. That is, as Swain (1985) notes, “it is not input per se that is important to second
language acquisition but input that occurs in interaction where meaning is negotiated” (p. 246).
Evidence for the beneficial role of negotiated input in L2 comprehension was clearly manifested in
early interaction studies (see Gass, 1997; Mackey, 2012 for reviews).
Furthermore, negotiated interaction offers valuable opportunities for L2 learners to pro-
duce (modified) output, leading them to engage in additional cognitive processes. Emphasizing
the importance of output production opportunities in L2 learning processes, Swain (1985)
suggested that L2 learners should be “pushed toward the delivery of a message that is not only
conveyed, but that is conveyed precisely, coherently, and appropriately” (p. 249). According to
Swain (1995, 2005), in addition to developing automatization, (pushed) output fulfils three
functions in L2 learning: the noticing function, hypothesis testing function, and metalinguistic
(reflective) function. First, for the noticing function, while attempting to produce output,
learners may notice that they do not know how to say/write a message that they want to
convey. Second, the hypothesis testing function indicates that learners test their own linguistic
hypotheses by producing output and observing its effect on communication. Finally, the
metalinguistic function refers to “using language to reflect on language produced by others or
the self” (Swain, 2005, p. 478), which allows learners to become aware of their own language as
well as their interlocutor’s language use in the output. Swain (1995) further noted that
output may stimulate learners to move from the semantic, open-ended, non-
deterministic, strategic processing prevalent in comprehension to the complete gram-
matical processing needed for accurate production. Output, thus, would seem to have a
potentially significant role in the development of syntax and morphology (p. 128).
Accordingly, output is now viewed as an essential part of the L2 learning mechanism, not
just as an opportunity to practice, or an outcome of practicing what has been learned, as was
traditionally understood. Stressing the significance of producing modified output, Mackey
(2012) claims that because “the process of modifying one’s output is as important as the
ultimate product” (p. 17), L2 learners still benefit from modified output even when it is
non-target-like. Unsurprisingly, benefits of (modified) output appear in several studies (e.g.,
Loewen, 2005; McDonough, 2005).
Corrective Feedback (CF)

Among varied features of interaction, CF, considered to be effective at drawing learners’ at-
tention to linguistic code features, has generated significant empirical research with regard to its
potential effects on L2 development (see Brown, 2016; Li, 2010; Lyster & Saito, 2010; Lyster
et al., 2013; Mackey & Goo, 2007; Nassaji, 2016 for reviews). Lyster and Ranta (1997) identified
six different CF moves in their analysis of French immersion classroom data (i.e., recasts, me-
talinguistic feedback, elicitation, repetition, explicit correction, and clarification requests). These
CF moves are categorized as input-providing (e.g., recasts) or output-prompting (e.g., clar-
ification requests). CF can be further distinguished as explicit (e.g., explicit correction, meta-
linguistic feedback, and elicitation) or implicit (e.g., recasts, clarification requests, repetition).
Explicit CF clearly indicates that the learner’s utterance is ungrammatical (e.g., you said
X, but it’s ungrammatical, you shouldn’t include Y). The corrected version may or may not be
provided; explicit correction provides the corrected form(s), but metalinguistic feedback
often does not, instead prompting learners to modify their earlier non-target-like utterances. Implicit
CF moves are of interest within the interaction approach due to their unobtrusive nature,
maintaining the flow of communication. As in the case of explicit CF, the corrected version
may or may not be contained in implicit CF moves. For instance, a clarification request does
not provide the corrected version, but requires learners to modify their original utterance.
Another form of implicit CF, but with the corrected version, is a recast, which comprises a
target-like reformulation of the learner’s non-target-like utterance.
The efficacy of recasts has been the focus of empirical research, especially in comparison
with other CF types (see Goo, 2020 for a review of recast studies). Recasts are presumed to
generate propitious conditions for L2 development during interaction because they are
provided immediately following the learner’s non-target-like utterance, the intended meaning
of which has been comprehended to a great extent, obviating the need to attend to meaning,
and therefore, allowing more attentional resources to the form targeted in recasts (e.g.,
Doughty, 2001; Long, 1996, 2007, 2015). Beneficial effects of recasts stem from their se-
mantic transparency and an immediate juxtaposition of the learner’s erroneous utterance and
a recast, both of which enable learners to make a cognitive comparison between the two
forms. As Long (2015) suggests, recasts offer “crucial points at which implicit and explicit
learning converge in optimal ways” (p. 55). Nevertheless, the effectiveness of recasts has been
questioned and tested empirically with overall findings indicating varying degrees of positive
potential for L2 development (see Goo, 2020).
Lyster and Ranta (1997) observed that recasts were the most frequently used CF move but
led to the least immediate response/reaction to CF or uptake. Later studies suggested that the
rate of uptake following recasts (and other CF moves) may differ depending on learning
contexts or instructional focus (e.g., Lyster & Mori, 2006; Sheen, 2004). Language-focused
learning contexts are much more likely to result in uptake following recasts compared to
meaning-oriented contexts. Also, uptake or repair preceded by recasts is influenced by
varying characteristics of recasts. In general, stressed, declarative, short, reduced recasts are
likely to elicit more uptake/repair (e.g., Loewen & Philp, 2006; Philp, 2003; Sheen, 2006). The
underlying assumption behind researchers’ attempts to explore learners’ responses to CF
moves is that learner noticing of corrective intent of CF is represented in the form of uptake
or repair, which may not necessarily be the case (see Bao et al., 2011; Yoshida, 2010 for
disparities between uptake and noticing).
Noticing
Given that noticing through selective attention during negotiation for meaning (Long, 1996), and
noticing in general (Schmidt, 1990, 2001), plays a pivotal role in L2 learning, noticing of CF
moves is a critical aspect of how CF precipitates L2 development. Learner noticing of CF de-
termines the effectiveness of CF in refining and restructuring L2 learners’ interlanguage (IL), that
is, noticing functions as “a potential mediator in the feedback-learning relationship” (Mackey,
2006, p. 426). Research findings suggest that noticing of CF is influenced by factors including
target type, CF type, characteristics of recasts, type of uptake, teaching contexts, and learner
beliefs about CF. Mackey et al.’s (2000) results, for instance, revealed that whereas the learners
were relatively more accurate in their perceptions of lexical and phonological feedback, their
perceptions of morphosyntactic feedback provided mostly in recasts were generally inaccurate,
which indicates recasts on morphosyntactic errors may sometimes go unnoticed. In a similar
vein, the level of noticing depends on the type of target; some structures are more amenable to
noticing than are others (e.g., Kartchava & Ammar, 2014a; Mackey, 2006).
Other issues have been associated with noticing (e.g., learner beliefs, modified output, and CF
type). Rassaei (2013) found that learner noticing of explicit correction occurred more often than
noticing of recasts. In Kartchava and Ammar’s (2014a) study, prompts and mixed CF (prompts
+ recasts) led to more noticing of corrective intent, compared to recasts. In addition, Kartchava
and Ammar (2014b) showed that learner noticing of recasts was significantly correlated with
learner beliefs about the effectiveness of recasts and CF in general, but no such correlation was
found for the noticing of prompts. The corrective intent of recasts is perceived as such when
recasts are short and contain only one or two changes (e.g., Egi, 2007; Philp, 2003). Also, the
production of modified output facilitates the level of noticing (e.g., Egi, 2010). Although noticing
and modified output production are not entirely isomorphic (e.g., Gass, 2003; Goo & Mackey,
2013; Long, 2007; Mackey & Philp, 1998), learner noticing may be mirrored in modified output
production (e.g., Egi, 2010; Lyster et al., 2013). Nevertheless, noticing does not guarantee L2
development (e.g., Kartchava & Ammar, 2014a; Mackey, 2006).
L2 Learning Via CF
Numerous experimental and quasi-experimental studies have examined the CF-learning re-
lationship in interactional contexts and provide empirical evidence for its beneficial role in L2
development (see Nassaji, 2016 for a review). Regarding the relative efficacy of different CF
moves, research findings suggest that prompts or explicit CF moves may be more effective at
promoting L2 development than implicit CF moves such as recasts (Lyster et al., 2013). Goo and
Mackey (2013) discussed methodological issues that led them to question the validity of previous
CF research designs/methods and the reliability of findings (see Lyster & Ranta, 2013 for their
response). As Ellis (2015) notes, different CF moves, whether implicit or explicit, and whether
recasts or prompts, when combined effectively, are likely to function as critical interactional
devices to cause L2 learning. Recently, researchers have explored many factors that may mediate
the extent to which L2 learners benefit from interactional features (e.g., CF moves). These factors
include effects of CF via active participation in interaction (Yilmaz, 2016), feedback timing (Li,
Ellis, & Zhu, 2016), different types of recasts (Wacha & Liu, 2017), recasts versus scaffolded
feedback (Rassaei, 2014), and extensive versus intensive recasts (Nassaji, 2017).
Oral Synchronous Computer-Mediated Communication (SCMC)

Interaction research has expanded its scope by investigating interactional features and L2
development in SCMC contexts with early studies focusing on text-chats (see Ziegler &
Mackey, 2017 for an overview). However, with recent technological advancement, SCMC
research has steered its course towards video-based oral SCMC in relation to interactional
patterns and L2 learning (Akiyama, 2017; Akiyama & Saito, 2016; Loewen & Wolff, 2016;
Saito & Akiyama, 2017). Loewen and Wolff (2016) observed significantly more confirmation
checks and LREs in the face-to-face (FTF) and oral SCMC groups than in the written
SCMC condition. Akiyama (2017) found that as in most FTF contexts, recasts were the most
preferred CF in Skype-based eTandem interactions, and that more successful uptake occurred
when CF provided corresponded to learners’ preferred CF type. Learner beliefs about CF
appeared to change over time in favour of recasts.
The role of CF in actual learning has been investigated in video-based oral SCMC alone
or in comparison with FTF contexts. Results indicate positive effects of oral SCMC (e.g.,
Akiyama & Saito, 2016; Saito & Akiyama, 2017). Saito and Akiyama (2017) observed that
recasts led to more repairs than did negotiation strategies in video-based oral SCMC, and
that video-based SCMC resulted in improvements in comprehensibility, perceived fluency,
lexical variation, and grammatical accuracy. However, as evidenced in Akiyama and Saito’s
(2016) and Parlak and Ziegler’s (2017) studies, the extent of L2 learning via video-based oral
SCMC may vary depending on target areas and other related factors.
Task Complexity
Given that noticing plays a nontrivial role in L2 learning, and “task demands are a powerful
determinant of what is noticed” (Schmidt, 1990, p. 143), investigating tasks of different
complexity levels involving varying degrees of cognitive demands is of importance in inter-
action research (e.g., Gilabert et al., 2009; Kim, 2009, 2012; Kim et al., 2015; Kourtali &
Révész, 2020; Révész, 2009, 2011). Most interaction research on task complexity has been
conducted to test Robinson’s Cognition Hypothesis (2001, 2007, 2011). Robinson claims that
increasing task complexity along resource-directing dimensions (e.g., ±reasoning, ±few ele-
ments, ±here-and-now) “has the potential to connect cognitive resources, such as attention
and memory, with effort at conceptualization and the L2 means to express it” (2011, p. 14),
leading to greater accuracy and complexity of production. He also notes that increased
cognitive/conceptual demands of tasks likely trigger more interaction and negotiation for
meaning and furthermore, by directing learners’ attentional and memory resources to diverse
features of the L2 linguistic system, give rise to “more noticing of task relevant input, and
heightened memory for it, and so lead to more uptake of forms made salient in the input
through various focus on form interventions” (2007, p. 23). Overall findings are inconclusive,
with some studies reporting (partial) supportive evidence (e.g., Kim, 2012; Révész, 2011) but
others showing counterevidence or no evidence (e.g., Kim et al., 2015; Kourtali & Révész,
2020). Interaction researchers have also explored potential mediating variables (e.g. profi-
ciency, type of task dimensions, etc.). Révész (2011) found no mediating role of self-
confidence, anxiety, and self-perceived communicative competence in the impact of task
complexity on L2 learner performance. Cognitive abilities are another such variable and, in
fact, possible links between cognitive capacities and task complexity were evidenced in terms
of noticing and learning gains (e.g., Kim et al., 2015; Kourtali & Révész, 2020). Studies of
this kind are important to note because relevant findings delineate the scope of the impact of
task complexity on L2 learning. Further research is clearly warranted in this area.
Working Memory (WM)

WM refers to “a temporary storage system under attentional control that underpins our
capacity for complex thought” (Baddeley, 2007, p. 1) and has drawn much scholarly at-
tention from L2 researchers in terms of whether it influences L2 learning processes, and if so,
how (see Wen, 2016). Because of its relevance to selective attention and noticing, which are
assumed to affect L2 learning during negotiation for meaning, WM has a clear bearing on
interaction-driven L2 learning. Numerous empirical studies within the interactionist fra-
mework have investigated the extent to which WM mediates interaction effects (e.g., Goo,
2012; Kim et al., 2015; Li et al., 2019; Mackey et al., 2010; Mackey et al., 2002; Révész,
2012). Overall, research findings indicate that high-WM learners notice more CF than low-
WM learners (e.g., Kim et al., 2015; Mackey et al., 2002), produce more modified output
(e.g., Mackey et al., 2010), and benefit more from CF-based instruction (e.g., Goo, 2012; Li,
2013; Li et al., 2019; Mackey & Sachs, 2012; Révész, 2012; Yilmaz, 2013b). As for the re-
lationship between WM and implicit/explicit CF moves, results are mixed, with some studies
showing a link between implicit CF and WM but not between explicit CF and WM (Goo,
2012), and other studies showing the opposite (e.g., Li, 2013; Yilmaz, 2013b). Also, task
factors and instructional conditions have been considered in this line of research producing
mixed results (e.g., Ahmadian, 2012; Kim et al., 2015; Kormos & Trebits, 2011; Li et al.,
2019). Kormos and Trebits (2011) found that WM was related to the complexity of learner
output only in less complex tasks with no relationship observed between WM and learner
performance on more complex tasks. Kim et al. (2015), on the other hand, observed WM
variation in the performance of the complex task group only (the higher WM, the more
development). More recently, Li et al. (2019) compared five different instructional condi-
tions. WM was found to be predictive of learning gains in both grammaticality judgments
and elicited imitation measures under two within-task feedback conditions only. That is,
individual variation in WM mediates L2 learners’ online cognitive processing of feedback
provided during interaction, clearly signifying the association of WM and interaction effects.
Language Aptitude
Language aptitude refers to the ability to adapt to and benefit from instructed or naturalistic
exposure to the L2 (Robinson, 2013). Long (2015) emphasizes the essential role of language
aptitude in L2 learning, maintaining that “(given otherwise comparable abilities and learning
opportunities) one factor, sensitivity to input (not to negative input only), is the most likely
predictor of success and failure at the level of the individual” (p. 60). It is not a single/unitary
cognitive entity, but a composite of cognitive abilities that are critical in promoting L2
learning. Aptitude tests comprise several subtests measuring different components. Li’s
(2016) recent meta-analysis showed a correlation of .49 between language aptitude and L2
proficiency (k = 53), that is, approximately 25% of variance can be accounted for by lan-
guage aptitude. Accordingly, a growing number of cognitive-interactionist researchers have
investigated whether language aptitude mediates the extent of beneficial effects of interaction
on L2 learning. Relevant research has not offered a clear picture of how language aptitude
works in interactional contexts, showing less-than-consistent results depending on target type
(e.g., Granena & Yilmaz, 2019), feedback condition (e.g., Li, 2013; Yilmaz, 2013a; Yilmaz &
Granena, 2016), task type (e.g., Kourtali & Révész, 2020; Li et al., 2019), and dependent
variables (e.g., Kourtali & Révész, 2020; Li, 2013; Yilmaz & Granena, 2019). For example,
Li (2013) and Kim (2021) provided supporting evidence for a significant mediating role of
language analytic ability (LAA), meaning the ability to recognize grammatical patterns and
other linguistic entities in language samples and infer underlying rules, in the effectiveness of
recasts in L2 development. Li’s (2013) study, however, showed no significant correlation
between LAA and the effectiveness of explicit feedback (metalinguistic correction).
Somewhat differently, Yilmaz (2013a) revealed a significant mediating role of LAA measured
via the test, LLAMA F (Meara, 2005), in the efficacy of explicit feedback (i.e., explicit
correction) and no evidence of LAA playing a role in the case of recasts. Yilmaz’s results
further suggested that explicit correction worked better than recasts for learners with high
LAA, but not for those with low LAA. Yilmaz and Granena (2016, 2019; Granena &
Yilmaz, 2019) conducted a series of CF-aptitude studies to obtain a more detailed look at
how aptitude measured via LLAMA subtests functions. Findings suggest that LLAMA
subtests are differentially associated with implicit and explicit CF moves with target type and
dependent variables influencing the extent of such association (see Kourtali & Révész, 2020
on relationships among LLAMA subtests, task complexity, recasts, and dependent variable
measures). Regarding instructional conditions, Li et al. (2019) manipulated five learning
conditions. LAA was found to be associated with learning gains under two conditions (i.e.,
Task Only and Post-task Feedback) unrelated to WM, indicating that WM and LAA may
function differently under different instructional/learning conditions. Obviously, any con-
clusions are still premature. More research needs to be conducted to obtain a better un-
derstanding of the role of language aptitude and its relationship with other cognitive
variables (e.g., WM, attention control, etc.) as well.
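As a check on the variance figure reported from Li’s (2016) meta-analysis, the proportion of variance that two variables share is the square of their correlation coefficient. Taking the reported correlation at face value (an assumption here; any corrections applied within the meta-analysis itself are not considered), the calculation is:

\[
r^2 = (0.49)^2 = 0.2401 \approx 24\%
\]

This is consistent with the rounded figure of approximately 25% given above, and it leaves roughly three-quarters of the variance in L2 proficiency to be explained by other factors.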
Other Issues
Interaction effects, especially CF moves (and their effectiveness), have also been examined in
terms of developmental readiness with findings indicating that more proficient learners are
more likely to benefit from CF moves compared to less proficient learners (e.g., Ammar &
Spada, 2006; Mackey & Philp, 1998). Research findings suggest that advanced learners may
benefit from both recasts and prompts, whereas low-proficiency learners likely benefit more
from prompts than recasts (e.g., Ammar & Spada, 2006; H. Li, 2018; Li, 2014) although
details of this general observation may vary depending on other variables. One such variable
is the type of target feature (e.g., Kim, 2021; Li, 2014; Mackey, 2006; van de Guchte et al.,
2015), with more salient target features being more susceptible to CF effects. In terms of
interaction between target type and CF type, research findings are inconclusive (Li, 2014).
In addition, some studies have investigated how form-focused instruction (FFI) optimizes
the effectiveness of CF moves (e.g., Lyster, 2004; Saito & Lyster, 2012). Findings suggest that
FFI may enhance the efficacy of CF moves with evidence of more benefits when combined
with prompts than recasts. Interlocutor types and characteristics also mediate interaction-
driven learning (see Gurzynski-Weiss, 2017 for a review). For instance, peer interaction has been
found to be beneficial for L2 learning (e.g., Adams et al., 2011; Saito & Lyster, 2012).
Research findings have also revealed that instructors’ educational background and teaching
experiences are associated with their feedback use (e.g., Mackey et al., 2004; Gurzynski-
Weiss, 2016; Kartchava et al., 2020). Kartchava et al. (2020) reported a disparity between
pre-service teachers’ beliefs about CF and their in-class correction behaviours as reflected in
a smaller percentage of error correction in actual classroom practices. Nevertheless, their
preference for the same CF type (i.e., recasts) was observed with a much higher rate in actual
teaching practices. Learner beliefs, attitudes, and anxiety are also important to note because
they may influence the amount of modified output (e.g., repair), learner noticing, and L2
learning as a consequence (e.g., Akiyama, 2017; Kartchava & Ammar, 2014b; Lee, 2013;
Sheen, 2008). Lee’s (2013) advanced ESL learners, for example, chose explicit correction as
the most preferred CF move and perceived clarification requests as an anxiety-provoking CF
move that likely created emotional discomfort. It makes intuitive sense that, as indicated in
Akiyama (2017), CF moves that match learners’ preferred types may possibly lead to more
favourable grounds for L2 learning, although not necessarily so. Attempts to investigate
variables such as cognitive style (e.g., Rassaei, 2015b), creativity (e.g., McDonough et al.,
2015) and gestures (e.g., Nakatsukasa, 2016, 2021), albeit few, have contributed to expanding
the spectrum of interaction research.
Nevertheless, empirical evidence is insufficient to offer unambiguous conclusions about
the specific functions of many of the variables described earlier. More research, including
replication studies, should be conducted to broaden our understanding of these already-
complicated cognitive phenomena involving a multitude of variables that may affect
interaction-driven L2 development.
5 Future Directions
The interaction approach, unparalleled in its impact on the L2 research community, has
attracted substantial scholarly attention to whether and how it contributes to L2 develop-
ment. Long’s (1996) revised Interaction Hypothesis
reoriented the interactionist tradition decisively in a cognitive direction, as far as
learning theory was concerned, directing attention towards learner attention and L2
processing capacity as mediating factors which would affect the availability of L2
input for intake and for acquisition (Mitchell et al., 2019, p. 234).
Empirical research has offered clear evidence for interaction effects, that is, L2 learners
benefit from negotiation for meaning because it affords L2 learners opportunities to receive
(modified) input and CF on erroneous output and produce modified output in response to
their interlocutors’ CF moves (see Gass & Mackey, 2015; Goo, 2019). Given that L2 learning
involves a constellation of complex phenomena encompassing many cognitive and affective
processes, it stands to reason that recent interaction research has focused on investigating
relevant variables in terms of their mediating role in the extent of interaction effects (e.g.,
WM, language aptitude, cognitive style, and anxiety). These variables have attracted much
attention, but still merit further research because we currently lack sufficient evidence for
generalizable statements. As Goo (2019) notes,
although more than thirty years of interaction research have no doubt revealed that
interaction precipitates L2 learning, we now have much more complicated issues to
deal with than when this idea of interaction first emerged, and far more questions
than answers when it comes to how it does so (p. 247).
Further Reading
Gass, S. M. (2018). Input, interaction, and the second language learner. New York, NY: Routledge.
A reprint of Gass’s (1997) classic text with the same title. It comprises the original 1997 text and a
newly-added preface that contains insights from Alison Mackey, Rod Ellis, and Mike Long. The main
text, albeit not updated, deals with such fundamental issues as the nature of input, attention and
awareness, output, and the role of interaction.
Ellis, R., Skehan, P., Li, S., Shintani, N., & Lambert, C. (2020). Task-based language teaching: Theory
and practice. Cambridge, UK: Cambridge University Press.
A crucial overview of the current state of affairs in task-based language teaching (TBLT), illustrating
important theoretical perspectives that underpin TBLT. It also discusses pedagogic and research per-
spectives on TBLT and various nontrivial issues that both researchers and practitioners should take
into consideration.
Nassaji, H., & Kartchava, E. (Eds.) (2021). The Cambridge handbook of corrective feedback in second
language learning and teaching. Cambridge, UK: Cambridge University Press.
An insightful analysis and discussion of cutting-edge research on the role of CF in diverse dimensions of
L2 development. Also discussed are a wide range of methodological and pedagogical issues considered
to be crucial for both research and teaching communities.
References
Adams, R., Nuevo, A.-M., & Egi, T. (2011). Explicit and implicit feedback, modified output, and SLA:
Does explicit and implicit feedback promote learning and learner-learner interactions? Modern
Language Journal, 95(Supplement), 42–63.
Ahmadian, M. J. (2012). The relationship between working memory capacity and L2 oral performance
under task-based careful online planning condition. TESOL Quarterly, 46, 165–175.
Akiyama, Y. (2017). Learner beliefs and corrective feedback in telecollaboration: A longitudinal in-
vestigation. System, 64, 58–73.
Akiyama, Y., & Saito, K. (2016). Development of comprehensibility and its linguistic correlates: A
longitudinal study of video-mediated telecollaboration. Modern Language Journal, 100, 585–609.
Ammar, A., & Spada, N. (2006). One size fits all? Recasts, prompts, and L2 learning. Studies in Second
Baddeley, A. D. (2007). Working memory, thought, and action. Oxford: Oxford University Press.
Bao, M., Egi, T., & Han, Y. (2011). Classroom study on noticing and recast features: Capturing learner
noticing with uptake and stimulated recall. System, 39, 215–228.
Brown, D. (2016). The type and linguistic foci of oral corrective feedback in the L2 classroom: A meta-
analysis. Language Teaching Research, 20, 436–458.
Doughty, C. (2001). Cognitive underpinnings of focus on form. In P. Robinson (Ed.), Cognition and
second language instruction (pp. 206–257). Cambridge, UK: Cambridge University Press.
Egi, T. (2007). Interpreting recasts as linguistic evidence: The roles of linguistic target, length, and
degree of change. Studies in Second Language Acquisition, 29, 511–537.
Egi, T. (2010). Uptake, modified output, and learner perceptions of recasts: Learner responses as
language awareness. Modern Language Journal, 94, 1–21.
Ellis, R. (2015). Understanding second language acquisition (2nd edn). Oxford: Oxford University Press.
Gass, S. M. (1997). Input, interaction, and the second language learner. Mahwah, NJ: Lawrence
Erlbaum.
Gass, S. M. (2003). Input and interaction. In C. J. Doughty & M. H. Long (Eds.), The handbook of
second language acquisition (pp. 224–255). Oxford: Blackwell.
Gass, S. M., & Mackey, A. (2015). Input, interaction, and output in second language acquisition. In B.
VanPatten & J. Williams (Eds.), Theories in second language acquisition (pp. 180–206). New York,
NY: Routledge.
Goo, J. (2012). Corrective feedback and working memory capacity in interaction-driven L2 learning.
Studies in Second Language Acquisition, 34, 445–474.
Goo, J. (2019). Interaction in L2 learning. In J. W. Schwieter & A. Benati (Eds.), The Cambridge
handbook of language learning (pp. 233–257). Cambridge: Cambridge University Press.
Goo, J. (2020). Research on the role of recasts in L2 learning. Language Teaching, 53, 289–315.
Goo, J., & Mackey, A. (2013). The case against the case against recasts. Studies in Second Language
Gilabert, R., Barón, G., & Llanes, A. (2009). Manipulating cognitive complexity across task types
and its impact on learners’ interaction during oral performance. IRAL, 47, 367–395.
Granena, G. & Yilmaz, Y. (2019). Corrective feedback and the role of implicit sequence-learning ability
in L2 online performance. Language Learning, 69(S1), 127–156.
238
Gurzynski-Weiss, L. (2016). Factors influencing Spanish instructors’ in-class feedback decisions.

Gurzynski-Weiss, L. (2017). L2 instructor individual characteristics. In S. Loewen & M. Sato (Eds.),
The Routledge handbook of instructed second language acquisition (pp. 451–467). New York, NY:
Routledge.
Kartchava, E. & Ammar, A. (2014a). The noticeability and effectiveness of corrective feedback in
relation to target type. Language Teaching Research, 18(4), 428–452.
Kartchava, E. & Ammar, A. (2014b). Learners’ beliefs as mediators of what is noticed and learned in
the language classroom. TESOL Quarterly, 48(1), 86–109.
Kartchava, E., Gatbonton, E., Ammar, A., & Trofimovich, P. (2020). Oral corrective feedback: Pre-
service English as a second language teachers’ beliefs and practices. Language Teaching Research,
24(2), 220–249.
Kim, J. H.(2021). The relative effect of recasts on L2 Korean learners’ accuracy development of two
different forms and its relationship with language analytic ability. Language Teaching Research,
25(3), 451–475.
Kim, Y. (2009). The effects of task complexity on learner-learner interaction. System, 37, 254–268.
Kim, Y. (2012). Task complexity, learning opportunities and Korean EFL learners’ question devel-
opment. Studies in Second Language Acquisition, 34, 627–658.
Kim, Y., Payant, C., & Pearson, P. (2015). The intersection of task-based interaction, task complexity,
and working memory: L2 question development through recasts in a laboratory setting. Studies in
Kormos, J., & Trebits, A. (2011). Working memory capacity and narrative task performance. In P.
Robinson (Ed.), Second language task complexity: Researching the cognition hypothesis of language
learning and performance (pp. 267–285). Amsterdam: John Benjamins.
Krashen, S. (1982). Principles and practice in second language acquisition. London: Pergamon.
Kourtali, N.-E. & Révész, A. (2020). The roles of recasts, task complexity, and aptitude in child second
language development. Language Learning, 70(1), 179–218.
Lee, E. J. (2013). Corrective feedback preferences and learner repair among advanced ESL students.
System, 41, 217–230.
Li, H. (2018). Recasts and output-only prompts, individual learner factors and short-term EFL
learning. System, 76, 103–115.
Li, S. (2010). The effectiveness of corrective feedback in SLA: A meta-analysis. Language Learning, 60,
309–365.
Li, S. (2013). The interactions between the effects of implicit and explicit feedback and individual
differences in language analytic ability and working memory. Modern Language Journal, 97,
634–654.
Li, S. (2014). The interface between feedback type, L2 proficiency, and the nature of the linguistic
target. Language Teaching Research, 18(3), 373–396.
Li, S. (2016). The construct validity of language aptitude: A meta-analysis. Studies in Second Language
Li, S., Ellis, R., & Zhu, Y. (2016). The effects of the timing of corrective feedback on the acquisition of a
new linguistic structure. Modern Language Journal, 100, 276–295.
Li, S., Ellis, R., & Zhu, Y. (2019). The associations between cognitive ability and L2 development under
five different instructional conditions. Applied Psycholinguistics, 40, 693–722.
Loewen, S. (2005). Incidental focus on form and second language learning. Studies in Second Language
Loewen, S., & Philp, J. (2006). Recasts in the adult English L2 classroom: Characteristics, explicitness,
and effectiveness. Modern Language Journal, 90, 536–556.
Loewen, S., & Sato, M. (2018). Interaction and instructed second language acquisition. Language
Teaching, 51(3), 285–329.
Loewen, S., & Wolff, D. (2016). Peer interaction in F2F and CMC contexts. In M. Sato & S. Ballinger
(Eds.), Peer interaction and second language learning: Pedagogical potential and research agenda
(pp. 163–184). Amsterdam: John Benjamins.
Long, M. H. (1981). Input, interaction, and second language acquisition. Annals of the New York
Academy of Sciences. 379, 259–278.
Long, M. H. (1985). Input and second-language acquisition theory. In S. M. Gass (Ed.), Input in Second
Language Acquisition.Rowley, Mass.: Newbury House.
Long, M. H. (1996). The role of the linguistic environment in second language acquisition. In W. C.
239
Jaemyung Goo
Ritchie & T. K. Bhatia (Eds.), Handbook of second language acquisition (pp. 413–468). New York,
NY: Academic Press.
Long, M. H. (2007). Problems in SLA. Mahwah, NJ: Lawrence Erlbaum Associates.
Long, M. H. (2015). Second language acquisition and task-based language teaching. Malden, MA: Wiley-
Blackwell.
Loschky, L. (1994). Comprehensible input and second language acquisition: What is the relationship?
Studies in Second Language Acquisition, 16(3), 305–325.
Lyster, R. (2004). Differential effects of prompts and recasts in form-focused instruction. Studies in
Lyster, R., & Mori, H. (2006). Interactional feedback and instructional counterbalance. Studies in
Lyster, R., & Ranta, L. (1997). Corrective feedback and learner uptake: Negotiation of form in
communicative classrooms. Studies in Second Language Acquisition, 19, 37–66.
Lyster, R., & Ranta, L. (2013). Counterpoint piece: The case for variety in corrective feedback research.
Lyster, R., & Saito, K. (2010). Oral feedback in classroom SLA: A meta-analysis. Studies in Second
Lyster, R., Saito, K., & Sato, M. (2013). Oral corrective feedback in second language classrooms.
Language Teaching, 46, 1–40.
Mackey, A. (1999). Input, interaction, and second language development: An empirical study of
question formation in ESL. Studies in Second Language Acquisition, 21, 557–587.
Mackey, A. (2006). Feedback, noticing and instructed second language learning. Applied Linguistics,
27, 405–430.
Mackey, A. (2012). Input, interaction and corrective feedback in L2 classrooms. Oxford, UK: Oxford
University Press.
Mackey, A., Abbuhl, R., & Gass, S. (2012). Interactionist approach. In S. Gass & A. Mackey (Eds.),
The Routledge handbook of second language acquisition (pp. 7–24). New York, NY: Routledge.
Mackey, A., Adams, R., Stafford, C., & Winke, P. (2010). Exploring the relationship between modified
output and working memory capacity. Language Learning, 60, 501–533.
Mackey, A., Philp, J., Egi, T., Fujii, A., & Tatsumi, T. (2002). Individual differences in working
memory, noticing of interactional feedback and L2 development. In P. Robinson (Ed.), Individual
differences and instructed language learning (pp. 181–209). Amsterdam: John Benjamins.
Mackey, A., Gass, S. M., & McDonough, K. (2000). How do learners perceive interactional feedback?
Mackey, A., & Goo, J. (2007). Interaction research in SLA: A meta-analysis and research synthesis. In
A. Mackey (Ed.), Conversational interaction in second language acquisition: A collection of empirical
studies (pp. 407–452). Oxford: Oxford University Press.
Mackey, A. & Philp, J. (1998). Conversational interaction and second language development: Recasts,
responses, and red herrings? Modern Language Journal, 82, 338–356.
Mackey, A., Polio, C., & McDonough, K. (2004). The relationship between experience, education and
teachers’ use of incidental focus-on-form techniques. Language Teaching Research, 8, 301–327.
Mackey, A., & Sachs, R. (2012). Older learners in SLA research: A first look at working memory,
feedback, and L2 development. Language Learning, 62, 704–740.
McDonough, K. (2005). Identifying the impact of negative feedback and learners’ responses on ESL
question development. Studies in Second Language Acquisition, 27(1), 79–103.
McDonough, K., Crawford, W. J., & Mackey, A. (2015). Creativity and EFL students’ language use
during a group problem-solving task. TESOL Quarterly, 49, 188–199.
Meara, P. (2005). LLAMA language aptitude tests. Swansea, UK: Lognostics.
Mitchell, R., Myles, F., & Marsden, E. (2019). Second language learning theories (4th edn). New York,
NY: Routledge.
Nakatsukasa, K. (2016). Efficacy of recasts and gestures on the acquisition of locative prepositions.
Nakatsukasa, K. (2021). Gesture-enhanced recasts have limited effects: A case of the regular past tense.
Language Teaching Research, 25(4), pp. 587–612.
Nassaji, H. (2016). Interactional feedback in second language teaching and learning: A synthesis and
analysis of current research. Language Teaching Research, 20, 535–562.
Nassaji, H. (2017). The effectiveness of extensive versus intensive recasts for learning L2 grammar. The
240
Parlak, Ö., & Ziegler, N. (2017). The impact of recasts on the development of primary stress in a
synchronous computer-mediated environment. Studies in Second Language Acquisition, 39, 257–285.
Philp, J. (2003). Constraints on noticing the gap: Nonnative speakers’ noticing of recasts in NS-NNS
interaction. Studies in Second Language Acquisition, 25, 99–126.
Pica, T. (1994). Research on negotiation: What does it reveal about second language learning condi-
tions, processes, and outcomes? Language Learning, 44, 493–527.
Pica, T. (1996). Do second language learners need negotiation? International Review of Applied
Linguistics in Language Teaching, 34, 1–21.
Rassaei, E. (2013). Corrective feedback, learners’ perceptions, and second language development.System,
41(2), 472–483.
Rassaei, E. (2014). Scaffolded feedback, recasts, and L2 development: A sociocultural perspective.The
Rassaei, E. (2015a). Recasts, field dependence/independence cognitive style, and L2 development.
Language Teaching Research, 19, 499–518.
Rassaei, E. (2015b). Oral corrective feedback, foreign language anxiety and L2 development. System,
49, 98–109.
Révész, A. (2009). Task complexity, focus on form, and second language development. Studies in
Révész, A. (2011). Task complexity, focus on L2 constructions, and individual differences: A
classroom-based study. Modern Language Journal, 95(Supplement), 162–181.
Révész, A. (2012). Working memory and the observed effectiveness of recasts on different L2 outcome
measures. Language Learning, 62, 93–132.
Robinson, P. (2001). Task complexity, cognitive resources, and syllabus design: A triadic framework
for examining task influences on SLA. In P. Robinson (Ed.), Cognition and second language in-
struction (pp. 287–318). Cambridge, UK: Cambridge University Press.
Robinson, P. (2007). Task complexity, theory of mind, and intentional reasoning: Effects on L2 speech
production, interaction, uptake and perceptions of task difficulty. Interactional Review of Applied
Robinson, P. (2011). Second language task complexity, the Cognition Hypothesis, language learning,
and performance. In P. Robinson (Ed.), Second language task complexity: Researching the cognition
hypothesis of language learning and performance (pp. 203–235). Amsterdam: John Benjamins.
Robinson, P. (2013). Aptitude in second language acquisition. In C. A. Chapelle (Ed.), Encyclopedia of
applied linguistics: Language learning and teaching (pp. 129–133). Oxford: Wiley-Blackwell.
Saito, K., & Akiyama, Y. (2017). Video-based interaction, negotiation for comprehensibility, and
second language speech learning: A longitudinal study. Language Learning, 67(1), 43–74.
Saito, K., & Lyster, R. (2012). Effects of form-focused instruction and corrective feedback on L2 pro-
nunciation development of /ɹ/ by Japanese learners of English. Language Learning, 62(2), 595–633.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Linguistics, 11, 129–158.
Schmidt, R. (2001). Attention. In P. Robinson (Ed.), Cognition and second language instruction
(pp. 3–32). Cambridge: Cambridge University Press.
Sheen, Y. (2004). Corrective feedback and learner uptake in communicative classrooms across in-
structional settings. Language Teaching Research, 8, 263–300.
Sheen, Y. (2006). Exploring the relationship between characteristics of recasts and learner uptake.
Language Teaching Research, 10, 361–392.
Sheen, Y. (2008). Recast, language anxiety, modified output and L2 learning. Language Learning, 58,
835–874.
Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible
output in its development. In S. Gass & C. Madden (Eds.), Input in second language
acquisition.Rowley, MA: Newbury House.
Swain, M. (1995). Three functions of output in second language learning. In G. Cook & B. Seidelhofer
(Eds.), Principle and practice in Applied Linguistics: Studies in honor of H.G. Widdowson
(pp. 125–144). Oxford: Oxford University Press.
Swain, M. (2005). The output hypothesis: Theory and research. In E. Hinkel (Ed.), Handbook on re-
search in second language teaching and learning (pp. 471–484). Mahwah, NJ: Lawrence Erlbaum.
van de Guchte, M., Braaksma, M., Rijlaarsdam, G., & Bimmel, P. (2015). Learning new grammatical
structures in task-based language learning: The effects of recasts and prompts. Modern Language
Journal, 99, 246–262.
241
Jaemyung Goo
Wacha, R. C., & Liu, Y.-T. (2017). Testing the efficacy of two new variants of recasts with standard
recasts in communicative conversational settings: An exploratory longitudinal study. Language
Teaching Research, 21(2), 189–216.
Wen, Z. (2016). Working memory and second language learning: Towards an integrated approach.
Bristol, UK: Multilingual Matters.
Wen, Z., Biedron, A., & Skehan, P. (2017). Foreign language aptitude theory: Yesterday, today and
tomorrow. Language Teaching, 50, 1–31.
Yilmaz, Y. (2013a). The relative effectiveness of mixed, explicit and implicit feedback in the acquisition
of English articles. System, 41, 691–705.
Yilmaz, Y. (2013b). Relative effects of explicit and implicit feedback: The role of working memory
capacity and language analytic ability. Applied Linguistics, 34, 344–368.
Yilmaz, Y. (2016). The role of exposure condition in the effectiveness of explicit correction. Studies in
Second Language Acquisition, 38, 65–96
Yilmaz, Y., & Granena, G. (2016). The role of cognitive aptitudes for explicit language learning in the
relative effects of explicit and implicit feedback. Bilingualism: Language and Cognition, 19, 147–161.
Yilmaz, Y., & Granena, G. (2019). Cognitive individual differences as predictors of improvement and
awareness under implicit and explicit feedback conditions. Modern Language Journal, 103(3),
686–702.
Yoshida, R. (2010). How do teachers and learners perceive corrective feedback in the Japanese lan-
guage classroom? Modern Language Journal, 94, 293–314.
Ziegler, N., & Mackey, A. (2017). Interactional feedback in synchronous computer-mediated com-
munication: A review of the state of the art. In H. Nassaji & E. Kartchava (Eds.), Corrective
feedback in second language teaching and learning: Research, theory, applications, implications
(pp. 80–94). New York, NY: Routledge.
242
17
PRAGMATICS: SPEAKING AS A PRAGMALINGUISTIC RESOURCE
Readers may have heard or perhaps can imagine an exchange such as the following in which
a parent intervenes when one sibling has somehow impinged on another:
PARENT: Apologize to your sister/brother.
CHILD: Sorry. [Sullen, monotonic delivery]
PARENT: Now, say it like you mean it.
CHILD: I’m sorry!
This chapter examines the intersection of research on the acquisition of pragmatics (the
apology part of this example) and characteristics of speaking and what they contribute to
pragmatics (the say-it-like-you-mean-it part of the example).
What Is Pragmatics?
Pragmatics is the study of how-to-say-what-to-whom-when (Taguchi, 2019, adds “what not
to say,” p. 1; see also Ishihara & Cohen, 2014, pp. 3–4). If that captures the sense of
pragmatics, then L2 pragmatics is the study of how learners come to know
how-to-say-what-to-whom-when (and what not to say!) in their second language. Although this is
fairly accurate in spirit, it lacks the detail required for research. Following Kasper and Rose (2002), I
adopt Crystal’s (1997) definition of pragmatics as “the study of language from the point of
view of users, especially of the choices they make, the constraints they encounter in using
language in social interaction and the effects their use of language has on other participants in
the act of communication” (p. 301; italics added by Kasper & Rose, 2002, p. 2).
The focus on interaction is echoed in two handbooks on pragmatics: “Pragmatics studies
the connection between a linguistic form and a context, where that form is used, and how this
connection is perceived and realized in a social interaction” (Taguchi, 2019, p. 1) and
It [pragmatics] is a broad area of investigation, addressing the ways in which
messages are communicated and interpreted between interlocutors through any
number of means, including face-to-face interactions, technologically mediated
communication, written texts, and paralinguistic (non-verbal and prosodic) modes
of conveying meaning (Koike & Félix-Brasdefer, 2021, p. 1).
DOI: 10.4324/9781003022497-21
In these definitions, both interaction and mode open the way for the investigation of how
speaking contributes to and/or influences the intended message (or illocutionary force) of an
utterance and the perceived intention of the message (or perlocutionary effect). However, in
mainstream L2 pragmatics, even though the focus has been on pragmatics for conversation
(Bardovi-Harlig, 2012, 2015, 2018; Ishihara & Cohen, 2014), characteristics of speaking have
not been included as part of standard analyses.
What Is Speaking?
Goh and Burns (2012; Burns, 2013) propose a three-component model of speaking competence
which includes knowledge of language and discourse, core speaking skills, and
communication strategies. The first two components are particularly relevant to pragmatics.
Knowledge of language and discourse includes the L2 sound system (including
intelligibility at segmental and suprasegmental levels), lexis, and morphosyntax (see
Bardovi-Harlig, 1999, for a discussion of the relation of grammar and pragmatics in L2
acquisition) and “understanding how stretches of connected speech (discourse, genre) are
organized, so that they are socially and pragmatically appropriate” (Burns, 2013, p. 167). The
second component of the model, core speaking skills, includes “the ability to process
speech quickly to increase fluency (e.g., speech rate, chunking, pausing, formulaic language,
discourse markers),” negotiating speech (including responding to the utterances of
others, checking comprehension, giving feedback, and repair), and “managing the flow of
speech as it unfolds (e.g., initiating topics, turn-taking, signaling intentions, opening/closing
conversations)” (Burns, 2013, p. 167).
Pragmatics and Speaking

Some of the features relevant to both pragmatics and speaking – turn-taking, topic initiation,
back-channelling, overlapping speech, hesitation, and repair – have been investigated extensively
by Conversation Analysis (CA), but less so in L2 pragmatics. Less frequently researched,
especially in L2 pragmatics, are suprasegmentals (prosody), speech rate, chunking,
pausing, and hesitations. Yet, given the definitions of pragmatics, it is reasonable to consider
these as “form” in the search to understand “the connection between a linguistic form and a
context, where that form is used, and how this connection is perceived and realized in a social
interaction” (Taguchi, 2019, p. 1). Ishihara and Cohen (2014) position pragmatic ability
(both pragmatic knowledge and the ability to use it) as encompassing speaking and listening
as well as reading and writing. They observe,
As listeners we need to interpret what is said, as well as what is not said, and what
may be communicated non-verbally. These verbal and non-verbal cues transmit to
us just how polite, direct, or formal the communication is and what the intent is
(e.g., to be kind, loving, attentive, or devious, provocative, or hostile) (pp. 3–4).
House (1996) uses the term pragmatic fluency for the intersection of pragmatics and speaking
(pp. 228–229) and Kang, Kermad and Taguchi (2021) employ the term pragma-prosodic to
describe prosody used in the service of pragmatics (akin to the term pragmalinguistic which
refers to resources available to realize sociopragmatics). Yates (2017) identifies both
pragmatics and delivery (encompassing all of pronunciation) as contributing to the impression
that speakers make on their interlocutors, and thus as essential to impression
management for speakers in their professional and personal lives.
The following example further illustrates the intersection of pragmatics and speaking: The
hearer in the exchange reported the perlocutionary effect of the speech characteristics. This
exchange was reconstructed in a questionnaire eliciting reports of disinvitations that speakers
had received. The hearer’s commentary is in square brackets; “S” is speaker and “H” is
hearer (the reporter).
(1). Reconstructed, face-to-face, oral disinvitation (Bardovi-Harlig, 2015)
[This interaction was just a week or two ago, so I remember quite a bit of the conversation.
First, she approached me in person as we were walking across campus from our
last class and introduced the topic of the conversation. It went something like]
S: You know how I’ve been talking about how big my wife’s party is that I’ve been
planning?
[This friend of mine had been describing the ups and downs of her party planning process
for a couple of months at that point.]
H: “Yeah?”
S: “Well, you know how I’ve been saying that it’s all getting too big way too fast?”
[At this point, I could kind of predict what she was trying to tell me from the tone of her
voice. I perceived that she sounded a bit embarrassed and nervous.]
H: nodded “uh huh”
S: Well, I might have to cut down the number of people and I was wondering if you’d be
really offended if we only had family and friends that both of us [her and her wife]
know? I mean, I wouldn’t even ask if I didn’t feel like you’d understand since you’ve
heard me complaining about it for so long?”
[She definitely sped up her speech and either her pitch went up or she got louder. I can’t
exactly remember, but I could definitely feel that she was uncomfortable.]
H: ((laugh)) “No, it’s cool. I may not even be here in the summer anyway.”
[I was trying to make her feel better and not be so embarrassed.] (p. 99)
In this example, the hearer links tone to the anticipated illocutionary force and to the
speaker’s affect: “she sounded a bit embarrassed and nervous.” She interprets increased
speech rate and pitch or loudness as indicators of the speaker’s discomfort at performing the
disinvitation.
Three comprehensive reviews of pragmatics and prosody have recently appeared
(Escandell-Vidal & Prieto, 2021, on Spanish pragmatics and prosody; Hirschberg, 2017, on
pragmatics and prosody; Kang & Kermad, 2019, on L2 pragmatics and prosody).
Although these reviews indicate interest, Kang and Kermad also report a general paucity
of L2 studies. I will not attempt to replicate the coverage of prosody undertaken by these
reviews. Instead, I will explore the intersection of pragmatics and speaking and consider
what pragmatics research might look like with more sustained interest in the characteristics
of speaking; that is, what would pragmatics research discover if we regularly considered
speaking as the addressee of the disinvitation did?
Here, I consider the history of L2 pragmatics research related to speaking. (For information
on how interlanguage pragmatics and L2 pragmatics research relate to SLA
generally, see Bardovi-Harlig, 2012; Taguchi, 2019.) This part situates spoken tasks among
the tasks used to collect data in L2 pragmatics research and demonstrates that although L2
pragmatics research views characteristics of speaking as contributing to illocutionary force
and perlocutionary effect, such characteristics have not been investigated systematically.
Means of Data Collection

The earliest research on pragmatics in the tradition of Austin (1962), Searle (1969), and Grice
(1975) was introspective. Like mainstream linguistics at the time, philosophy of language
used invented examples. Speech Act theory did not provide a template for empirical investigations
of language use. Early cross-linguistic research culminating in the Cross-Cultural
Speech Act Realization Project (Blum-Kulka et al., 1989) operationalized empirical
investigation of speech acts through the use of a written discourse completion task (DCT),
which provided a dominant template for early interlanguage pragmatics research
(Example 2). In written DCTs, respondents provide a written answer in various formats,
including open DCTs with a single turn, dialogue completion tasks with one or more turns
indicated (either before or after the turns the participants are asked to provide), and free
DCTs in which participants write both sides of an exchange in response to the prompt
provided.
(2). Dialogue Completion with rejoinder (Blum-Kulka et al., 1989, p. 14).
At the university
Ann missed a lecture yesterday and would like to borrow Judith’s notes.
Ann: ____________________________________________
Judith: Sure, but let me have them back before the lecture next week.
Three other major data types used to study pragmatics production include oral DCTs, role-
plays, and conversation. What elicitation tasks have in common is a scenario that describes
the setting in which talk is imagined to take place, the relevant characteristics of the speakers
involved, their relationship, the event that precipitates the speech, and the goal. In oral
DCTs, speakers provide a single spoken turn; the respondent may initiate a turn or reply to a
spoken turn as in (3) from Bardovi-Harlig (2009, p. 795).
(3). You give your classmate a ride home. He lives in the building next to yours. He gets out
of the car and says,
(audio only): “Thanks for the ride.”

You say:
In role-plays, speakers interact either with a researcher, research assistant, or fellow learner
to negotiate over several turns the goal stated by the scenario. Conversation may be elicited
in interviews, information gap activities, problem solving, and peer feedback tasks, or
spontaneously in institutional talk, service encounters, and conversation. Many of these are
treated as separate categories of talk, but I will discuss them as one large category,
identifying the type of talk investigated by individual studies.
In a study using a written DCT to explore regional variation in pragmatics, Schneider
(2011) suggested that written representation of pragmatics and speaking in dramatic scripts –
in the absence of speech characteristics – nevertheless displays “most essential features of
dialogue” (p. 17). However, the use of written production tasks precludes the study of
speaking in L2 pragmatics. Similarly, the presentation of written language samples in judgement
and interpretation tasks excludes phonetic and prosodic information from the
utterances to be judged.
Two large-scale reviews (Bardovi-Harlig, 2010, covering 1979–2008; Nguyen, 2019, covering
1979–2017) show that oral data are in the majority by task. Moreover, production tasks greatly
outweigh non-production tasks in L2 pragmatics. Bardovi-Harlig reported that of 152 studies, 107
(70%) included a production task exclusively, only 23 (or 15%) exclusively used a non-production
task (judgement or interpretation), and 22 (or 15%) used both. Taken together,
63% of the tasks are oral/aural (production tasks weigh in at 69%, a higher orality rate than
non-production tasks at 35% or mixed studies at 59%). Nguyen (2019) similarly reported that
217 (88%) of the 246 studies reviewed were production studies. Of those, at least 64% were
oral (there could be more because oral DCTs are in a mixed oral–written DCT category).
Thus, we cannot attribute the lack of integration of speaking into pragmatics to a lack of oral
data or the dominance of written data. However, we might consider that the analyses established
during the dominance of large-scale written studies have influenced current analyses.
Analyzing oral production data without explicit reference to characteristics of speech relies
on the same “essential” features (as Schneider, 2011, described written dialogue without
speaking) and misses whatever information speaking provides.
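The proportions reported above can be checked with a few lines of arithmetic. What follows is a minimal sketch in Python and is not part of either review. It assumes that the three task categories (production only, non-production only, both) are mutually exclusive and together cover all 152 studies, and it derives the production-only count as the remainder rather than taking it from the text. The function and variable names are illustrative only.

```python
# Sanity check of the task proportions reported for the two reviews.
# Assumption (not stated in the reviews): the three categories are
# mutually exclusive and together account for all 152 studies.

def share(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

TOTAL_FIRST_REVIEW = 152   # studies in the 1979-2008 review
NON_PRODUCTION_ONLY = 23   # judgement or interpretation tasks only
BOTH = 22                  # production and non-production tasks

# Production-only count derived as the remainder (an assumption of this sketch).
production_only = TOTAL_FIRST_REVIEW - NON_PRODUCTION_ONLY - BOTH

shares_first_review = {
    "production only": share(production_only, TOTAL_FIRST_REVIEW),
    "non-production only": share(NON_PRODUCTION_ONLY, TOTAL_FIRST_REVIEW),
    "both": share(BOTH, TOTAL_FIRST_REVIEW),
}

# 1979-2017 review: 217 production studies out of 246.
share_second_review = share(217, 246)

if __name__ == "__main__":
    print(production_only, shares_first_review, share_second_review)
```

Under these assumptions the remainder is 107 studies, or about 70% of 152, which agrees with the reported percentage for production-only studies. The two smaller categories each come out at roughly 15%, and 217 of 246 is roughly 88%.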
The next part reviews L2 pragmatic studies that include speaking as part of their results
and lay the groundwork for more systematic investigation. The remainder of this discussion
will be limited to the study of speech acts, the dominant approach to L2 pragmatics research
(65% of the 152 studies reviewed by Bardovi-Harlig, 2010, explored speech acts).
Speaking in L2 Pragmatics Research

Here, I consider studies on L2 pragmatics that made observations about speaking, but did
not include speaking in the research questions or systematically code learner data for
characteristics of speech.
Intonation can function as an illocutionary force indicating device or IFID (Couper-Kuhlen,
1986). One of the first studies to document the importance of intonation in pragmatics
was Gumperz (1982). He described a case in which customers in a British cafeteria
complained that South Asian food service workers were rude. Audio-recordings revealed
that intonation was at the heart of the complaint. When customers ordered meat, they were
asked if they wanted gravy. British employees used a rising intonation “Gravy↗,” whereas
the South Asian employees used a falling intonation “Gravy↘.” The British customers heard
the rising intonation as an offer “Would you like gravy?” and the falling intonation as a
statement such as “This is gravy” (take it or leave it) (Gumperz, 1982).
Tateyama (2001) also illustrates the difference between what is said and how it is said.
The study used raters to determine whether instruction facilitated learners’ use of
pragmatic routines, one type of formulaic language (see Vu & Peters, this volume). Although
some accounts expect formulaic language to be “phonologically coherent – that is, fluently
articulated, nonhesitant” (Myles et al., 1998, p. 325), not all felicitous deliveries are
non-hesitant. Tateyama’s raters noted that in some role-plays learners were not hesitant enough
to convey the sincerity of the speech act. One rater remarked about a very polite
permission-requesting expression that the learner “should have said it in a more hesitant manner”
(p. 214). Rushed delivery or “wrong intonation” was also cited as a reason for assigning low
scores. One rater called one learner’s delivery “abrupt” and another “remarked that the manner in
which [the learner] said it did not sound as if he were actually apologizing” (p. 214). House
also reported that “one of the raters commented, routines realized in the opening phase
appeared to be rattled off quickly and … unfeelingly, so as to get them over with in a rather
artificial fashion” (House, 1996, p. 239). Likewise, Bardovi-Harlig (2013) noted that some
utterances produced in response to an oral DCT were “flat” or “monotonic.” Learners exhibited
both overly rapid and slowed delivery of condolence formulas in contrast to native
speaker (NS) production (Bardovi-Harlig, 2013). They also exhibited pauses in a range of
conventional expressions, suggesting that they had not fully mastered the formulaic sequences.
Word stress also plays a role in pragmatics. A study of academic advising sessions
reported that the hedge “I thínk” becomes an aggravator when produced with a stressed
pronoun, as in “Í think” (Bardovi-Harlig & Hartford, 1996). Rather than softening the
proposition expressed, the stress on the pronoun appears to contrast the student’s opinion
with her advisor’s.
House (1996) reported that the advanced German university EFL speakers in her study
were “so advanced that their productions are never characterized by markedly slow speech or
irritatingly long (unfilled) pauses that necessitate excessive repairing or other overt,
paralinguistic signs of disfluency” (pp. 244–245). Nevertheless, even those learners differed
in initiating and responding utterances. Preliminary results showed that when responding,
“learners slow down more markedly, pauses are longer, and repairs are frequent,” an
observation that warrants more investigation.
Shively’s (2018) examination of the development of humour in a second language during
study abroad reveals an important intersection of speaking and pragmatics. Learners of
Spanish frequently used humour with host families and native-speaker peers, and most of
these humour tokens (94%) were accompanied by a change in a characteristic of speaking.
These occurred “with laughter and/or prosodic markers such as smile voice, exaggerated
intonation, singsong intonation, increases or decreases in volume and pitch, sound
lengthening, and slowing down or speeding up the rate of speech” (p. 85). Deadpan humour,
that is, tokens that occurred without such prosodic cues, was much more likely to go
unrecognized by interlocutors.
In contrast to pragmatically problematic deliveries, Bardovi-Harlig (2013) reported the
successful intonation of a single expression “you too!” in response to “have a nice day!” Both
are frequently delivered with singsong intonation in the Midwest community where the
learners resided, and this was captured on the aural prompt to which learners responded.
Several learners responded with a singsong production of “you too!” This was noticeable
given the frequent monotonic production by learners (cf. Kang et al., 2021; Pickering, 2001).
Finally, Taguchi (2007) asked whether the type of social situation differentially affects rate
of L2 speech act production. However, she linked rate of speech to task difficulty, reasoning
that social demands presented in the scenario may make the task more difficult, and task
difficulty leads to slower speech rate. Although that is not the direct link between pragmatics
and speaking that other researchers have noted, it is nevertheless one of the first studies to
systematically investigate the relationship. Taguchi reported that lower-level EFL learners
produced requests and refusals with low power (P), social distance (D), and degree of
imposition (R) appropriately and relatively quickly, but requests and refusals with higher PDR
less appropriately and more slowly. Additionally, the lower-proficiency group was slower
than the higher-proficiency group, whereas NSs did not show a difference. Taguchi
interpreted the scenarios describing low PDR as easier tasks than scenarios describing high PDR.
The crucial step is to link speech rate to speech act realization directly, rather than to task.
Recall the analysis of Tateyama’s raters of apology scenarios: at least some speech acts
should be performed slowly, demonstrating the cost to the speaker of making the speech act.
Taguchi observed that in addition to proficiency and task difficulty, speech rate may be
influenced by L1 expectations for reduced speed for consequential acts. More recent work
returns to this question.

The survey in the previous part demonstrates that there is a connection between pragmatics
and the attributes of speaking, although it has not yet been systematically investigated in
pragmatics research. The biggest challenge in studying the intersection of pragmatics and
speaking in 2021 is operationalizing observations about pragmatics and speaking into feasible
research designs. Some research has been undertaken, and will be highlighted in the following
part, but it is not yet part of the established description of pragmatic development.
The preceding review showed that:
• Intonation is an illocutionary force indicating device (IFID) (Couper-Kuhlen, 1986;
Gumperz, 1982)
• Speech fluency (hesitant or non-hesitant delivery) is implicated in the assessment of
sincerity of speech act delivery (Tateyama, 2001)
• Downgraders (e.g., I thínk) and upgraders (e.g., Í think) – also called mitigators and
aggravators – may differ only by stress (Bardovi-Harlig & Hartford, 1996)
• Perception of delivery as flat, monotonic, or rushed may lead to the interpretation of
learners as unfeeling or artificial (House, 1996)
• Scenarios that set up high-imposition and high-stakes speech act realization may
constitute more difficult tasks than scenarios eliciting low-imposition, low-stakes speech acts
as measured by rate of speech (Taguchi, 2007)
Our greatest challenge is to move from isolated observations to systematically studying
characteristics of speaking (intonation, stress, rate of speech, and hesitation) within the
speech act framework which characterizes much of L2 pragmatics research. The next part
considers current research that points the way to a transition to pragmatics research that
routinely takes delivery into account.

This part discusses five L2 studies that illustrate the integration of speaking and pragmatics.
The studies examine rate of speech, intonation, and the role of prosody in illocutionary force
and perlocutionary effect. In discussing the studies, I include the research questions to help
readers envision how these studies are formulated.
Speech Acts and Speech Rate

Taguchi (2011) investigated the relation of speech rate to the realization of speech acts, but
unlike her earlier study, this research focused on pragmatics rather than task. Taguchi
investigated high- and low-imposition requests and high- and low-stakes negative opinions
produced by learners studying English at home (EFL) and abroad (ESL). Of the four
measures – appropriateness, planning time, grammaticality, and rate of speech –
appropriateness and speech rate are most relevant here. The NS control group, like the
learners, showed slower speech rates with speech act production under
high-imposition/high-stakes conditions than under low-imposition/low-stakes conditions.
Learners were more appropriate with low-stakes than high-stakes realizations. Higher-level
learners scored higher than lower-level learners, but environment was not a factor. In this
case, as suggested in the previous part, slower rate of speech was characteristic for all
speakers for the higher-stakes realizations.
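As a rough illustration of how a speech-rate comparison of this kind can be operationalized, the sketch below computes words per minute for a single response. The unit (whitespace-delimited words rather than syllables), the inclusion of pauses in the duration, and the two example responses are all assumptions made for illustration; they are not taken from Taguchi (2011).

```python
def speech_rate_wpm(transcript: str, duration_seconds: float) -> float:
    """Return words per minute for one response.

    Words are counted as whitespace-delimited tokens (an assumption);
    duration_seconds is the total speaking time, pauses included.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(transcript.split())
    return word_count / duration_seconds * 60.0


# Hypothetical responses, invented for illustration (not study data).
low_imposition = ("Can I borrow your pen?", 1.5)
high_imposition = (
    "I was wondering if you could possibly give me an extension on the paper.",
    6.0,
)

for label, (text, seconds) in [("low", low_imposition), ("high", high_imposition)]:
    print(label, round(speech_rate_wpm(text, seconds), 1))
```

A syllable-based rate would follow the same pattern with a syllable count in place of the word count.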
Intonation in L2 Agreements and Disagreements

Pickering et al. (2012) investigated the pragmatic function of intonation in agreements
and disagreements in English spoken by 12 native speakers of English and 12 Chinese
learners of English. The study “investigate[d] the pragmatic function of intonation in
cueing (dis)agreements in the naturally occurring discourse of American English speakers
and Chinese learners of English” (p. 200). The participants were undergraduate
and graduate students in the southeastern United States, organized into six NS–NS pairs
and six NNS–NNS pairs, with two each of male–male, female–male, and female–female
pairs per group. Participants were given pictures of 10 concept cars and asked to determine
which car was the best. The interactions were recorded and transcribed. Six NS English
judges identified agreements and disagreements, working independently from the
transcripts (to avoid influence from intonation; in fact, this closely resembles a traditional
pragmatic analysis). Not surprisingly, agreement was more frequent than disagreement in
the task (68 agreements and 8 disagreements for NSs; 56 agreements and 13 disagreements for
Chinese ESL learners). The sequences were then subjected to auditory and instrumental
analysis.
Both NSs and learners showed evidence of pitch concord in agreement sequences; that is,
speakers’ pitch range matched their interlocutors’ within 1 semitone (48% and 41%,
respectively) and within 2 semitones (72% and 77%, respectively). NS disagreements reliably
showed that the second speaker made a discordant pitch choice, whereas learner
disagreements did not. Pitch discord might be a consistent feature of disagreements, but there
were many fewer instances of disagreement to analyze.
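The semitone thresholds mentioned above rest on a standard conversion: the distance in semitones between two fundamental frequencies is 12 times the base-2 logarithm of their ratio. The following is a minimal sketch of that conversion and of the proportion of turn pairs falling within a threshold. Which pitch points were compared in the original study is not stated here, so the pairing of one F0 value per speaker and the example values in Hz are hypothetical.

```python
import math


def semitone_distance(f0_a_hz: float, f0_b_hz: float) -> float:
    """Absolute distance in semitones between two F0 values in Hz."""
    if f0_a_hz <= 0 or f0_b_hz <= 0:
        raise ValueError("F0 values must be positive")
    return abs(12.0 * math.log2(f0_a_hz / f0_b_hz))


def proportion_within(pairs, threshold_semitones: float) -> float:
    """Share of (first speaker, second speaker) F0 pairs within the threshold."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    hits = sum(1 for a, b in pairs if semitone_distance(a, b) <= threshold_semitones)
    return hits / len(pairs)


# Hypothetical F0 pairs in Hz, invented for illustration (not study data).
example_pairs = [(200.0, 205.0), (200.0, 220.0), (180.0, 240.0)]
print(proportion_within(example_pairs, 1.0))  # share within 1 semitone
print(proportion_within(example_pairs, 2.0))  # share within 2 semitones
```

Comparing raw Hz values across speakers with very different ranges (for example, in mixed-gender pairs) would need some form of normalization, which this sketch leaves out.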
Perception of Sincere and Ostensible Apologies and the Role of Intonation and Pitch
Alexander (2011) investigated the production (illocutionary force) and the perception
(perlocutionary effect) of apologies. She employed two tasks: the production of sincere and
ostensible apologies by NSs and ESL/EFL learners, and the rating of a subset of the apologies
by NSs of English. She also performed an instrumental (acoustic) analysis of a subset of
sincere and ostensible apologies that measured pitch, amplitude, and duration, which
correlate with listener perception of intonation, pitch prominence, and boundary tones. Two
relevant questions guided the study:
Based on the NSs’ judgments, to what extent can speakers (EFL, ESL, NS) convey
their intent in producing sincere and ostensible apologies?
Are there intonation correlates of perceived sincere and ostensible apologies?
The learners were 30 Thai learners of English; the NSs were from the American
Midwest, as were the 45 NS judges. Speakers were given role-plays clearly indicating whether
the apology to be produced was heartfelt (sincere) or ostensible (the speaker was not
remorseful). The 320 production tokens were judged. Both learners and native speakers
produced apologies recognized as sincere by the judges. However, only the NSs were considered
successful in conveying ostensible apologies. Judges did not report perceiving ostensible
apologies in the learner productions to the same extent. Alexander notes that for all speakers,
and especially learners, the word-level IFIDs present in the apologies likely interacted with
acoustic characteristics and with intonation as an IFID. Apologies including intensifiers such as
“I’m really sorry” and “I’m so sorry” were frequently judged to be sincere, regardless of
prosody.
The analysis of intonation, pitch, and boundary tones was carried out on the highest
scoring apologies from each of three categories from the judgement task: 24 top-rated tokens
from sincere apologies judged to be sincere, 24 intended to be ostensible (insincere) but
judged to be sincere, and 24 that were ostensible apologies correctly perceived to be insincere.
Auditory analysis was conducted: Intonational transcriptions were carried out by the author
and a phonetician; pitch prominence was marked by the author. The 72 tokens meeting the
selection criteria were distributed over three apology formulas: 3 tokens of bare Sorry, 41
I’m sorry tokens, and 28 tokens of I’m + intensifier + sorry (e.g., I’m really sorry, I’m so
sorry). Considering speech characteristics, items more likely to be perceived as sincere were:
(1) intensified apologies (with or without a pitch accent on the intensifier), (2) apologies
carrying a high pitch accent (H*) on lexical items other than I’m, and (3) apologies ending
with a low boundary tone (L%). Utterances more likely to be perceived as ostensible included
(1) apologies with accented I’m, (2) apologies with a double pitch accent (i.e., L*+H and
L+H*) which follows an accented intensifier, and (3) apologies ending with a high boundary
tone (H%).
Learners’ Use of Intonation as a Cue of Illocutionary Force

Wan (2020) investigated pragmatics and intonation with a speech-act orientation, but instead
of exploring a particular speech act, she investigated the interpretation of the illocutionary
force of utterances, posing three research questions.
Do learners consider intonation in the interpretation of meaning/force of an utterance?
Are intonationally unmarked utterances easier to interpret compared to intonationally
marked utterances?
Does English proficiency play a role in the success rate of interpreting the meaning/
force of the utterance?
Wan enlisted a male NS to produce 12 sentences with 2 different intonation contours,
yielding 24 utterances. Each was produced with unmarked intonation, defined as having
meaning and illocutionary force consistent with the words in the utterance, and marked
intonation, or intonation at odds with the literal interpretation and apparent illocutionary
force of the utterance based on words alone. Differences in intonation were verified by
instrumental analysis (Praat). Twenty learners (14 intermediate and 6 advanced) and 5 NSs
listened to a series of 24 dialogues (and 12 distractors) in which male and female voices
alternated, always ending with the male voice. Learners heard dialogues such as the
following.
(4). A: What are you working on?
B: A cover letter for a job application. I’m not sure how to talk about my experience.
A: I could take a look at it.
B: Do you mind? [unmarked intonation indicating a request for help]
(5). A: I really don’t think you should be walking in your condition.
B: I should be okay.
A: Here, let me help you. Give me your arm.
B: Do you mind? [marked intonation indicating a refusal of the offer of help]
Participants listened to each dialogue twice, then wrote a one-sentence paraphrase of the
last turn in the conversation on their answer sheet. Intermediate learners interpreted
unmarked intonation very easily (97% accuracy), but marked intonation much less
accurately (41%). Advanced learners showed 100% accuracy for unmarked and 83% for
marked intonation (in the NS range). To address concerns that the different contexts may
incline learners to the correct interpretation, Wan is developing interpretation tasks to
supplement the current task so that both renditions of a given string occur with the same
dialogue; the pair of utterances in the same dialogue is thus distinguished only by
intonation. In an item like Example (6), a participant would hear either B or B′ and provide
an interpretation.
(6). A: I really don’t think you should be walking in your condition.
B: I should be okay.
A: Here, let me help you. Give me your arm.
B: Do you mind? [acceptance of the offer of help]
B′: Do you mind? [refusal of the offer]
The Effect of Proficiency and Study Abroad on the Acquisition of Tone, Pitch, and
Prominence in the Realization of Speech Acts
Kang et al. (2021) investigated the prosody of high-imposition requests and high-stakes
negative opinions compared to low-imposition requests and low-stakes negative opinions,
posing the following research question: To what extent do L2 English learners from varying
proficiency and study abroad backgrounds use prosody differently in speech acts involving
high and low imposition?
The speech acts were elicited by an oral DCT presenting eight scenarios after which the
learners produced a request or negative opinion. The speech acts were subject to
instrumental analysis for six categories: tone choice (rising tones, falling tones, and level
tones), pitch (overall pitch range), and prominence (space – the percentage of prominent
words out of the total number of words – and pace – the percentage of prominent syllables
out of the total number of tone units). This investigation of pragma-prosodic characteristics
is embedded in a study of the effect of proficiency and study abroad on pragmatic
development.
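To make the prominence and tone-choice measures more concrete, here is a minimal sketch that derives them from hand-coded tone units. The `ToneUnit` record, its field names, and the example values are hypothetical. Space is computed as the percentage of prominent words; pace is computed as the ratio of prominent syllables to tone units, which is one reading of the definition above and may differ from the exact procedure in Kang et al. (2021). Pitch range is omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ToneUnit:
    """One hand-coded tone unit (hypothetical coding scheme)."""
    words: int
    prominent_words: int
    prominent_syllables: int
    tone: str  # "rising", "falling", or "level"


def prosody_summary(units):
    """Return space, pace, and tone-choice percentages for one speech act."""
    total_words = sum(u.words for u in units)
    if not units or total_words == 0:
        raise ValueError("need at least one tone unit containing words")
    n_units = len(units)
    tones = Counter(u.tone for u in units)
    summary = {
        # Space: prominent words as a percentage of all words.
        "space": 100.0 * sum(u.prominent_words for u in units) / total_words,
        # Pace: prominent syllables per tone unit (assumed reading).
        "pace": sum(u.prominent_syllables for u in units) / n_units,
    }
    for tone in ("rising", "falling", "level"):
        summary[f"{tone}_pct"] = 100.0 * tones[tone] / n_units
    return summary


# Invented coding of a single request, for illustration only.
request = [
    ToneUnit(words=4, prominent_words=2, prominent_syllables=2, tone="rising"),
    ToneUnit(words=6, prominent_words=2, prominent_syllables=3, tone="falling"),
    ToneUnit(words=5, prominent_words=1, prominent_syllables=1, tone="level"),
    ToneUnit(words=5, prominent_words=1, prominent_syllables=2, tone="falling"),
]
print(prosody_summary(request))
```

Summaries of this kind could then be compared across high- and low-imposition scenarios or across learner groups.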
Sixty-four learners were divided into beginning (443 paper-based TOEFL, N = 22) and
advanced (553 paper-based TOEFL) groups; of the advanced learners, about half had studied
abroad for 3 months to 3 years (N = 20) and half had stayed home (N = 22). Both proficiency
and study abroad played a role. Lower-proficiency learners used rising tone more frequently
and stressed more words than the higher-proficiency learners, particularly in the realization
of high-imposition/impact speech acts. The use of level tone and pitch range were affected by
both proficiency and study-abroad experience. High-proficiency study-abroad students used
more level tone and a wider pitch range than either the high-proficiency at-home group or the
lower-proficiency learners, similar to NS production.
Overview
The five studies reviewed here illustrate the interaction of pragmatics and speaking. Both
words and delivery contribute to conveying illocutionary force (Alexander, 2011; Pickering
et al., 2012; Wan, 2020), the perception of it (perlocutionary effect; Alexander, 2011; Wan,
2020), and sincerity (Alexander, 2011). Pitch concord/discord seems to function as an IFID
(Pickering et al., 2012). Length of utterance influences delivery (Alexander, 2011; Kang et al.,
2021) and should be investigated as a variable. In addition, rate of speech (Taguchi, 2011)
and pitch, stress, and tone were affected by degree of imposition (Kang et al., 2021). If we
combine the two studies (Taguchi, 2011; Kang et al., 2021), we get a more complete view of
development. No doubt, prosody played some role in acceptability judgements. Taguchi
reported that proficiency, and not study abroad, significantly correlated with pragmatic
appropriateness; Kang et al. reported that study abroad was associated with the use of level
tones and wider pitch range.
None of these discoveries could be made without investigating both pragmatics and
speaking, thus suggesting that there is much to learn from exploring this interface.

The relatively few studies at the intersection of speech acts and speaking discussed in Part 4
combine features of pragmatics and speaking studies. The production studies elicited their
language samples using established pragmatics elicitation tasks, sampling reasonable
numbers of participants for pragmatics studies. The use of free learner production within a
pragmatics task has the advantage of being as close to natural as possible while the scenarios
(Kang et al., 2021; Alexander, 2011) and the ranking task (Pickering et al., 2012) increase the
comparability of the responses. The conditions under which naturalistic production is
recorded have to be controlled carefully enough to yield recordings suitable for acoustic
analysis. If instrumental analysis is not possible, multiple raters establishing interrater
reliability should be used for auditory analysis. Judges are used by both research traditions.
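One common way to quantify interrater reliability for categorical auditory codings is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below handles the two-rater case only; the choice of kappa, the boundary-tone labels, and the example codings are assumptions for illustration, since no particular statistic is named in the studies reviewed here.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items categorically."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Invented boundary-tone codings of eight apologies by two raters.
rater_1 = ["H%", "H%", "L%", "L%", "L%", "H%", "L%", "L%"]
rater_2 = ["H%", "L%", "L%", "L%", "L%", "H%", "L%", "H%"]
print(round(cohens_kappa(rater_1, rater_2), 3))
```

With more than two raters, a multi-rater statistic would be needed in place of this two-rater version.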
The same naturalistic production that I find to be a benefit for pragmatics research comes
at a cost, however. Kang et al. (2021) note that low-imposition requests tend to be shorter
than high-imposition requests, which tend to be elaborated. This means that samples from
the high and low categories can be different lengths and thus have different spoken
characteristics. This is echoed in other studies. Recall that raters found apologies with
intensifiers to be sincere, regardless of intonation (Alexander, 2011). In a test of agreements
and disagreements, learners were inclined to agree, yielding a highly skewed sample in favour
of agreements (Pickering et al., 2012). This is not to say that we should not employ open-ended
pragmatics tasks like role-plays, but if we do, oral DCTs might be a good supplement,
although even they show natural variation.

Yates (2017) and Pickering (2001, 2018, inter alia) have advocated the integrated teaching of
pragmatics, discourse, and speaking. Ishihara and Cohen (2014) offer comprehensive
recommendations for teaching pragmatics, most of which are for speaking. Such proposals are
bolstered by an instructional effect study conducted by Derwing, Waugh, and Munro (2021)
showing that pragmatics instruction was insufficient to help learners improve their fluency on
role-plays. They investigated the effect of pragmatics instruction targeting speech acts on the
perceived fluency and comprehensibility of 11 adult community-based ESL learners of
intermediate proficiency. The study had two parts. The first comprised a pretest of eight
role-plays with the researchers (two for each of four speech acts: requests, refusals,
compliments, and apologies), instruction informed by pragmatics research and
recommendations, and a posttest consisting of the same role-plays. The pragmatics group
showed improvement in pragmatics, whereas a control group, who received no pragmatics
instruction, did not. In the second part of the study, 56 listeners judged 2 refusals and 2
requests from each of the 11 students in the pragmatics group for both pretests and posttests.
The listeners rated the role-plays on three 9-point scales: pragmatics (socially appropriate to
extremely inappropriate), comprehensibility (extremely easy to understand to impossible to
understand), and fluency (extremely fluent to extremely disfluent). Raters found the learners
to be more appropriate pragmatically after instruction. Comprehensibility ratings improved
on three out of four scenarios (the remaining scenario involved asking an employer for a
previously promised raise). Fluency ratings showed an increase for only one scenario
(refusing a customer’s request at a bank because of lack of ID).
This study suggests that although learners may improve their pragmatics through
instruction focused on oral production, they do not automatically improve their delivery, and
it further suggests that integrated instruction should be offered. Such instruction is unlikely
to be found ready-made in language textbooks, as reviews by pragmatics researchers have
found that textbooks rarely cover pragmatics and do so only inauthentically when it is
included (Ishihara & Cohen, 2014, among others). Two recent reviews of speaking and
pragmatics in English language teaching textbooks reached similar conclusions.
Diepenbroek and Derwing (2013) reviewed pragmatics and fluency activities separately
in 48 ESL texts used in Canada (12 series of 4 textbooks each). They found fluency
activities to be lacking and unevenly distributed across textbooks. Like previous reviews,
they cited limited quality and depth of coverage, and a lack of contextualization, in the
presentation of pragmatics. Petraki and Bayes (2013) reviewed five intermediate-level ESL
textbooks popular in Australia. One textbook addressed intonation and presented
examples of polite, impolite, and sarcastic intonation, and students were encouraged to
express impoliteness and negative affect in role-plays. Another had what was
described as a mini-lesson on polite and impolite intonation, along with a diagram of polite
request intonation, and a third presented only four examples. In short, not surprisingly,
the teaching of speaking and pragmatics fares no better than pragmatics alone; the review
did not report whether intonation patterns were authentic (an issue in the portrayal of
speech acts).
Fortunately, both Yates and Pickering have developed concrete proposals and activities
for integrated instruction of pragmatics and pronunciation (Yates, 2017) and discourse
intonation (Pickering, 2018). Yates persuasively argues that both pragmatics and
pronunciation should have greater prominence in language teaching. She details the PREFER
approach (Practice-relevant models, Raising awareness of pragmatic and pronunciation
issues and their interaction, Experimentation with new pragmatic resources and
pronunciation, Feedback, Exploring the world outside, and Reflection on what to do and how to
do it) (p. 240). She provides excerpts of L2 speakers interacting in their professional setting,
and then provides an instructional activity sequence using the PREFER approach that
addresses the relevant features.
Pickering includes teaching suggestions in many of her articles. In addition, her 2018
monograph provides a practical introduction to discourse intonation for ESL/EFL teachers.
Seven chapters include parts on pedagogical implications; the penultimate chapter includes
teaching suggestions and means for evaluating them. Two appendices developed by ESL
specialists provide additional elaborated activities.
Integrated approaches to teaching pragmatics, discourse, and speaking, such as those
discussed here, can be supplemented by proposals for teaching pragmatics for conversation
and proposals for teaching speaking. Fortunately, there are resources on which innovative
teachers can build. The key to making real progress will be the dissemination of materials.
7 Future Directions
This chapter advocates integrating the study of speaking into the study of L2 pragmatics so
regularly that speaking comes to be recognized as a pragmalinguistic resource. The question
is how to achieve that.
One starting point is to replicate a classic study (making it oral, if necessary), include spoken
characteristics in the analysis, and follow it up by including speech characteristics in the results.
A second route is for researchers with existing oral data to consider revisiting the
recordings, if they are of sufficient quality to do so. If they cannot be analyzed instrumentally
or auditorily, the recordings may suffice for a pilot for a new study.
A third type of investigation involves studying task effects, exploring how tasks influence the
spoken characteristics of speech act realization. While writing this chapter, I thought about the
extent to which the oral DCT might contribute to what we impressionistically described as
monotonic or monotonous production. Could recording oneself talking aloud to no one other
than the computer contribute to the observed lack of pitch variation? Kang et al. (2021) also
report monotonic production by some learners completing a similar task. On the other hand,
Pickering (2001) reports monotonous delivery of class lectures, which do have an
audience. However, the lectures of international teaching assistants are often not interactive in
the way that the lectures of NS teaching assistants are. Thus, the relevance of interlocutors and
their influence on speech act delivery remains to be investigated.
Finally, this is an area that calls for collaboration. Few L2 pragmatics researchers are
trained in acoustic analysis and few L2 speaking researchers are trained in pragmatics,
although as this chapter shows, such individuals exist. However, combining expertise within
the SLA subfields is another route towards establishing research at the interface of L2
pragmatics and speaking.
Author Note
I wrote this chapter as a pragmatics researcher from the perspective of what speaking can
contribute to pragmatics. I would be pleased to read the mirror-image chapter, what
pragmatics can add to the research of speaking, but I do not have the expertise to write it.
Further Reading
Kang, O., & Kermad, A. (2019). Prosody in L2 pragmatics research. In N. Taguchi (Ed.), The
Routledge handbook of second language acquisition and pragmatics (pp. 78–92). Routledge.
Various approaches to understanding prosody are explored, along with the role of prosody in NS
pragmatics. L2 pragmatics and prosody are discussed.
Yates, L. (2017). Learning how to speak: Pronunciation, pragmatics and practicalities in the classroom
and beyond. Language Teaching, 50, 227–246.
Yates argues for integrating the teaching of pragmatics and pronunciation. This article presents a
pedagogical plan adaptable to any target language.
Kang, O., Kermad, A., & Taguchi, N. (2021). The interplay of proficiency and study abroad experience
on the prosody of L2 speech acts. Journal of Second Language Pronunciation.
An example of a pragmatics study integrating a speech act approach with a description of six
characteristics of delivery.
References
Alexander, S. T. (2011). Sincerity, intonation, and apologies: A case study of Thai EFL and ESL learners
[Unpublished doctoral dissertation]. Indiana University.
Austin, J. L. (1962). How to do things with words. Cambridge: Harvard University Press.
Bardovi-Harlig, K. (1999). Exploring the interlanguage of interlanguage pragmatics: A research agenda
for acquisitional pragmatics. Language Learning, 49, 677–713.
Bardovi-Harlig, K. (2009). Conventional expressions as a pragmalinguistic resource: Recognition
and production of conventional expressions in L2 pragmatics. Language Learning, 59,
755–795.
Bardovi-Harlig, K. (2010). Exploring the pragmatics of interlanguage pragmatics: Definition by design.
In A. Trosborg (Ed.), Pragmatics across languages and cultures (Vol. 7 of Handbooks of pragmatics;
pp. 219–259). Berlin: Mouton de Gruyter.
Bardovi-Harlig, K. (2012). Pragmatics in SLA. In S. M. Gass & A. Mackey (Eds.), The Routledge
handbook of second language acquisition (pp. 147–162). London: Routledge/Taylor Francis.
Bardovi-Harlig, K. (2013). On saying the same thing: Assessing the production of conventional
expressions in L2 pragmatics. Pragmatics and Language Learning, 13, 191–211.
Bardovi-Harlig, K. (2015). Disinvitations: You’re not invited to my birthday party! Journal of
Pragmatics, 75, 91–110.
Bardovi-Harlig, K. (2015). Operationalizing conversation in studies of instructional effects in L2
pragmatics. System, 48, 21–34.
Bardovi-Harlig, K. (2018). Matching modality in L2 pragmatics research design. System, 75, 13–22.
Bardovi-Harlig, K., & Hartford, B. S. (1996). Input in an institutional setting. Studies in Second
Language Acquisition.
Blum-Kulka, S., House, J., & Kasper, G. (Eds.) (1989). Cross-cultural pragmatics: requests and
apologies. Norwood, NJ: Ablex.
Burns, A. (2013). A holistic approach to teaching speaking in the language classroom. In M. Olofsson
(Ed.), Symposium 2012: Lärarrollen i svenska som andraspråk (pp. 165–178). Stockholm: Stockholms
universitets förlag.
Couper-Kuhlen, E. (1986). An introduction to English prosody. Tübingen: Niemeyer.
Derwing, T. M., Waugh, E., & Munro, M. J. (2021). Pragmatically speaking: Preparing adult ESL
students for the workplace. Applied Pragmatics, 3, 107–135.
Diepenbroek, L., & Derwing, T. M. (2013). To what extent do popular ESL textbooks incorporate oral
fluency and pragmatic development? TESL Canada Journal, 30, 1–20.
Escandell-Vidal, V., & Prieto, P. (2021). Pragmatics and prosody in research on Spanish. In D. A.
Koike, & J. C. Félix-Brasdefer (Eds.), The Routledge handbook of Spanish pragmatics (pp. 149–166).
Gass, S. M. & Houck, N. (1999). Interlanguage refusals: A cross-cultural study of Japanese English.
Berlin: Mouton de Gruyter.
Goh, C. & Burns, A. (2012). Teaching speaking: A holistic approach. New York: Cambridge University Press.
Grice, H. P. (1975). Logic and Conversation. In P. Cole & J. Morgan (Eds.), Speech Acts (Syntax and
Semantics, Vol. 3, pp. 41–58). New York: Academic Press.
Hirschberg, J. (2017). Pragmatics and prosody. In Y. Huang (Ed.), The Oxford handbook of pragmatics
House, J. (1996). Developing pragmatic fluency in English as a foreign language: Routines and
metapragmatic awareness. Studies in Second Language Acquisition, 18, 225–252.
Ishihara, N., & Cohen, A. D. (2014). Teaching and learning pragmatics: Where language and culture
meet. Abingdon, UK: Routledge.
Kang, O., & Kermad, A. (2019). Prosody in L2 pragmatics research. In N. Taguchi (Ed.), The
Routledge handbook of second language acquisition and pragmatics (pp. 78–92). New York:
Routledge.
Kang, O., Kermad, A., & Taguchi, N. (2021). The interplay of proficiency and study abroad experience
on the prosody of L2 speech acts. Journal of Second Language Pronunciation.
Kasper, G., & Rose, K. R. (2002). Pragmatic development in a second language. Oxford: Blackwell.
Koike, D., & Félix-Brasdefer, J. C. (Eds.). The Routledge handbook of Spanish pragmatics. New York:
Routledge.
Myles, F., Hooper, J., & Mitchell, R. (1998). Rote or rule? Exploring the role of formulaic language in
classroom foreign language learning. Language Learning, 48, 323–363.
Nguyen, T. T. M. (2019). Data collection methods in L2 pragmatics research: An overview. In N.
Taguchi (Ed.), The Routledge handbook of second language acquisition and pragmatics
(pp. 195–211). New York: Routledge.
Petraki, E., & Bayes, S. (2013). Teaching oral requests: An evaluation of five English as a second
language coursebooks. Pragmatics, 23, 499–517.
Pickering, L. (2001). The role of tone choice in improving ITA communication in the classroom.
TESOL Quarterly, 35, 233–255.
Pickering, L. (2018). Discourse intonation: A discourse approach to teaching the pronunciation of English.
Ann Arbor: University of Michigan Press.
Pickering, L., Hu, G., & Baker, A. (2012). The pragmatic function of intonation: Cueing agreement and
disagreement in spoken English discourse and implications for ELT. In J. Romero-Trillo (Ed.), Pragmatics
and prosody in English language teaching (pp. 199–218). Heidelberg: Springer-Verlag.
Schneider, K. P. (2011). Imagining conversation: How people think people do things with words.
Sociolinguistic Studies, 5, 15–36.
Searle, J. R. (1969). Speech acts: An essay in the philosophy of language. Cambridge: Cambridge
University Press.
Shively, R. (2018). Learning and using conversational humor in a second language during study abroad.
Berlin: Mouton.
Taguchi, N. (2007). Task difficulty in oral speech act production. Applied Linguistics, 28, 113–135.
Taguchi, N. (2011). Do proficiency and study-abroad experience affect speech act production?: Analysis
of appropriateness, accuracy, and fluency. International Review of Applied Linguistics, 49, 265–293.
Taguchi, N. (Ed.) (2019). The Routledge handbook of second language acquisition and pragmatics. New
York: Routledge.
Tateyama, Y. (2001). Explicit and implicit teaching of pragmatic routines. In K. Rose & G. Kasper
(Eds.), Pragmatics in language teaching (pp. 200–222). Cambridge: Cambridge University Press.
Wan, K. (2020). Interpretation of intonation and meaning by L2 English learners. Unpublished Indiana
University seminar paper.
Yates, L. (2017). Learning how to speak: Pronunciation, pragmatics and practicalities in the classroom
and beyond. Language Teaching, 50, 227–246.
PART IV
Teaching Speaking
18
SECOND LANGUAGE SPEAKING STRATEGIES
Sara Kennedy
In this chapter, second language (L2) speaking strategies are considered a subset of second
language communication strategies, with second language referring to any language(s) other
than a speaker’s dominant language(s) learned in childhood. Both communication strategies and
speaking strategies can be conceived of in quite focused or in broader terms, but speaking
strategies in this chapter encompass any spoken attempts to “enhance the effectiveness of
communication,” per Canale’s (1983) definition (as cited in Hung & Higgins, 2016, p. 903). As
elaborated in the next part, L2 speaking strategies can be conceived of narrowly, as something
used to fill a gap in communications, such as asking an interlocutor to clarify the meaning of an
utterance, or can be conceived of more widely as something done to enhance relations and
interaction between interlocutors, such as re-using an interlocutor’s words or phrases in a
subsequent turn. In Table 18.1, a set of L2 speaking strategies which have been commonly
grouped under a narrow (problem-oriented) approach are presented, followed by Table 18.2,
which presents a set of L2 speaking strategies more oriented towards supporting and enhancing
interaction between interlocutors. It is important to note that the problem-oriented strategies
could also be used to enhance interaction, but are here focused specifically on addressing
problems in L2 communication. A brief history of the framing and analysis of L2 speaking
strategies is described, followed by a discussion of some critical issues and topics in research on
L2 speaking strategies. Current themes in research on L2 speaking strategies are then outlined,
followed by the main research methods used in studies on L2 speaking strategies. Finally, some
recommendations for teaching practice and potential future directions in research are discussed.
L2 speaking strategies (included under L2 communication strategies) were initially described as
errors resulting from speakers’ incomplete knowledge of the L2 (Richards, 1971), such as
inappropriate cross-linguistic transfer of first language (L1) vocabulary (e.g., enregistrates
[records] the light); soon after, L2 speaking strategies became more widely viewed as speakers’
reactions to problems in communicating; these reactions were categorised according to
surface-level moves such as switching topics or reformulating messages (Ervin, 1979). Later researchers
incorporated cognitive processes into the analysis of L2 speaking strategies, such as speakers’
control of how meaning could be expressed even if knowledge of the L2 was insufficient, such
as using a word from the L1 (Bialystok, 1990).

DOI: 10.4324/9781003022497-23

Table 18.1 Problem-oriented L2 speaking strategies
Problem-oriented strategy | Source
Code switching | Kouwenhoven et al. (2018)
Asking for clarification – A: “so my first research question in my project is what the pattern, the model…” B: “Model?” | Jamshidnejad (2011)
Comprehension check – “Between the flag and the finish it’s a mountain. You know?” | Kennedy (2017)
Re-structuring – “We don’t ha- (.) We don’t know the … decisions” | Chang and Liu (2016)
Use of all-purpose words – “it’s a stuff, and a we put it on our um desk” | Pipes (2019)
Circumlocution – explaining the characteristics of a target lexical item which is not known or remembered in the L2. “You use it to stay dry in the rain.” – umbrella | Bøhn and Myklevold (2018)

Table 18.2 Interaction-oriented L2 speaking strategies
Interaction-oriented strategy | Source
Let it pass – listener allows unclear words or utterances to “pass” unless understanding them becomes essential to communication | Firth (1996)
Build solidarity by using mutual L1 | Lauriks et al. (2015)
Convey warmth and care through voice quality | Jain and Krieger (2011)
Invitation to continue – A: “…the people don’t accept me and aaaa I don’t know mmmm what do you want me to say? (Laughing)” B: “what do you mean they don’t accept you?” | Jamshidnejad (2011)
Using different forms of address for rapport building – A: “Morning Dennis, small cup, darlin’?” B: “Small.” | Collier (2010)
Anticipating and completing an interlocutor’s phrase or utterance – A: “I am gonna ask him what what does it what does it” B: “consume” A: “yeah consume…” | Björkman (2014)

Ultimately, approaches to the study and analysis of L2 speaking strategies coalesced into
two spheres: a psycholinguistic perspective and an interactional perspective. In the
psycholinguistic perspective, L2 speaking strategies are explained through reference to cognitive and
linguistic models of language learning and use, such as Levelt’s model (1983, 1989, 1993,
1995; see de Bot & Bátyi, this volume). From this perspective, L2 speaking strategies are
typically viewed as steps taken to address speakers’ problems or challenges in communicating,
such as providing fillers or hesitation devices while searching for words or continuing
an utterance (e.g., uhh…the thing is…). From the interactional perspective, L2
speaking strategies are part of an overall enterprise of interlocutors jointly constructing and
achieving understanding (e.g., Firth, 1990). For example, one interlocutor who uses a lexical
item from her L1 while speaking an L2 is not solving a problem in production, but taking a
step with other interlocutors to jointly achieve orderly interaction by using all available
resources and reaching mutual understanding (Firth & Wagner, 1997).
Much research in the late 1990s and early 2000s focused on one of two areas: training
in L2 speaking strategies, which was generally framed in the psycholinguistic perspective
(e.g., Scullen & Jourdain, 2000), and studies about the use of English as a lingua franca
(ELF), which explained the use of L2 speaking strategies as resources jointly used by
interlocutors to enable successful communication using ELF (e.g., House, 2003).
Computer-mediated communication also became a more frequent context for research on L2
communication strategies in the early 2000s; however, it was only in the 2010s, when
technological advances allowed for widespread use of digitally-based video communication, that
the use of L2 speaking strategies in computer-mediated communication began to be more
widely studied, especially in post-secondary settings (e.g., Shih, 2014). This research generally
has taken a psycholinguistic perspective in identifying and classifying the use of L2 speaking
strategies, while research in ELF settings continues to frame L2 speaking strategy use from
an interactional perspective, with an increasing variety of research contexts such as
professional workplaces, customer service centres, and small enterprises (e.g., Collier, 2010).
Many of the critical issues and topics discussed in the next part build on and extend the
more recent L2 speaking strategy research on pedagogical interventions, descriptions of
second language interactions, and Internet and mobile technology.

Early research on L2 speaking strategies focused on identifying and descriptively framing
those strategies, while later research, especially in the 2000s, increasingly focused on L2
speaking strategy instruction. Results generally showed that after adult L2 learners received
explicit instruction and guided practice in using L2 speaking strategies, they used or reported
using significantly more strategies in prompted speaking tasks than adult L2 learners who
had not received such instruction and practice (e.g., Bøhn & Myklevold, 2018; Dörnyei,
1995; Nakatani, 2005). Not all speaking strategies which were taught showed an increase, but
explicit instruction did have some significant effect on speaking strategy use in most studies.
This raises the question of whether some L2 speaking strategies are more “teachable” or,
rather, “learnable” than others. In those studies where learners were explicitly taught a range
of L2 speaking strategies, some learners showed a general increase in L2 speaking strategy
use (e.g., Lam, 2010; Rossiter, 2003), while others showed most improvement on strategies
related to filling pauses and minimizing hesitations (Dörnyei, 1995; Kongsom, 2009;
Nakatani, 2005). Other L2 speaking strategies that showed greater gains than others after explicit instruction were
circumlocution (Dörnyei, 1995; Rabab’ah, 2016), appeals for help, and asking for repetition
(Rabab’ah, 2016).
Even if some L2 speaking strategies are adopted more easily by learners than others, it is
unclear whether or how the increased use of those strategies contributes to (a) L2 learning or
(b) the communicative speech event in which they are used. Researchers have noted that
raters’ assessments of speaking test scores and of task effectiveness are higher when L2
learners use or report using more speaking strategies (Lam, 2010; Nakatani, 2005).
Generally, the long-term learning effects of the use of L2 speaking strategies have not been
explored (Guo, 2011). In terms of the ways in which speaking strategy use contributes to
communicative speech events, ratings and test scores do not necessarily reflect the effects on
interlocutors. In authentic communicative speech events (e.g., phone message, service
transaction), whether monologic or interactive, the effective use of L2 speaking strategies is
best judged by the participants (interlocutors) in the communicative event. Interlocutors
might have different views than onlookers of the effects or intent of speaking strategy use.
For example, Chang and Liu (2016) note the different purposes for which interlocutors used
pauses in their spoken interaction, as explained by the interlocutors in stimulated recalls. The
effectiveness of the use of a given L2 speaking strategy might vary from speech event to
speech event, depending on the intent and interpretation of the interlocutors and the purpose
of the speech event, as well as other individual and contextual factors such as proficiency
level or membership in a community of practice. The influence of different individual or
contextual factors, such as creativity level or role in the workplace, on speakers’ use of L2
speaking strategies has been established (e.g., Carter Pipes, 2019; Collier, 2010); however, the
outcome of that use is rarely explored, likely because of the difficulty of determining how the
use of a particular speaking strategy has affected communication, interlocutors’
communicative goals, or interlocutors’ relationships. It is easier to measure changes in the use of
particular L2 speaking strategies than to measure effective use of L2 speaking strategies, but
strategies are used to address a challenge or a purpose in a communicative speaking event. It
is important not only that L2 speakers learn to use L2 speaking strategies more frequently,
but also that they use those strategies in ways that benefit their purpose in speaking.
Therefore, teachers and researchers need to more consistently explore the reasons why L2
speakers use particular speaking strategies and also explore interlocutors’ perceptions of the
effects of employing those strategies on the communication itself and on relationships
between interlocutors (see Hung & Higgins, 2016). Few teachers or researchers would suggest
that word coinage, for example, is a more effective L2 speaking strategy for all interlocutors
and in all contexts than the strategy of asking for assistance. However, more attention has
been paid to how focused instruction can increase the use of particular L2 speaking strategies
than to how L2 speakers effectively use a range of L2 speaking strategies, depending on their
capacities and contexts.
One domain where an increasing amount of L2 speaking strategy research is being done is
Internet and mobile technology. Researchers have found that L2 learners’ computer- or
mobile device-mediated use of L2 speaking strategies may differ in clear ways from L2
learners’ use of L2 speaking strategies in face-to-face interaction (e.g., Smith, 2003). The
modalities of interaction using computers or mobile devices can take many forms, from
two-way voice communication to audio-video communication to communication in virtual
reality settings. These different technologies contribute various affordances, “constraining and
enabling aspects that are brought about by technological artifacts” (Rosenbaun, 2016, p. 7).
In some synchronous computer-mediated communication, the interlocutors may not be
known to each other, may not be physically visible to each other (even if avatars are visible),
and may not demonstrate a clearly defined purpose in communicating except to use the
technology (e.g., Skypecasts in Brandt & Jenks, 2013), so the use of L2 speaking strategies
can be quite different from face-to-face interaction or even telephonic interaction, where one
party has purposely contacted another party. Additionally, the possible combination of L2
speaking strategies together with communication cues of other kinds (e.g., using written text
and symbols, gestures by avatars) can support spoken communication in ways that are less
common in face-to-face communication (Shih, 2014).
Users of Internet and mobile technologies may be in communication even if they are not
affiliated through jobs or careers, schooling, or geographic regions. The use of the
technologies themselves may contribute to what has been called Transient International Groups
(Pitzl, 2019) or Transient Multilingual Communities (Mortensen, 2017), where an L2 speaker
might not interact repeatedly with another speaker and so build a relationship, but might
interact transiently with other speakers, with no clearly shared purpose or shared rules for
engagement. The nature of these interactions may up-end typical expectations for the use of
L2 speaking strategies. For example, an account executive and an international client
speaking via a video call may each expect L2 speaking strategies to be used in somewhat
predictable ways, due to the established relationship and communication norms and
(perhaps partial) joint purpose in communicating; however, two interlocutors who encounter
each other via a massively multiplayer online game, such as World of Warcraft, might use L2
speaking strategies in quite different ways from one interaction to the next, due to the lack of
established norms of communicating in an environment with the potential for many transient
encounters (see Jenks, 2009, for an early exploration of this concept). It is important,
therefore, that researchers who study L2 speaking strategies also focus on communicative
technologies which afford transient (virtual) encounters between L2 speakers who may not
want or need to establish shared norms for communication with their interlocutors. The use
and function of L2 speaking strategies in these environments may be noticeably more
idiosyncratic and distinctive to a given interaction.

Recent research in L2 speaking strategies includes focused studies on different factors which
are linked to their use, and research in different contexts of L2 interaction that include L2
speaking strategies. Both individual and contextual factors have been found to be linked to
particular ways that L2 speakers use L2 speaking strategies. In L2 Mandarin classes, learners
from North American backgrounds reported using relatively more socioaffective strategies
when speaking, such as taking risks and telling themselves to relax, as compared to learners
from East Asian backgrounds who reported using relatively more word-oriented strategies,
such as drawing pictures or writing Chinese characters when words they say are not
understood by listeners (Hsieh, 2014). Other studies have reported on relationships between L2
speaking strategy use, performance in speaking tasks or tests, and other factors. Zhang and
Liu (2013) found that L2 English learners’ use of speaking strategies in an oral test was
mediated by their self-reported test anxiety. Learners with high self-reported test anxiety
tended to use strategies such as message reduction and message abandonment more frequently,
and to use strategies such as social affective or fluency-oriented strategies less frequently
(e.g., encouraging oneself, taking time to express oneself). These learners also tended to
perform less well on the oral test. An increasing number of studies focus on the ways that L2
speakers’ use of speaking strategies interacts with individual difference factors and contextual
factors, and on their relation to ratings of L2 speech performance; these studies can reveal
factors which may influence L2 speakers’ use of speaking strategies and listeners’ impressions
of their speech.
Another rapidly proliferating area of research concerns L2 learning via multiplayer
online gaming [e.g., World of Warcraft, Civilization (Civ)] or online virtual environments
(e.g., Second Life). The interaction taking place in these online environments includes the use
of L2 speaking strategies, with some researchers finding that the use of these strategies in
online gaming or virtual environments showed clear links to the affordances in the
environments or games. For example, when referring to directions or objects in Second Life,
where interlocutors could take the perspective of their own avatar or multiple perspectives,
L2 speakers often used their avatar to make clear what they would subsequently speak
about, such as moving their avatars in certain directions or using their avatars to point to
specific objects (Wigham & Chanier, 2013). L2 speakers in a virtual online environment also
used a combination of verbal and non-verbal strategies to buy time during interactions, to
take or maintain a speaking turn, and to ask for assistance from others. These combinations
of verbal and non-verbal strategies gave L2 speakers more tools for resolving communication
problems and participating in the interactions (Shih, 2014). In multiplayer online gaming,
Newgarden and Zheng (2016) found that L2 English speakers who repeatedly played World
of Warcraft with L1 English speakers in the context of a university course were able to
engage in more advanced language tasks and more coordinated language activities with
interlocutors, including speaking strategies which were goal-directed and strongly connected
to interlocutors’ actions and speech. Research on L2 speaking strategies will continue to
intensify in these multiplayer online gaming and virtual environments, which offer contexts
for interaction where interlocutors may differ in their levels of target language proficiency or
of familiarity with each other and with the online environment (see part on Transient
Multilingual Communities); this contrasts with many classroom settings, where patterns of
interaction are more structured and more familiar to interlocutors.
Another area where study of L2 speaking strategies continues to grow is research in ELF
settings. In these contexts, speakers from a range of L1s use English primarily for
communicative purposes, rather than in pedagogical settings. The ways speaking strategies in English
are used may therefore be less informed by speaking according to L1 English norms and more
by using English to meet interlocutors’ communicative purposes. Ehrenreich (2018) explained
how the concept of community of practice was useful for ELF research. To constitute a
community of practice, groups of people must interact regularly, must have a joint goal or
purpose that guides the interrelated actions of group members, and must have a “shared
repertoire” (Wenger, 1998, as cited in Ehrenreich, 2018, p. 43), a set of resources for negotiating
meaning within the group, which includes linguistic resources. In post-secondary academic
settings, Björkman (2014) found that students using ELF while doing group work at a Swedish
university used speaking strategies mainly focused on confirming or checking accurate
understanding (e.g., clarification requests), rather than speaking strategies focused on nativelike
use of English (e.g., word replacement). In business settings, Firth (1996) found that in sales
phone calls between ELF users, non-standard or potentially unclear utterances were often not
singled out by interlocutors (let it pass), but Tsuchiya and Handford (2014) noted that in a
multiparty meeting on designing a large bridge, two members of the meeting regularly
requested clarification of or reformulated unclear or ambiguous utterances from other members.
Tsuchiya and Handford suggested that the occurrence of these potentially face-threatening
strategies may have been due to the previously-attested combative nature of construction
communication (communication in construction-related fields), the gender of the participants
(all male), and the belief of one of the two members (the Chair of the meeting) that he had a
special responsibility to ensure that all meeting members understood what was being said.
Clearly, the use of L2 speaking strategies is influenced by the context of target language use;
where the target language is being used as a lingua franca (with little to no reference to L1
speaker norms), the use of L2 speaking strategies might be quite different from their use in an
L2 or language learning context. Additionally, if interaction is taking place in communities of
practice with specific purposes and repertoires for negotiating meaning and understanding, the
use of L2 speaking strategies might be noticeably different amongst communities because of
the range of purposes and repertoires appropriate to each community. The context for use of
L2 speaking strategies is as important as the nature of the strategies themselves.

Various contexts, tasks, and methods have typically been used to record or recall speech and to
frame the identification and interpretation of speaking strategies. Speech can be recorded or
recalled from interactive speaking tasks which are assigned to participants to elicit speech, an
approach taken both in classrooms and in lab contexts (e.g., Fernández Dobao, 2012). Speech
from naturalistic interactions, such as teacher–student speech in classrooms, speech between
colleagues in workplaces, or authentic service encounters or conversations with acquaintances
can also be recorded or recalled (e.g., Birlik & Kaur, 2020). In practical terms, it is easier to
record speech in classroom or lab contexts or in online settings, where researchers can have
relatively more influence over the environment and the recording quality, as opposed to
workplaces, homes, or other naturalistic settings where ambient noise can affect the
intelligibility of recorded speech. However, the use of recordings from naturalistic settings is more
likely to reflect situated and contextualized L2 speech, where the nature of the surrounding
environment and interlocutors may considerably shape how, when, and which speaking
strategies are used. The recall of L2 speaking strategy use (via retrospection) is less common
than the recorded use of L2 speaking strategies in data collection, likely because of the risk of
speakers forgetting or not noting use of strategies during a specific speaking task; “if the task is
complicated or takes a lot of time, the participant can forget some of the mental processes that
occurred” (Perry, 2011, p. 119). Stimulated recall, where speakers are presented with excerpts
of their recorded speech and are asked to describe their thoughts at the time, can help speakers’
recollection of the strategies they used (e.g., Lam, 2010; Poulisse et al., 1987). Alternatively,
researchers can collect self-reports of recalled speaking strategies in contexts where recording
speech is not practical or feasible. Self-reports from questionnaires (e.g., Kongsom, 2009) can
also be used as teaching opportunities for raising learners’ consciousness of different types of
L2 speaking strategies that they already use or could start to use.
The identification and analysis of L2 speaking strategies have changed over time. Early
frameworks identified speaking strategies according to their surface-level characteristics, such as
word coinage or topic switch (Ervin, 1979). Later frameworks incorporated speakers’ cognitive
processes, such as the source of information which speakers drew upon to produce L2 speaking
strategies (e.g., the speaker’s L1; Bialystok, 1983). Psycholinguistic models of language
learning and use were incorporated beginning in the late 1980s to categorise ways in which L2
speaking strategies reflected psycholinguistic processes (Bongaerts & Poulisse, 1989). All of these
frameworks focused on individual speakers and the means by which they addressed the challenges
in speaking in an L2 (a cognitive/psycholinguistic approach). However, another approach
treats the use of L2 speaking strategies not as the actions of individual speakers in
response to possible problems in communicating, but as the interactive use of resources by
interlocutors as a natural outcome of communication, accomplished by interlocutors
co-constructing and jointly achieving understanding (Firth, 1990). With this approach to analyzing
L2 speaking strategies, the L2 communication analyzed must be interactive (i.e., with a person
listening to the speaker) and naturalistic (not elicited or guided by a researcher or prompt).
Generally speaking, speech samples analyzed for L2 speaking strategies range across a
continuum from monologic or interactive speech elicited from prompts in labs or classrooms
to unguided speech authentically used in naturalistic settings, whether the setting is a
classroom, a family residence, a restaurant, or a video call. L2 speaking strategies can be
identified and analyzed according to their surface-level characteristics (e.g., circumlocution,
language switch), to cognitive processes or psycholinguistic models which are presumed to
underlie the use of those speaking strategies, or according to a more qualitative description
of the resources used by interlocutors in the process of accomplishing orderly interaction in a
given context. The choice of speech samples, analytic frameworks, and data collection
methods (recorded versus recalled) can depend on practical considerations of collecting and
analyzing data, the researcher’s questions, and the researcher’s explicit or implicit perspective
on how languages are primarily learned, used, and analyzed, whether it be from a cognitive
basis, a constructivist (interactional) basis, or a combination of the two.

L2 speaking strategies have been observed, elicited, and taught in several contexts:
classrooms, research labs, work and academic settings, and virtual communication. Although
many L2 speaking strategies have been shown to be learnable through explicit instruction
(e.g., circumlocution, use of all-purpose words), it is strategies related to L2 oral fluency,
such as filling pauses and minimizing hesitations (e.g., “let me see…”), which were learned in
multiple studies in different instructional contexts. These fluency-related strategies may
simply be relatively easy to adopt, and thus “low-hanging fruit” for teachers. Other L2
speaking strategies, such as re-structuring, may be more challenging for L2 speakers to use
proficiently. The acquisition and use of speaking strategies that require a degree of individual
linguistic flexibility or creativity, such as paraphrasing or word coinage, or which involve an
interlocutor, such as appealing for help or rephrasing an interlocutor’s utterance, may be
influenced by individual differences (e.g., vocabulary or syntactic knowledge, extroversion,
or confidence). They may also be influenced by contextual characteristics (e.g., the status or
mutual familiarity of interlocutors, the topic, or each speaker’s assumptions about pragmatic
norms for communication). Although studies on L2 speaking strategy instruction are
longitudinal by nature, with typical instructional periods measured in weeks and months, the
long-term impact of instruction is rarely examined (Guo, 2011). What is clear is that L2
speakers receiving explicit instruction in L2 speaking strategies do produce at least some of
the instructed L2 speaking strategies more frequently. However, the learnability of particular
L2 speaking strategies may be influenced by many factors, including the target language, L2
speakers’ L1, their level of L2 proficiency, or the sociopragmatic context for communication.
While individual L2 speaking strategies may be more or less learnable, it remains unclear
which L2 speaking strategies are important to learn. To my knowledge, the effectiveness of
using particular L2 speaking strategies in particular interactions has not been examined.
Several researchers have analyzed the effects of the use of specific strategies on interlocutors’
cognitions or on the interaction itself, using an etic (external to the interaction) or emic (from
an interlocutor’s perspective) approach (e.g., Chang & Liu, 2016; Kaur, 2011). Nevertheless,
findings from these studies are not meant to represent the utility or effectiveness of particular
speaking strategies; rather, the effect of using a specific L2 speaking strategy is analyzed within
the context of the particular interaction with those particular interlocutors. How, then, are
teachers to decide on which L2 speaking strategies to prioritise for explicit instruction?
Clearly, no L2 speaking strategies have the same effect on L2 communication in every
communicative context. Rather than focusing on a restricted set of L2 speaking strategies as
instructional targets, teachers might consider the purposes for which their L2 learners will
need or want to communicate: to have their L2 speech assessed, to communicate while
travelling, to find a job, etc. The goal for L2 learners would be to develop the flexibility to
select from a range of L2 speaking strategies so as to use ones that support learners’
communicative purposes. For example, an L2 learner seeking a job may feel that fluent speech is
more important than complex speech, so may use speaking strategies such as filling pauses
and using formulaic time-gaining phrases (e.g., let me see).
Teachers are familiar with target language contexts and, potentially, with challenges that
learners may face in learning to use relevant strategies. One approach to address the use of
different L2 speaking strategies in a given situation could be for teachers to present examples,
whether authentic or instructional, of specific communicative situations such as retail
transactions to raise awareness of the use or potential use of L2 speaking strategies. L2
learners working in a retail setting, for instance, could potentially build rapport through
repeating interlocutors’ language, or could address difficulties with lexical retrieval by
paraphrasing. Once learners’ awareness is raised about possibilities for using speaking
strategies in the L2, learners could engage in guided or more spontaneous speaking in a
similar communicative situation; learners could then analyze how they used or could have
used L2 speaking strategies for particular purposes, whether by recalling their speech or by
listening to recordings of it. This cycle of awareness-raising, practice, and analysis can be
repeated for similar or different communicative situations in order to enhance learners’
repertoire and flexibility in the selection of L2 speaking strategies.
Unfortunately, there is no published evidence for specific L2 speaking strategies which are
generally more communicatively effective than other strategies. L2 speakers, like other
speakers, can analyze the communicative effects of using particular speaking strategies by
reflecting on particular instances of communication. Teachers who know that their learners
struggle in particular areas of L2 speaking, such as lexical retrieval, might want to focus
instruction on very specific speaking strategies to address those struggles; however, most teachers
and L2 learners cannot easily predict which speaking strategies will be needed in L2
communication. L2 speakers who are resourceful, adaptable, and sensitive to particular
communicative needs will be better placed to use L2 speaking strategies which meet their needs.
7 Future Directions
Much of the early research on L2 speaking strategies was set in classrooms, whether as
snapshots of strategy use by L2 learners, or as studies of effects of explicit instruction on L2
speaking strategies. To date, little published research has examined the long-term effects of
explicit instruction on the use of L2 speaking strategies, or L2 speakers’ longitudinal learning
of L2 speaking strategies in naturalistic (non-instructed) settings. These are valuable topics
for future investigation, especially given the rise in the use of Internet and virtual technology
for professional, educational, and recreational purposes.
These technologies allow for speech and interaction for some of the same purposes as
offline contexts, such as sales calls or training sessions, and have affordances for the use of
many different modalities at the same time, such as images, video, and text. These
multimodal communications, which can often be recorded, accentuate the combination of L2
speaking strategies with other, non-speech elements, which contribute in their entirety to the
communication. Gullberg (2006) explored this combination in face-to-face interaction, but
the surge in online multimodal communication means that the L2 speaking strategies could
and should be examined in conjunction with other communication cues, as in Wigham and
Chanier (2013), who found that some L2 users of Second Life often used their avatars to
help convey the message of their next utterance, such as moving their avatars towards a
certain area or using their avatars to point to specific objects.
More research is being done on authentic use of L2 speaking strategies outside classroom
contexts, especially in international work and academic settings (e.g., Björkman, 2014;
Du-Babcock, 2013); however, the examination of how the use of L2 speaking strategies is
influenced by social and political environments and interlocutors’ status (e.g., Lauriks et al.,
2015) is still an emergent area ripe for further research. Similarly, research on Transient
Multilingual Communities has the potential to extend our knowledge of how L2 speaking
strategies are used in contexts where interactional norms may not be established or shared.
Finally, it would be interesting to explore the effects of pedagogical interventions
focusing on preparing L2 learners to use L2 speaking strategies to react to ongoing
communicative or interactional needs. That is, explicit instruction would centre not only on
particular L2 speaking strategies, but on guiding learners to recognize their communicative
or interactional needs during particular L2 spoken interaction, to draw on their available
strategic resources, and to reflect on the effects of their use of L2 speaking strategies. When
L2 speakers can engage in communication using whichever strategic resources they feel are
suitable, they will be more autonomous and more adaptable to the changing demands of
authentic L2 communication.
Sara Kennedy
Further Reading
Guido, M. G. (2012). ELF authentication and accommodation strategies in crosscultural immigration
encounters. Journal of English as a Lingua Franca, 1(2), 219–240.
An observational study of English interactions between Italian immigration officials and
asylum-seekers from African countries.
Kennedy, S. & Trofimovich, P. (2016). Research timeline: Second language communication strategies.
Language Teaching, 49(4), 494–512.
A timeline describing important concepts and research studies from the 1970s to the mid-2010s.
Sato, T., Yujobo, Y. J., Okada, T., & Ogane, E. (2019). Communication strategies employed by
low-proficiency users: Possibilities for ELF-informed pedagogy. Journal of English as a Lingua Franca,
8(1), 9–35.
Lower-proficiency learners of English, a population not often targeted in L2 speaking strategy research,
are observed in paired, task-based interaction with L1 English speakers, to examine the speaking
strategies learners use without instruction and to measure the effectiveness of those, with suggestions
provided for L2 speaking strategies to encourage or discourage for these low-proficiency learners.
Soekarno, M., & Ting, S. H. (2020). Fluency and communication strategy use in group interactions for
occupational purposes. Journal of English Language Teaching Innovations and Materials (JELTIM),
2(2), 63–84.
An exploration of the effects of extended explicit instruction in L2 speaking strategies in an occupational
(culinary) program in a hospitality college in Malaysia. Although participants received training on 11
different strategies over 12 weeks, only the use of time-gaining fillers and the repetition of words for
various purposes were used repeatedly by all participants across three data collection times.
References
Birlik, S., & Kaur, J. (2020). BELF expert users: Making understanding visible in internal BELF
meetings through the use of nonverbal communication strategies. English for Specific Purposes,
58, 1–14.
Bialystok, E. (1990). Communication strategies: A psychological analysis of second-language use. Oxford:
Bialystok, E. (1983). Some factors in the selection and implementation of communication strategies. In
C. Faerch & G. Kasper (Eds.), Strategies in Interlanguage Communication (pp. 100–118). London:
Longman.
Björkman, B. (2014). An analysis of polyadic English as a lingua franca (ELF) speech: A
communicative strategies framework. Journal of Pragmatics, 66, 122–138.
Bøhn, H., & Myklevold, G. A. (2018). Exploring communication strategy use and metacognitive
awareness in the EFL classroom. In Å. Haukås, C. Bjørke, & M. Dypedahl (Eds.), Metacognition in
language learning and teaching (pp. 179–203). New York: Routledge.
Bongaerts, T., & Poulisse, N. (1989). Communication strategies in L1 and L2: Same or different?
Applied Linguistics, 10(3), 253–268.
Brandt, A., & Jenks, C. (2013). Computer-mediated spoken interaction: Aspects of trouble in
multi-party chat rooms. Language@Internet, 10(5). https://www.languageatinternet.org/
Chang, S.-Y. & Liu, Y. (2016). From problem-orientedness to goal-orientedness: Re-conceptualizing
communication strategies as forms of intra-mental and inter-mental mediation. System, 61, 43–54.
Collier, S. (2010). Getting things done in the L1 and L2: Bilingual immigrant women’s use of
communication strategies in entrepreneurial contexts. Bilingual Research Journal, 33(1), 61–81.
Dörnyei, Z. (1995). On the teachability of communication strategies. TESOL Quarterly, 29(1), 55–85.
Du-Babcock, B. (2013). English as Business Lingua Franca: A comparative analysis of communication
behavior and strategies in Asian and European contexts. Ibérica, Revista de la Asociación Europea de
Lenguas para Fines Específicos, 26, 99–130.
Ehrenreich, S. (2018). Communities of practice and English as a lingua franca. In J. Jenkins, W. Baker,
& M. Dewey (Eds.), The Routledge handbook of English as a Lingua Franca (pp. 37–50). London:
Routledge.
Ervin, G. L. (1979). Communication strategies employed by American students of Russian. The
Fernández Dobao, A. (2012). Collaborative dialogue in learner–learner and learner–native speaker
interaction. Applied Linguistics, 33(3), 229–256. doi: 10.1093/applin/ams002
Firth, A. (1990). ‘Lingua franca’ negotiations: Towards an interactional approach. World Englishes,
9(3), 269–280.
Firth, A. (1996). The discursive accomplishment of normality: On ‘lingua franca’ English and
conversation analysis. Journal of Pragmatics, 26(2), 237–259.
Firth, A. & Wagner, J. (1997). On discourse, communication, and some fundamental concepts in SLA
research. The Modern Language Journal, 81(3), 285–300.
Gullberg, M. (2006). Handling discourse: Gestures, reference tracking, and communication strategies in
early L2. Language Learning, 56(1), 155–196.
Guo, J. (2011). Empirical studies on L2 communication strategies over four decades: Looking back and
ahead. Chinese Journal of Applied Linguistics, 34(4), 89–106.
House, J. (2003). Misunderstanding in intercultural university encounters. In J. House, G. Kasper & S.
Ross (Eds.), Misunderstanding in social life: Discourse approaches to problematic talk (pp. 41–104).
London: Pearson.
Hsieh, A. F.-Y. (2014). The effect of cultural background and language proficiency on the use of oral
communication strategies by second language learners of Chinese. System, 45, 1–16.
doi: 10.1016/j.system.2014.04.002
Hung, Y.-W., & Higgins, S. (2016). Learners’ use of communication strategies in text-based and
video-based synchronous computer-mediated communication environments: Opportunities for language
learning. Computer Assisted Language Learning, 29(5), 901–924. doi: 10.1080/09588221.2015.1074589
Jain, P., & Krieger, J. L. (2011). Moving beyond the language barrier: The communication strategies
used by international medical graduates in intercultural medical encounters. Patient Education and
Counseling, 84(1), 98–104.
Jamshidnejad, A. (2011). Developing accuracy by using oral communication strategies in EFL
interactions. Journal of Language Teaching and Research, 2(3), 530–536. doi: 10.4304/jltr.2.3.530
Jenks, C. J. (2009). Getting acquainted in Skypecasts: Aspects of social organization in online chat
rooms. International Journal of Applied Linguistics, 19, 26–46.
Kaur, J. (2011). Raising explicitness through self-repair in English as a lingua franca. Journal of
Pragmatics, 43(11), 2704–2715. doi: 10.1016/j.pragma.2011.04.012
Kennedy, S. (2017). Using stimulated recall to explore the use of communication strategies in English
lingua franca interactions. Journal of English as a Lingua Franca, 6(1), 1–27.
Kongsom, T. (2009). The effects of teaching communication strategies to Thai learners of English. In
G. Raţă (Ed.), Language education today: Between theory and practice (pp. 154–168). Newcastle
upon Tyne, UK: Cambridge Scholars.
Kouwenhoven, H., Ernestus, M., & van Mulken, M. (2018). Communication strategy use by Spanish
speakers of English in formal and informal speech. International Journal of Bilingualism, 22(3),
285–304.
Lam, W. Y. K. (2010). Implementing communication strategy instruction in the ESL oral classroom:
What do low-proficiency learners tell us? TESL Canada Journal, 27(2), 11–30.
Lauriks, S., Siebörger, I., & De Vos, M. (2015). “Ha! Relationships? I only shout at them!” Strategic
management of discordant rapport in an African small business context. Journal of Politeness
Research, 11(1), 7–39.
Levelt, W. J. (1983). Monitoring and self-repair in speech. Cognition, 14(1), 41–104.
Levelt, W. J. M. (1993). Language use in normal speakers and its disorders. In G. Blanken,
J. Dittmann, H. Grimm, C. Marshall & C.-W. Wallesch (Eds.), Linguistic disorders and pathologies
(pp. 1–15). Berlin: de Gruyter.
Levelt, W. J. M. (1995). The ability to speak. From intentions to spoken words. European Review,
3(1), 13–23.
Mortensen, J. (2017). Transient multilingual communities as a field of investigation: Challenges and
opportunities. Journal of Linguistic Anthropology, 27(3), 271–288.
Nakatani, Y. (2005). The effects of awareness-raising training on oral communication strategy use. The
Newgarden, K., & Zheng, D. (2016). Recurrent languaging activities in World of Warcraft: Skilled
linguistic action meets the Common European Framework of Reference. ReCALL, 28(3), 274–304.
Perry, F. (2011). Research in Applied Linguistics: Becoming a discerning consumer (2nd edn). New York:
Routledge.
Pipes, A. (2019). Examining creativity as an individual difference in second language production [Doctoral
dissertation, Georgetown University]. Georgetown University Institutional Repository.
Pitzl, M.-L. (2019). Investigating communities of practice (CoPs) and transient international groups
(TIGs) in BELF contexts. Iperstoria, 13. https://iperstoria.it/index
Poulisse, N., Bongaerts, T., & Kellerman, E. (1987). The use of retrospective verbal reports in the
analysis of compensatory strategies. In C. Faerch & G. Kasper (Eds.), Introspection in second
language research (pp. 213–229). Clevedon, Avon: Multilingual Matters.
Rabab’ah, G. (2016). The effect of communication strategy training on the development of EFL
learners’ strategic competence and oral communicative ability. Journal of Psycholinguistic Research,
45, 625–651. doi: 10.1007/s10936-015-9365-3
Richards, J. (1971). Error analysis and second language strategies. Language Sciences, 17(1), 12–22.
Rosenbaun, L. (2016). Interaction management in recreational video-mediated communication:
Participation, multiactivities, and visual playfulness in multiparty Google+ Hangouts [Doctoral
dissertation, University of Haifa].
Rossiter, M. J. (2003). The effects of affective strategy training in the ESL classroom. TESL-EJ,
7(2), 1–20.
Scullen, M. E., & Jourdain, S. (2000). The effect of explicit training on successful circumlocution: A
classroom study. In J. F. Lee & A. Valdman (Eds.), Form and meaning: Multiple perspectives
(pp. 231–252). Boston: Heinle & Heinle.
Shih, Y.-C. (2014). Communication strategies in a multimodal virtual communication context. System,
42(1), 34–47.
Smith, B. (2003). The use of communication strategies in computer-mediated communication. System,
31(1), 29–53.
Tsuchiya, K., & Handford, M. (2014). A corpus-driven analysis of repair in a professional ELF
meeting: Not ‘letting it pass’. Journal of Pragmatics, 64, 117–131.
Wigham, C. R., & Chanier, T. (2013). A study of verbal and nonverbal communication in Second Life –
the ARCHI21 experience. Computer Assisted Language Learning, 28(3), 260–283.
Zhang, W. & Liu, M. (2013). Evaluating the impact of oral test anxiety and speaking strategy use on
oral English performance. Journal of Asia TEFL, 10(2), 115–148.
19
TEACHING VOCABULARY
Marlise Horst
Knowing the words one wants to say is clearly at the very heart of what it means to speak a
new language. The centrality of second language (L2) vocabulary knowledge is vividly
illustrated in a 2008 study by Hilton in which learners were asked to describe a clip from a
silent movie. The closely transcribed speech data feature painfully long silences and deep
sighs as speakers search for the language they need. Interestingly, Hilton found that an
overwhelming proportion of the hesitations lasting three seconds or longer could be ascribed
to vocabulary problems. The learners were unable to retrieve a needed word or settled on a
wrong one. Hesitations ascribed to problems with phonology or grammar were relatively
few, the pauses were shorter, and the speakers usually were able to resolve the problems and
move on with their stories. But when the problems were lexical, the speaking tended to break
down. This chapter discusses what learners and their teachers can do to overcome the lexical
deficits that stand in the way of successful spoken communication. Techniques that have
shown their effectiveness in teaching and learning spoken L2 vocabulary are reviewed.
We begin with key concepts used to describe vocabulary and learners’ lexical knowledge,
drawing on an excerpt from Hilton’s 2008 study. Transcription codes have been removed for
simplicity; the speaker’s L1 is French (p. 161):
SPEAKER: …and [2 seconds] the result is [1 second] that uh the fridge [8 seconds]… I uh don’t
know uh uh [laughter] tomber (= English fall).
INTERVIEWER: falls down.
SPEAKER: tomber?
INTERVIEWER: mhmm falls down.
SPEAKER: falls down
INTERVIEWER: mhmm
SPEAKER: um [6 seconds] falls down uh [2 seconds] sur? (= English onto)
INTERVIEWER: onto
SPEAKER: on the car.
The exchange highlights the speaker’s need of a larger productive L2 vocabulary size and
knowledge of the English equivalent of French tomber, in particular. Productive vocabulary
DOI: 10.4324/9781003022497-24
size is defined as the number of L2 words a learner can say or write; it stands in contrast to
receptive vocabulary size, defined as the number of words a learner can recognize and
associate with a basic meaning in reading or listening contexts. Research shows that learners
know more L2 words receptively than productively (Webb, 2008). Thus, it is possible that the
speaker in the example would recognize the basic meaning of fall when hearing it used, but,
as we see in the excerpt, cannot readily produce the spoken form. The tomber/fall connection
has not yet become fully automatized for production. The example makes the point that the
L2 word knowledge needed for speaking involves having a large mental store of concepts
linked to L2 forms and, crucially, the ability to make form-meaning connections quickly.
There is evidence that these two go hand in hand. Uchihara and Saito (2019) found that
learners with higher productive vocabulary scores tended to speak with fewer pauses and
repetitions and at a faster tempo. In another study, the comprehensibility of L2 speech was
linked to speakers’ ability to use a wide variety of words appropriately with few pauses and
hesitations (Saito et al., 2016).
There is more to know about the form of an L2 word than its sound and spelling. In the
case of fall in the excerpt, the learner may have intended to say the fridge fell. Well-developed
knowledge of fall includes awareness of fell and its other forms falls, falling and fallen. To
take the multiple morphological forms of a word into account, vocabulary researchers have
devised the pedagogical unit word family. A word family is defined as a headword along with
its grammatical inflections and frequently used derivations (Webb & Nation, 2017). The
definition raises the question of what instruction should include: For example, in explaining
expect, a teacher might also raise awareness of derived forms such as expectant and
unexpectedly – depending on the level and needs of the learners.
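The word-family idea described above can be pictured as a simple lookup structure. The following is only a minimal sketch: the family members are hand-picked for illustration and are not taken from any published word-family list, and the names WORD_FAMILIES, headword_of and count_families are hypothetical.

```python
# Illustrative word families. The member lists are hand-picked assumptions,
# not entries from any published word-family list.
WORD_FAMILIES = {
    "fall": {"fall", "falls", "falling", "fell", "fallen"},
    "expect": {"expect", "expects", "expected", "expecting",
               "expectant", "expectation", "unexpected", "unexpectedly"},
}

# Reverse index: every listed word form points back to its headword.
FORM_TO_HEADWORD = {
    form: headword
    for headword, forms in WORD_FAMILIES.items()
    for form in forms
}


def headword_of(form):
    """Return the headword for a word form, or None if the form is not listed."""
    return FORM_TO_HEADWORD.get(form.lower())


def count_families(tokens):
    """Count the distinct listed word families represented in a list of tokens."""
    return len({headword_of(token) for token in tokens} - {None})
```

Under these assumptions, fell and falls count as one family rather than two words, which is why vocabulary size estimates stated in word families are smaller than counts of individual word forms.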
Nation (2013) addresses the question of what it means to know a word in an often-cited
scheme that considers form, meaning, and use. Knowing a word’s form includes receptive
and productive knowledge of its spelling, pronunciation, and morphology. Knowledge of
meaning involves connecting a word form to a basic concept, but also knowing extended and
metaphorical senses of a word. Falling has the literal meaning of losing balance, but also
refers to a declining amount or dying in battle. An important use aspect is the word’s
grammatical patterning; like other intransitive verbs, fall is not followed by an object in
sentences. Another use consideration is collocation or co-occurrence, broadly defined as the
company a word keeps. Frequent collocates of fall are down and over; note that the learner in
the excerpt above was unable to produce the needed fall down onto sequence. The point of
Nation’s scheme is to show what “deep” end-state knowledge of a word looks like; it is not a
lesson plan. Instructors could hardly teach all the knowledge aspects in a single vocabulary
teaching episode and learners would be overwhelmed. But the scheme has (at least) two
important messages for teaching. One is that vocabulary knowledge has depth; there is more
to teach and learn than the basic meaning of a new word and its spoken form. The second is
to not expect too much initially; deep knowledge is acquired gradually over time. A single
instance of saying the new word aloud for students and providing a quick definition is
unlikely to result in their full-fledged ability to use the new L2 word in sentences of their own.
Spoken Vocabulary in Language Teaching

The role given to spoken vocabulary in language instruction has waxed and waned. This
overview begins with the Audiolingual method, which emerged after World War II. It
emphasized the formation of good language “habits” following behaviourist principles; this was
to be achieved by intensive oral drilling intended to perfect pronunciation and reinforce
awareness of sentence patterns. As such, the method can be seen to promote knowledge of
spoken vocabulary. However, as Zimmerman (1997) explains, the vocabulary used in the
drills was simple, familiar and limited; new vocabulary was added only as needed to add
variation to structures practiced in the drills. The Audiolingual method has fallen out of
favour, but there is renewed interest in “old-fashioned” repetition and memorization
techniques because of their usefulness in developing automatized word knowledge. We will
return to this point.
In the 1970s and 1980s, the idea that L2 knowledge developed through exposure to
comprehensible input gained widespread acceptance. Today, vocabulary researchers use the
term “incidental” to describe this learning process; incidental acquisition is defined as “the
learning of vocabulary as the by-product of any activity not explicitly geared to vocabulary
learning” (Hulstijn, 2001, p. 271). According to the strong version of this position, there is no
need to teach words at all: Vocabulary knowledge develops naturalistically through exposure
to the language; as learners are focused on comprehending language, they work out the
meanings of unfamiliar words from context (Krashen, 1982, 1989). Certainly, it is true that
young children learn to speak their home language(s) by attending to the speech they hear in
their surroundings. Research indicates that children who are exposed to greater amounts of
caretaker speech have larger spoken vocabularies than children who hear less input (Hart &
Risley, 2003). In the L2 context, Cummins (2008) has shown that young immigrant learners
tend to acquire the spoken vocabulary used in conversations fairly quickly – more readily
than the less-often-heard written vocabulary they need for reading and learning academic
content at school. Since a large proportion of spoken language is made up of very frequently
used words, a great deal of basic spoken vocabulary (hello, good, teacher, thanks) may be
readily “picked up” by L2 learners of all ages and might not need to be explained and taught
(though such spoken input may be much less available in EFL contexts). Studies of L2
learners’ speech consistently show that more proficient learners use a greater variety of
words; these differences can probably be ascribed – at least in part – to incidental processes.
For example, Daller and Xue (2007) found that the spoken vocabulary of Chinese learners of
English studying at a British university was more diverse than that of students in a programme
for advanced English studies in China. Presumably, the difference is explained by greater
amounts of exposure to English input for the group studying in Britain.
In 1985, Swain put forward the output hypothesis, which emphasized the importance of
speaking. The hypothesis posits that in addition to exposure to comprehensible input,
speaking is critical for language development because learners notice gaps in their language
knowledge as they attempt to express themselves; production also provides opportunities to
get feedback as they try out words and structures they may not be sure about. A study of
learning Spanish vocabulary by De la Fuente (2002) provided confirmation of the output
hypothesis. Learners who used the target vocabulary to give oral instructions to complete a
task showed greater gains on a test of spoken vocabulary than those in a listening condition
who heard the vocabulary used in the instructions for completing the task. In the 1980s and
1990s, researchers made close observations of learners’ spoken interactions with a view to
understanding how they contributed to making L2 features learnable. Interestingly, although
the focus was largely on morpho-syntactic development, the studies found that learners
usually negotiated words rather than structures (Smith, 2004). An investigation by Newton
(1993) found that in the majority of cases, learners’ negotiations of word meanings in
speaking activities led to useful and accurate information about new words.
Today communicative language teaching and its more recent manifestation, task-based
language teaching (TBLT), are the state of the art. These approaches emphasize authentic
language use for real-life purposes with activities that involve learners in problem-solving
tasks such as requesting information by phone or planning a class outing. Speaking is clearly
central in such activities, but questions have been raised about their usefulness for
vocabulary learning. Bruton’s (2005) review of TBLT studies shows that the activities do not
result in the acquisition of much new vocabulary. But the focus on new words may be
missing the point. In addition to confirming the importance of teaching new vocabulary,
Nation’s discussion of key instructional “strands” (2013) highlights the value of
communicative activities for learning vocabulary that is not new: Meaning-focused interactive tasks
usefully push learners to consolidate vocabulary that is partially known but not yet well
established. He also points to the value of activities that develop more fluent use of known
words by including a timed element that pushes speakers to perform faster. An example is the
4/3/2 activity in which learners tell the same story three times to different listeners with
decreasing time for each retelling. In their study of this activity, Arevart and Nation (1991)
found that with each retelling, learners produced more words per minute and there were
fewer hesitations.
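The fluency measure mentioned for the 4/3/2 activity, words per minute, is simple arithmetic. The sketch below uses invented word counts purely for illustration; they are not data from Arevart and Nation (1991), and the function name words_per_minute is hypothetical.

```python
def words_per_minute(word_count, seconds):
    """Speech rate: number of words produced, scaled to one minute."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return word_count * 60 / seconds


# Hypothetical retellings of the same story: (label, words spoken, seconds allowed).
# The word counts are invented for illustration only.
retellings = [
    ("4-minute telling", 320, 240),
    ("3-minute telling", 300, 180),
    ("2-minute telling", 230, 120),
]

rates = [words_per_minute(words, seconds) for _, words, seconds in retellings]

for (label, _, _), rate in zip(retellings, rates):
    print(f"{label}: {rate:.0f} words per minute")
```

In this invented case the speaker says fewer words in total on each retelling but at a faster rate, which is the pattern the timed element is meant to encourage.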
O’Dell (1997) highlights a different concern in her discussion of communicative
syllabus design: The vocabulary of task scenarios is often chosen haphazardly; a context for
practicing a particular language function is created with little regard to whether the
vocabulary that is introduced is useful to know. For example, a desert island activity
intended to practice planning structures (we will need/we won’t need) in a recent ESL
textbook I examined offers a set of pictured items including hat, matches, towel, flashlight,
GPS, thermos and sunscreen. In this set, thermos and sunscreen (and possibly others) are
infrequent words of questionable importance; it would have been easy enough to design
the activity to include more generally useful terms like water container and sun protection
instead. As the learners will use the words repeatedly in the activity, selecting “high-value”
vocabulary matters.
The Corpus Revolution

What is a principled basis for choosing the words to include in learning activities? A
surprisingly definitive answer comes from a development that galvanized L2 vocabulary
teaching and research in the 1980s and 1990s. This was access to large electronically stored
linguistic corpora (singular: corpus). These huge collections of authentic language, now many
millions of words large, can be analyzed to determine which words are most frequent in a
language and therefore, the most useful to study and learn. Frequent words are important
because they offer more coverage; for example, it is evident that in general English, container
will be used more broadly than thermos. Corpus analyses also reveal frequently used word
affixes and patterns of co-occurring words.
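At its simplest, the kind of frequency analysis described above amounts to counting word forms and ranking them. The sketch below is a toy illustration only: the sample sentence is invented, the tokenizer is deliberately crude, and real corpus work involves millions of words plus decisions about lemmas or word families that are not modelled here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(text):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()


# Invented mini-corpus, for illustration only.
sample = "Put the water in the container. The container is in the car."
ranked = frequency_list(sample)

for word, count in ranked[:3]:
    print(word, count)
```

Even in this tiny sample, grammatical words such as the and in rise to the top of the list, which mirrors the way a small number of very frequent words account for a large share of any text.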
In addition to corpora intended to represent native speakers’ language use, there are now
many learner corpora. These are large collections of speech (or writing) produced by L2
learners that can be used to investigate their language development. The fall/tomber excerpt
at the beginning of this chapter comes from a learner corpus: PAROLE (PARallèle, Oral en
Langue Etrangère) is a collection of spoken language produced by learners of English,
French, and Italian (Hilton et al., 2008).

In this part, I discuss what corpus studies reveal about the scope and character of the
vocabulary knowledge needed for L2 speech. As we will see, the learner’s task is large.
Which Words Do L2 Speakers Need to Know?

Comparisons of large spoken and written language corpora consistently show that speech is
less lexically dense than writing (Biber et al., 2002). For the L2 learner, this means that fewer
words are needed for speaking and listening than for reading and writing. Can corpus
research reveal an approximate number of word families a learner needs to know to become a
“good” speaker? Much of the research that explores vocabulary size has focused on the
recognition of written words and its relationship to reading comprehension rather than on
spoken vocabulary. For example, Nation (2006) has shown that learners of English need to
be able to recognize the meanings of as many as 8,000 or 9,000 frequent word families to be
able to read and readily comprehend texts designed for native speakers. Studies of listening
comprehension point to a much smaller figure for spoken production. Van Zeeland and
Schmitt (2013) estimate that with receptive knowledge of 2,000 to 3,000 frequent English
word families, learners can understand about 95% of the words they encounter in the spoken
mode. Research by Webb and Rodgers (2009) points to a similar figure; they found that 95%
of the words in their television corpus were covered by the 3,000 most frequent families.
While this work focuses on receptive knowledge and listening, it provides a useful estimate
for speaking. It indicates that productive knowledge of 2,000 or 3,000 frequent families (and
some proper nouns) will take learners a very long way toward being able to say what they
want to say in English.
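The coverage figures cited above express the percentage of running words in a text that belong to a given list of known words. A minimal sketch of that calculation follows; the known-word set and the sample utterance are toy assumptions, and the function name lexical_coverage is hypothetical. Published studies work with word families and large corpora rather than a handful of word forms.

```python
def lexical_coverage(tokens, known_words):
    """Percentage of running words (tokens) that appear in the known-word set."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    known = sum(1 for token in tokens if token.lower() in known_words)
    return 100 * known / len(tokens)


# Toy known-word set and utterance, invented for illustration.
known_words = {"the", "fridge", "falls", "down", "on", "car"}
tokens = "the fridge falls down onto the car".split()

coverage = lexical_coverage(tokens, known_words)
print(f"{coverage:.1f}% of the running words are known")
```

Here six of the seven running words are in the known set, so coverage is a little under 86%. A claim such as "the 3,000 most frequent families cover 95% of a corpus" rests on the same ratio, computed over a far larger body of text.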
The critical importance of learning several thousand high-frequency word families is a
consistent theme in considerations of L2 vocabulary instruction (Horst, 2019). In the case of
English, high-frequency vocabulary is defined as the words on corpus-based lists of the 1,000,
2,000, and 3,000 most frequent English word families (Schmitt & Schmitt, 2014). Fall, the
word that the speaker at the outset of this chapter struggled to produce, is a 1,000-level word.
Well-designed corpus-based frequency lists are now widely available for English and increasingly
for other languages. Of note in this chapter on spoken vocabulary are the lists by
Nation (2012) derived from the British National Corpus (BNC) and the Corpus of
Contemporary American English (COCA). Nation’s BNC-COCA frequency lists have been
specially designed to overcome the heavy influence of formal written text that is characteristic
of large native-speaker corpora; the lists reflect spoken and written English in a more
balanced way. This makes them a good all-purpose choice for developing vocabulary skills
with learners of general English. The BNC-COCA lists are available at Paul Nation’s
homepage and on the frequency page of the Lextutor site (www.lextutor.ca). There are also
useful frequency-informed lists that focus specifically on the spoken mode. A recent example
designed for use with university learners of English is the Academic Spoken Word List (Dang
et al., 2017). This is a list of 1,741 word families used frequently in a 13-million-word corpus
of university lectures, tutorials, seminars and labs across a wide variety of academic
disciplines.
Not Just Single Words

In addition to the prominent use of high-frequency single words, analyses of conversations
reveal that native speakers use frequently recurring multi-word chunks. In their analysis of
the Cambridge and Nottingham Corpus of Discourse in English (CANCODE), a large
collection of transcribed conversational speech, O’Keeffe et al. (2007) found that the multi-word
chunks at the moment and all the time were more frequent than the single words expensive
and fun (p. 47). Such phrases serve important pragmatic functions; that is, they make
conversations feel conversational. For example, vagueness markers such as things like that
Marlise Horst
soften what might otherwise be blunt statements of fact. You know – the most frequently
occurring pragmatic chunk in both the CANCODE and the spoken part of the BNC (Shin
& Nation, 2008) – signals informality and points to understanding between the speaker and
the listener. It is worth noting that the core meaning of know (have information in your
mind) differs considerably from its meaning in you know as used in conversations to signal
“I’m thinking what to say next”, or “I wonder if you’ve understood my point”. From the
learner’s perspective, know and you know are arguably two different lexical entries. Shin and
Nation (2008) provide a useful list of 1,000 frequently occurring phrases based on analysis of
the spoken part of the BNC.
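Frequency-based identification of multi-word chunks can be pictured as counting every contiguous word sequence of a given length in a transcript and keeping those that recur. The Python sketch below is a minimal illustration only. The function names `tokenize` and `count_chunks`, the invented transcript, and the threshold of two occurrences are assumptions made for the example; they do not reproduce the procedures of O’Keeffe et al. (2007) or Shin and Nation (2008), which draw on large corpora and additional criteria.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_chunks(tokens, n):
    """Count every contiguous sequence of n tokens in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# ASSUMPTION: an invented scrap of conversation, not corpus data.
transcript = (
    "you know I was busy at the moment and you know "
    "he calls all the time and things like that"
)

tokens = tokenize(transcript)
bigrams = count_chunks(tokens, 2)
trigrams = count_chunks(tokens, 3)

# Keep only two-word sequences that occur at least twice.
recurring = {" ".join(chunk): freq for chunk, freq in bigrams.items() if freq >= 2}
print(recurring)
```

In this short sample only you know recurs. Sequences such as at the moment appear just once, which shows why large corpora are needed before frequency counts of chunks become informative.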
So far, we have seen that productive knowledge of around 3,000 frequent word families is
an important goal in attaining spoken proficiency in English. The learner’s task is clearly
very large (though much smaller than the 8,000 families Nation’s 2006 study indicates are
needed for comprehending written English). But the task is larger still given that word families
include multiple members. In principle, knowledge of enjoy entails also knowing derived
forms such as enjoyable and enjoyment and other members of the enjoy family. Learners
are also eventually expected to know multiple meanings of words and their use in collocations
(fall apart, fall asleep) and expressions (fall in love, fall off the wagon). In practice,
research shows that even advanced learners have difficulty producing derived forms of familiar
base words (Schmitt & Zimmerman, 2002) and collocation errors persist in the writing
of advanced university learners of English (Laufer & Waldman, 2011). Finally, if we consider
that fluent speech also involves knowing hundreds or even thousands of multi-word conversational
chunks and their pragmatic uses, the learning task is very large indeed.
The next part describes research-informed teaching and learning techniques. The focus
is on ways of promoting the acquisition of large amounts of rapidly accessed spoken
vocabulary.
Teaching the Spoken Form of Words

The central challenge in learning a new L2 word is the linking of an unfamiliar string of
sounds (or sequence of letters) to a known concept. The sounds of words are arbitrary, and
this can make establishing and retaining new form-meaning connections difficult. In the case
of fall, for example, there is no logical reason why the initial /f/ and other sounds in the word
should be associated with the concept of losing balance. The burden of learning and retaining
the new L2 label for a familiar concept will be heavy or light depending on how well it fits into
the learners’ existing knowledge (Craik & Lockhart, 1972; Ellis & Beaton, 1993).
In teaching the spoken form of new words, there are many ways of linking to what
learners already know. One is to draw on their prior knowledge of sound-symbol correspondences
by making sure that learners see a new word in writing as it is said aloud.
Learners of English whose L1 uses the Roman alphabet have an advantage as the written
form offers helpful clues to pronunciation. Learners may need to associate some familiar
letters with new sounds – French quitter does not sound exactly like quitter in English, for
example. Studies of video viewing show that captioned conditions in which learners hear and
see the written transcript of the language lead to more incidental acquisition of (written)
word forms than uncaptioned viewing (Perez et al., 2018).
Unfamiliar sounds contribute to making L2 words difficult to learn (Laufer, 1997).
Thus, the difficulty that Japanese speakers may experience in pronouncing English words
can be ascribed to the fact that Japanese does not have some of the vowel and consonant
Teaching Vocabulary
sounds used in English. There is evidence that learners avoid using phonologically difficult
words. Repeating the problem words aloud is useful; repetition makes them more familiar
(Ellis & Beaton, 1993). Hu (2008) suggests that raising learners’ phonological awareness in
activities such as reciting rhymes and segmenting words into sound units may also be
useful. Familiarizing learners with the stress of a new word is particularly important;
Field’s (2005) study showed that use of appropriate word stress plays a key role in making
L2 speech intelligible.
The connection between form and meaning is easily made when the new L2 word is a
helpful cognate. A cognate is a word that has a formal resemblance to a word in another
language; usually (but not always), there is a shared meaning. For example, for Dutch-
speaking learners, the English verb form fall is very similar to the Dutch label (vallen) for this
concept and this makes it easy to remember and produce. The Germanic roots of English and
its heavy borrowing of French, Latin and Greek vocabulary give speakers of many European
languages a distinct cognate advantage when learning the vocabulary of English. Other
languages have borrowed extensively from English; there are thousands of English loanwords
in Japanese (Daulton, 2008). The cognate advantage extends to morphology for many
European learners. The suffix -or is a noun-maker and -al an adjective-maker in both English
and Spanish; there are many other examples. Research shows that learners produce cognates
more easily than non-cognates (Rogers et al., 2015), but there is also evidence that learners
tend to underuse this helpful strategy. White and Horst (2012) found that showing learners
how to recognize cognates benefitted their word learning.
In the many cases where the L1 does not offer an obvious link to the sound of a new L2
word, teachers can work with learners to create mnemonics. The keyword method is an
often-cited example (Webb & Nation, 2017); it involves linking the meaning of a new L2
word to a sound-alike L1 word via an image. For example, a teacher of English-speaking
learners of French might say, “To remember flèche (the French word for arrow), picture an
arrow piercing your flesh”. In a study where L2 learners were directed to study and remember
word–picture pairs, the effectiveness of this technique was confirmed (Barcroft, 2009a).
However, the study also showed that using multiple strategies was effective; these
included translating the words into the L1, visualizing the word and the matching picture,
and repeating the words silently.
Cognitive psychology research shows that when new information is first met, the number
and quality of elaborations are important in facilitating learning and retention (Anderson,
1990). This is an argument for presenting a new L2 word in varied ways: The teacher can say
it, write it, translate it, define it, and in the case of a word like earthquake, ask the learners if
they can see familiar parts in it. The learners can repeat the word, write down the spelling,
draw a picture, share a related personal experience, consider whether there is an L1 cognate,
act it out, and more. Richly elaborated processing contributes to making a new word
memorable by providing multiple mental pathways for retrieving it. Joe et al. (1996) suggest
that speaking activities based on texts are useful in this regard. When students reconstruct
what they have read in a retelling activity or act out the events in a role-play, there are
opportunities to produce new words from the text repeatedly, clarify them to a partner, and
use them in ways that vary from the original sentence contexts.
However, Barcroft warns against elaborated approaches to teaching new words (2002,
2009b). His TOPRA (type of processing – resource allocation) model hypothesizes that using
cognitive resources to work on one kind of learning reduces the amount of attention available
for attending to another. His research has shown that when activities focus learners’ attention
on the meanings of new words, acquisition of their forms is disadvantaged. For example, in an
investigation of TOPRA predictions (2009b), Barcroft asked Spanish-speaking learners of
English to read a passage that contained unknown target words along with their L1 translations.
Some of the learners were also asked to write another Spanish synonym for each target,
which meant that their attention was focused closely on the meanings of the new English
words. Performance on a surprise spelling test given after the reading activity showed that
learners who had read the passage without the synonym task were substantially better at
providing accurate or near-accurate spellings of the target words.
The research discussed in this part presents an apparent contradiction: there are convincing
arguments for richly elaborated ways of presenting new vocabulary but also evidence that early
attention to word meanings may disadvantage the acquisition of word forms. Further research is
needed to clarify this issue. In the meantime, a possible pedagogical way forward is illustrated in
a suggestion for introducing the lexical chunk I’m sorry to beginning learners of English from
Calderón and Soto (2017, p. 38). Note that in the following sequence, the teacher begins with
practicing the spoken form only; then she gradually introduces semantic aspects.
TEACHER: Say I’m sorry three times after I say it.
STUDENTS: I’m sorry, I’m sorry, I’m sorry.
TEACHER: Now turn to your other buddy on the left, say I’m sorry three times, and pretend
you are crying.
STUDENTS: [rubbing their eyes and pretending to cry] I’m sorry, I’m sorry, I’m sorry!
The final steps in the sequence are explaining I’m sorry briefly using a short definition and
then providing two simple examples of situations in which people feel sorry.
Increasing Lexical Access Speed

There is renewed research interest in an old method for learning vocabulary: practice with
word cards (Horst, 2019; Nakata, 2015). A word card has a word written on one side and
information about its meaning – a definition or a translation – on the other. An important
argument for studying with word cards is their role in building automatized form-meaning
connections over the course of many mental retrievals. Every time the learner looks at a word
on a card and tries to retrieve the meaning or looks at the meaning and tries to say the word,
the form-meaning connection is strengthened. Many retrievals may be needed before the
mental connections can be made rapidly (Baddeley, 1990). Griffin and Harley (1996) report
stronger learning effects for productive retrieval whereby the learner looks at the L1
translation and produces the spoken L2 form than for practice in the reverse L2-to-L1 direction.
Elgort (2011) showed that new word knowledge gained through studying word cards
is not as static or limited as might be expected: In her investigation, the vocabulary learned
through word card study became well integrated into L2 learners’ mental lexicons and could
be rapidly accessed in a manner similar to the way learners draw on existing L1 and L2
lexical knowledge in real language use.
There are now electronic word card apps and games available for learning the vocabulary
of a host of different languages. These include Quizlet, Memrise, Anki, VTrain, and Lextutor
flashcard builder. In a study of young francophone learners of English, Cobb and Horst
(2011) investigated the effects of playing games that required rapid responses on My Word
Coach, a suite of electronic word activities developed by Ubisoft. After 2 months of using the
Nintendo™ game sets, the learners showed pronounced gains in lexical access speed
measured as the number of words they could read aloud in one minute. Other evidence of
spoken fluency gains included a marked increase in the number of English words they used in
a story-telling task and decreased reliance on French.
Any activity that involves speaking provides opportunities for productive retrieval as
learners search their mental lexicons for the words they want to say, but activities can be
designed to enhance the retrieval aspect. Learners can be asked to put aside a recently studied
text and reconstruct it orally with a partner; this involves them in reusing new and partially
known words in the text multiple times. Reviewing previously taught vocabulary in class is
another important way of providing learners with added opportunities to retrieve words and
say them. How many productive retrievals are needed to develop fully automatized use of new
words in speech? So far, the answer is unclear, but studies of incidental vocabulary acquisition
through reading show that learners need to meet a new word repeatedly – as often as ten times
or more – in order for it to be learned receptively (Webb & Nation, 2017). We can assume that
the ability to use a new word in speaking also requires many repeated learning opportunities.

This part summarizes recommendations for teaching spoken vocabulary outlined in the
chapter and addresses the following question: Can instruction as practiced in today’s communicatively
oriented language classrooms support the acquisition of the vocabulary
knowledge needed for speaking?
When communicative language teaching was new, access to meaningful comprehensible input
was thought to be of critical importance, and we have seen that exposure to input can be useful
for the incidental acquisition of spoken vocabulary. Some very frequent words used in speech
may not need to be taught; incidental learning is most likely to be effective when exposure is
abundant, for example, in contexts where the L2 is also the majority language spoken outside of
class. As the communicative method evolved, proponents of the output hypothesis showed that
actual speaking was more effective for the retention of newly learned vocabulary than attending
to input. This intuitively sensible finding remains central today. Originally, interactive speaking
activities were valued for the clarifications of new words (and other language features) learners
received over the course of completing tasks, as well as for opportunities to try out new words
and get feedback. Today, researchers emphasize their role in developing automaticity through
using and reusing known and partially known vocabulary. In introducing the spoken forms of
new words, using a variety of elaborative techniques that link new knowledge to old has been
shown to be effective. Yet, there is also evidence that instruction that seeks to build familiarity
via multiple cognitive pathways may result in overload and disadvantage the acquisition of word
forms. More research is needed to resolve this apparent contradiction.
Recent insights from corpus linguistics point to learning methods that may seem incompatible
with current communicative approaches. Work in this area points to the efficacy
of analyzing a suitable corpus to identify the words that are most useful for L2 learners to
study and learn. In the case of general spoken English, we saw that productive knowledge of
3,000 frequent families is critical. In addition, L2 speakers need to know many pragmatic
phrases. Laufer (2006) has argued strenuously that there is not enough time in communicatively
oriented classrooms to support the acquisition of such substantial amounts of
vocabulary. Her study showed that intentional word learning was far more effective for
retaining new words than an incidental condition in which learners read a passage that
contained the target items and answered comprehension questions. In the intentional condition,
the learners studied a list of new L2 words and their translations in preparation for a
test. The efficiency of memorizing L1–L2 word pairs is well established (Nation, 1982); this
research shows that in an hour of intentional study, learners can memorize dozens of new
L2 words and their meanings and that rates of long-term retention for words learned this
way are high. By contrast, Milton’s (2009) examination of a variety of L2 instructional
contexts where there was no special focus on vocabulary determined that one hour of time
spent in class typically results in knowledge of just four new word families.
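The practical weight of these two learning rates can be seen in a rough calculation. The Python sketch below divides the 3,000-family target by each rate. It is an illustration under stated assumptions: the helper `hours_needed` is hypothetical, the figure of 24 words per study hour is merely one reading of "dozens" and is not taken from Nation (1982), and the calculation ignores forgetting, prior knowledge, and the difference between memorizing a single word and knowing a whole word family.

```python
import math


def hours_needed(target_families, families_per_hour):
    """Hours required to reach the target at a constant learning rate."""
    return math.ceil(target_families / families_per_hour)


TARGET = 3000      # frequent word families discussed in the chapter
CLASS_RATE = 4     # new families per classroom hour (Milton, 2009)
STUDY_RATE = 24    # ASSUMPTION: one possible reading of "dozens" per hour

class_hours = hours_needed(TARGET, CLASS_RATE)
study_hours = hours_needed(TARGET, STUDY_RATE)
print(class_hours, study_hours)
```

On these assumptions, classroom exposure alone would take about 750 hours to reach the target, against roughly 125 hours of deliberate study. The exact numbers matter less than the order-of-magnitude gap between the two rates.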
Future Directions
Rote memorization of vocabulary seems at odds with current approaches to language teaching.
How it would fit into a rethought version of the communicative approach is unclear; course
designers might look to popular electronic vocabulary games for ways of making learning lists of
words meaningful and fun. Certainly, the well-established benefits of real spoken interaction in
communicative classrooms should not be compromised, but given the size of the spoken vocabulary
“syllabus” outlined in this chapter, the efficiency of old-fashioned memorization for
building a large amount of rapidly accessed word knowledge is too powerful to ignore.
As for future investigations of spoken L2 vocabulary development, there is a need for
studies that assess learning gains in terms of actual spoken production. Such studies are
surprisingly scarce; almost all the research discussed in this chapter used written measures.
While it may be reasonable to assume that ability to produce a recognizable spelling of a new
word means that the L2 learner can also say it, written production is hardly a satisfactory
basis for conclusions about the usefulness of learning activities for spoken production.
Finally, pedagogical studies conducted in actual language classrooms also proved to be
scarce. Given the centrality of speaking ability, research that sheds light on effective methods
for learning spoken vocabulary in real L2 classrooms is needed and important.
Further Reading
Hilton, H. E. (2008). The link between vocabulary knowledge and spoken L2 fluency, Language
Hilton draws on the findings of her speaking study to make a strong argument for the revival of direct
vocabulary teaching and memorization; she expresses strong doubts about the efficacy of communicative
language teaching methods.
Horst, M. (2019). Focus on vocabulary learning. Oxford: Oxford University Press.
This volume expands on corpus-informed approaches to teaching vocabulary and other themes outlined
in this chapter. The emphasis is on learners of English aged 5-18 in schools, but the discussion is
relevant to many L2 learning contexts.
Joe, A., Nation, P., & Newton, J. (1996). Vocabulary learning and speaking activities. English Teaching
Forum, 34(1), 2–7.
This short, teacher-friendly paper remains a classic. It offers practical ideas for designing speaking
activities and explains the theory and research underpinning the suggested activities in an accessible
way.
Thornbury, S. (2002). How to teach vocabulary. Harlow, UK: Pearson Education Limited.
This book is popular with instructors in university training programs for teachers of English as a second
language. Thornbury has an engaging style; research-informed principles are supported by many examples
and the volume is richly illustrated.
References
Anderson, J. R. (1990). Cognitive psychology and its implications (3rd edn). New York: Freeman.
Arevart, S., & Nation, P. (1991). Fluency improvement in a second language. RELC Journal,
22(1), 84–94.
Baddeley, A. (1990). Human memory. London: Lawrence Erlbaum Associates.
Barcroft, J. (2002). Semantic and structural elaboration in L2 lexical acquisition. Language Learning,
52, 323–363.
Barcroft, J. (2009a). Strategies and performance in intentional L2 vocabulary learning. Language
Awareness, 18(1), 74–89.
Barcroft, J. (2009b). Effects of synonym generation on incidental and intentional vocabulary learning
during reading. TESOL Quarterly, 43(1), 79–103.
Biber, D., Conrad, S., Reppen, R., Byrd, P., & Helt, M. (2002). Speaking and writing in the university:
A multidimensional comparison. TESOL Quarterly, 36(1), 9–48. doi: 10.2307/3588359
Bruton, A. (2005). Task based language learning: For the state secondary FL classroom? Language
Learning Journal, 31, 55–68.
Calderón, M., & Soto, I. (2017). Academic language mastery: Vocabulary in context. Thousand Oaks,
CA: Corwin.
Cobb, T., & Horst, M. (2011). Does Word Coach coach words? CALICO Journal, 28(3), 639–661.
Craik, F. I., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal
of Verbal Learning & Verbal Behavior, 11(6), 671–684.
Cummins, J. (2008). BICS and CALP: Empirical and theoretical status of the distinction. In B. Street &
N. H. Hornberger (Eds.), Encyclopedia of language and education, Volume 2: Literacy (2nd edn,
pp. 71–83). New York: Springer.
Daller, H., & Xue, H. (2007). Lexical richness and the oral proficiency of Chinese EFL students. In H.
Daller, J. Milton, & J. Treffers-Daller (Eds.), Modelling and assessing vocabulary knowledge
Dang, T. N., Coxhead, A., & Webb, S. (2017). The Academic Spoken Word List. Language Learning,
67, 959–997.
Daulton, F. E. (2008). Japan’s built-in lexicon of English-based words. Clevedon: Multilingual
Matters.
De la Fuente, M. J. (2002). Negotiation and oral acquisition of L2 vocabulary: The roles of input and
output in the receptive and productive acquisition of words. Studies in Second Language Acquisition,
24(1), 81–112.
Elgort, I. (2011). Deliberate learning and vocabulary acquisition in a second language. Language
Learning, 61(2), 367–413.
Ellis, N. C., & Beaton, A. (1993). Psycholinguistic determinants of foreign language learning. Language
Learning, 43(4), 559–617.
Field, J. (2005). Intelligibility and the listener: The role of lexical stress. TESOL Quarterly, 39, 399–423.
Griffin, G. F., & Harley, T. A. (1996). List learning of second language vocabulary. Applied
Hart, B., & Risley, T. R. (2003). The early catastrophe. The 30 million word gap by age 3. American
Educator, 22, 4–9.
Hilton, H. E. (2008). The link between vocabulary knowledge and spoken L2 fluency, Language
Hilton, H. E., Osborne, N. J., Derive, M.-J., Suco, N., O’Donnell, J., Rutigliano, S., & Billard, S.
(2008). Corpus PAROLE. Chambéry, France: Université de Savoie.
Horst, M. (2019). Focus on vocabulary learning. Oxford: Oxford University Press.
Hu, C.-F. (2008). Rate of acquiring and processing L2 colour words in relation to phonological
awareness. Modern Language Journal, 92(1), 39–52.
Hulstijn, J. H. (2001). Intentional and incidental vocabulary learning: A reappraisal of rehearsal,
elaboration and automaticity. In P. Robinson (Ed.), Cognition and second language instruction (pp.
258–287). Cambridge: Cambridge University Press.
Joe, A., Nation, P., & Newton, J. (1996). Vocabulary learning and speaking activities. English Teaching
Forum, 34(1), 2–7.
Krashen, S. (1982). Principles and practice in second language acquisition. Oxford: Pergamon.
Krashen, S. (1989). We acquire vocabulary and spelling by reading: Additional evidence for the input
hypothesis. Modern Language Journal, 73(4), 440–464.
Laufer, B. (1997). What’s in a word that makes it hard or easy: Some intralexical factors that affect the
learning of words. In N. Schmitt & M. McCarthy (Eds.), Vocabulary: Description, acquisition and
pedagogy (pp. 139–155). Cambridge: Cambridge University Press.
Laufer, B. (2006). Comparing focus on form and focus on formS in second-language vocabulary
teaching. Canadian Modern Language Review, 63(1), 149–166.
Laufer, B., & Waldman, T. (2011). Verb-noun collocations in second language writing: A corpus
analysis of learners’ English. Language Learning, 61(2), 647–672.
Milton, J. (2009). Measuring second language vocabulary acquisition. Bristol: Multilingual Matters.
Nakata, T. (2015). Effects of expanding and equal spacing on second language vocabulary learning.
Nation, I. S. P. (1982). Beginning to learn foreign vocabulary: A review of the research. RELC Journal,
13(1), 14–36.
Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern
Language Review, 63(1), 59–82.
Nation, I. S. P. (2012). The BNC/COCA word family lists. Retrieved from
http://www.victoria.ac.nz/lals/about/staff/publications/paul-nation/Information-on-the-BNC_COCA-word-family-lists.pdf.
Nation, I. S. P. (2013). Learning vocabulary in another language (2nd edn). Cambridge: Cambridge
University Press.
Newton, J. (1993). Task-based interaction among adult learners of English and its role in second language
development. Unpublished PhD thesis. Victoria University of Wellington.
O’Dell, F. (1997). Incorporating vocabulary into the syllabus. In N. Schmitt & M. McCarthy (Eds.),
Vocabulary: Description, acquisition and pedagogy (pp. 258–278). Cambridge: Cambridge University
Press.
O’Keeffe, A., McCarthy, M., & Carter, R. (2007). From corpus to classroom: Language use and language
teaching. Cambridge: Cambridge University Press.
Perez, M. M., Peters, E., & Desmet, P. (2018). Vocabulary learning through viewing video: The effect of
two enhancement techniques. Computer Assisted Language Learning, 31(1-2), 1–26.
Rogers, J., Webb, S., & Nakata, T. (2015). Do the cognacy characteristics of loanwords make them
more easily learned than non-cognates? Language Teaching Research, 19(1), 9–27.
Saito, K., Webb, S., Trofimovich, P., & Isaacs, T. (2016). Lexical profiles of comprehensible second
language speech: The role of appropriateness, fluency, variation, sophistication, abstractness, and
sense relations. Studies in Second Language Acquisition, 38(4), 677–701. doi: 10.1017/S0272263115000297
Schmitt, N., & Schmitt, D. (2014). A reassessment of frequency and vocabulary size in L2 vocabulary
teaching. Language Teaching, 47, 484–503.
Schmitt, N., & Zimmerman, C. B. (2002). Derivative word forms: What do learners know? TESOL
Quarterly, 36(2), 145–171.
Shin, D., & Nation, I. S. P. (2008). Beyond single words: The most frequent collocations in spoken
English. ELT Journal, 62(4), 339–348.
Smith, B. (2004). Computer-mediated negotiated interaction and lexical interaction. Studies in Second
Swain, M. (1985). Communicative competence: Some roles of comprehensible input and comprehensible
output in its development. In S. Gass & C. Madden (Eds.), Input in second language acquisition
(pp. 235–253). Rowley, MA: Newbury House.
Uchihara, T., & Saito, K. (2019). Exploring the relationship between productive vocabulary knowledge
and second language oral ability. Language Learning Journal, 47(1), 64–75.
Webb, S. (2008). Receptive and productive vocabulary size. Studies in Second Language Acquisition,
30(1), 79–95.
Webb, S., & Nation, P. (2017). How vocabulary is learned. Oxford: Oxford University Press.
Webb, S., & Rodgers, M. P. H. (2009). Vocabulary demands of television programs. Language
Learning, 59(2), 335–366.
White, J., & Horst, M. (2012). Cognate awareness-raising in late childhood: Teachable and useful.
Language Awareness, 21, 181–196.
Van Zeeland, H., & Schmitt, N. (2013). Lexical coverage in L1 and L2 listening comprehension: The
same or different from reading comprehension? Applied Linguistics, 34(4), 457–479.
Zimmerman, C. B. (1997). Historical trends in second language vocabulary instruction. In J. Coady &
T. Huckin (Eds.), Second language vocabulary acquisition: A rationale for pedagogy (pp. 5–19). New
York: Cambridge University Press.
20
THE ROLE OF FORMULAIC
SEQUENCES IN L2 SPEAKING
Recent decades have witnessed an increased interest in formulaic sequences (FSs) in both first
language (L1) and second language (L2) research and teaching (e.g., Boers & Lindstromberg,
2012; Wood, 2010; Wray, 2002). FSs, also called multiword units/expressions/sequences,
lexical chunks, formulas, prefabs, routines, or prefabricated patterns (see Wray, 2002), can
be broadly defined as:
strings of letters, words, sounds, or other elements, contiguous or non-contiguous, of any
length, size, frequency, degree of compositionality, literality/figurativeness, abstractness
and complexity, not necessarily assumed to be stored, retrieved or processed whole, but
that necessarily enjoy a degree of conventionality or familiarity among (typical) speakers
of a language community or group, and that hold a strong relationship in communicative
meaning. (Siyanova-Chanturia & Pellicer-Sánchez, 2019, p. 5)
FSs include collocations (take a shower), phrasal verbs (back up), idioms (raining cats and
dogs), lexical bundles (as far as I know), and proverbs (actions speak louder than words). Although
different types of FSs may differ in relevance to spoken proficiency, they are very common in
speech and serve several communicative functions (Erman & Warren, 2000).
Language users draw on a large repertoire of ready-made FSs. Consider coffee: we say
strong coffee, not powerful coffee, even though strong and powerful are synonymous and both
phrases are grammatically correct. Thus, oral language production is not only governed by
grammar rules and combinations of single words, but also by conventionalized word combinations
(Sinclair, 1991). Sinclair (1991) argues that reliance on ready-made FSs affords
processing advantages. In speech production, FSs are produced faster and more fluently than
non-FSs (e.g., Erman, 2007). Faster processing frees up attentional resources for other
cognitive tasks demanding more working memory during speech (N. Ellis, 2002). For example,
smooth talkers, such as sports commentators and auctioneers, make abundant use of
FSs in their speech (Kuiper, 1996); faster action requires greater use of FSs.
Research also demonstrates the importance of FSs for L2 learners. A positive relationship
holds between FSs and aspects of L2 proficiency, like reading (e.g., Kremmel, Brunfaut &
Alderson, 2017), writing (e.g., Granger & Bestgen, 2014), and speaking (e.g., Boers, Eyckmans,
DOI: 10.4324/9781003022497-25

Kappel, Stengers, & Demecheleer, 2006; Kyle & Crossley, 2015; Saito, 2020; Wood, 2009).
However, many L2 learners struggle with appropriate use of FSs (e.g., Hoang & Boers, 2016;
Laufer & Waldman, 2011; Nesselhauf, 2003; Pawley & Syder, 1983). This chapter provides an
overview of research into the role of FSs in speaking.
It is important to zoom in on two widely recognized approaches to studying FSs: the
phraseological and the frequency-based approach (see Granger & Paquot, 2008 for a thorough
discussion). In the phraseological approach, FSs are classified on the basis of linguistic
criteria, such as the degree of compositionality, i.e., the predictability of word combinations
from the meaning of their components, or the degree of substitutability (or fixedness), i.e.,
the possibility of replacing words in word combinations (Barfield & Gyllstad, 2009). The
frequency-based approach is more recent and data-driven, such that FSs are regarded as
frequently co-occurring words. In the 1980s, this approach made great strides by showing the
extent of formulaicity in language thanks to computer software and corpus linguistics (Cobb,
2019). While early research on FSs in native speaker corpora demonstrated the ubiquity of
FSs in L1 speech (e.g., Altenberg, 1998; Biber et al., 1999; Erman & Warren, 2000), more
recent learner corpus research shows how language learners struggle with the appropriate use
of FSs, as they underuse, overuse, or do not use FSs (e.g., De Cock, 2004). Recently, research
has combined both approaches when studying FSs (e.g., Columbus, 2013; Wulff, 2008).
Many early studies on FSs concerned the L2 development of children and teenagers (e.g., R.
Ellis, 1984; Hakuta, 1974, 1976; Myles, Hooper & Mitchell, 1998; see Wray, 2002 for an
overview). Hakuta’s (1974) study was among the first attempts to examine FS use in chil-
dren’s L2 speech. He found that prefabricated patterns constituted over 50% of the utter-
ances of a Japanese learner of English in early learning stages. Hakuta (1976) argues that
prefabricated patterns meet beginner learners’ needs to express various functions beyond
their linguistic competence. Examining FSs in classroom learning, R. Ellis (1984) suggests
that FSs allow L2 children to “perform important communicative functions” (p.64) and
“may contribute, directly or indirectly, to the acquisition of rules for producing novel sen-
tences” (p.65). Similarly, Myles et al. (1998), observing English learners of French in a British
secondary school, found that learners use FSs for communicative needs in early stages and
employ parts of those FSs to produce new utterances in later stages.
In addition to studies on FSs in L2 development in children and adolescents, some re-
searchers have investigated FSs in L2 development of adults (e.g., Bolander, 1989; Hanania
& Gradman, 1977; Schmidt, 1983; see Wray, 2002 for an overview). An early exploration was
Hanania and Gradman’s (1977) longitudinal case study of an Arabic-speaking learner of
English in an English-speaking environment. The findings showed that her language development
patterns were similar to those of L1 learners. Importantly, in the early stages, the
student uttered word strings that she perceived as single units. Some years later, more evi-
dence of FS use was reported in Schmidt (1983) who found that an adult L2 English learner
used FSs extensively as a “major linguistic strategy” (p.150) to facilitate his fluency.
Investigating adult learning in the classroom, Bolander (1989) examined learners’ acquisition
of Swedish word order rules and found that memorized FSs support conversational speech in
early learner language. Eskildsen and Cadierno (2007) demonstrated how a learner produced
increasingly abstract patterns from the FS I don’t know in oral classroom interactions, and
how the use of this FS expanded to include other lexical items and past-tense expressions.
These studies illustrate how FSs contribute to language development. However, the role of
FSs in L2 learning may be smaller than in L1 learning (Wulff, 2019).
The Role of Formulaic Sequences
In pragmatics, several studies have examined L2 learners’ use of FSs (e.g., Bardovi-
Harlig, 2009; De Cock, 2004; Scarcella, 1979) and the value of FSs in L2 pragmatic
competence (e.g., Bardovi-Harlig, 2006; House, 1996; Kecskes, 2010) (see Bardovi-Harlig,
2019 for an overview). Scarcella (1979) found that even common routines are not easily
acquired by L2 learners, who produce a number of pragmatically deviant routines.
Further, De Cock (2004) showed that native speakers and advanced EFL learners differ
in their use of frequently recurrent sequences in informal speech. She found learners
significantly underused and misused markers of vagueness (e.g., or something, and things,
kind of), which made them come across as too formal, detached, over-emphatic, or rude,
depending on the FSs used. The underuse of FSs might result from a lack of familiarity
with some FSs, overuse of familiar FSs, level of pragmatic development, and socio-
pragmatic knowledge (Bardovi-Harlig, 2009). As for the contributions of FSs to L2
pragmatic competence, House (1996) showed that learners became more fluent pragma-
tically after taking a course rich in FSs. More recently, Bardovi-Harlig (2006) made a
distinction between developmental (acquisitional) formulas, i.e., formulas used when the
internal grammar of the formulas exceeds learners’ grammar in general, and social
(target) formulas, i.e., formulas expressing societal knowledge shared by a community.
Both of these FS types are crucial for L2 learners’ pragmatic competence. FSs can also
serve as pragmatic acts necessary for interlocutors in situational contexts (Kecskes, 2010).
These L2 pragmatics studies illustrate the importance of FSs for pragmatic purposes
in speech.
In L2 teaching, FSs have not always occupied a prominent place, especially in the Grammar
Translation method, where the focus was on grammar rules and reading. Speaking activities,
which barely played a role, were subordinate to syntax and morphology (Richards & Rodgers,
2001). In the 1960s, with the advent of Audiolingualism, the focus shifted to more oral practice
with drills in language labs. Target language patterns were selected, taught and practiced until
they became internalized by learners. Yet, the focus remained predominantly on grammar with
repetition, replacement, contraction, and inflection activities (Richards & Rodgers, 2001). FSs
took centre stage in Communicative Language Teaching (CLT) from the 1970s onwards
(Cowie, 1992; Dörnyei, 2009). This is reflected in the Common European Framework of
Reference, in which FSs are included as an important descriptor of learners’ speaking and
writing proficiency (Council of Europe, 2018). CLT places considerable emphasis on spoken
interaction and formulaic language to achieve communicative competence, which “is not a
matter of knowing rules…. It is much more a matter of knowing a stock of partially pre-
assembled patterns, formulaic frameworks…” (Widdowson, 1989, p. 135). In his proposal for
a principled communicative approach, Dörnyei (2009) advocates that teaching FSs be a core
component, maintaining that “[t]here should be sufficient awareness raising of the significance
and pervasiveness of formulaic language in real-life communication, and selected phrases
should be practiced and recycled intensively” (p. 41).
Dörnyei’s suggestion is reminiscent of Michael Lewis’s lexical approach, which regards
words and FSs as the building blocks of language learning. FSs are at the heart of the lexical
approach; Lewis (1993, 1997) maintains that L2 learners should be made aware of the
pervasiveness of FSs in language and should be encouraged to identify FSs in language
learning materials. Lewis’s publications (1993, 1997) have inspired several empirical studies
investigating the effectiveness of this approach (e.g., Boers et al., 2006; Peters & Pauwels,
2015). Early on, Boers et al. (2006) showed that awareness-raising activities have the po-
tential to increase L2 learners’ repertoire of FSs. Nevertheless, research into written activities
has indicated that awareness-raising should be complemented by practice and repetition
(Peters & Pauwels, 2015).
Empirical research into pedagogical interventions has mainly focused on learning FSs
from written input (see Boers & Lindstromberg, 2012; Pellicer-Sánchez & Boers, 2019 for
reviews). Few studies have explored the effect of interventions on learning and using FSs
in speech. Boers and colleagues (Boers, 2014; Hoang & Boers, 2016; Thai & Boers, 2016)
have conducted pioneering research in this respect. Boers (2014) and Thai and Boers
(2016) explored whether Nation’s (2013) 4/3/2-activity (and a modified version, the 3/2/1-
activity) affected learners’ fluency, complexity, and accuracy and whether learners’ spoken
output differed when a monologue was repeated under constant time conditions. In the
4/3/2-activity, L2 learners repeat a task under increasing time pressure (4 minutes, 3 minutes,
2 minutes). Each repetition is delivered to a different listener. The findings in both
studies showed that repeating a monologue under increased time pressure fosters learners’
fluency, especially speech rate, but at the expense of accuracy. Learners’ speech in the
repetitions was characterized by several verbatim repetitions of sequences of two or more
words, many of which were repetitions of errors. In another study, Hoang and Boers
(2016) explored how many words and FSs Vietnamese EFL learners recycled after a
reading-and-listening activity. The analyses showed that input influences learners’ word
use in the retelling task. However, learners only recycled, on average, 2.41 FSs from the 35
FSs in the input and only 0.48, on average, were recycled correctly. These findings show
that accurate use of FSs is challenging for L2 learners, even after exposure to input
containing relevant FSs.

The most crucial issues regarding FSs and speaking include the ubiquity of FSs in speech, the
role of FSs in oral proficiency (especially oral fluency), and the challenges of learning FSs.
The ubiquity of FSs in speech

Several corpus-based studies have indicated the extent of FSs in both spoken and written
discourse of a language (e.g., Altenberg, 1998; Biber et al., 1999; Erman & Warren, 2000).
Moreover, FSs are more frequent in speech than in writing (Biber et al., 1999). When
Altenberg (1998) investigated spoken English in the London-Lund Corpus of about half a
million words, he found that the corpus contained over 201,000 recurrent word-combinations,
among which 3-word combinations occurred almost 70,000 times, and 4-word combinations
about 20,000 times. Altenberg estimated that recurrent word-combinations constituted over
80 per cent of the words in the corpus. In another study, Biber et al. (1999) examined lexical
bundles in a larger corpus, the Longman Spoken and Written English Corpus of 40 million
words, and reported that 3-word bundles were found over 80,000 times per million words in
conversation and over 60,000 times per million words in academic prose, while
4-word bundles were found over 8,500 times per million words in conversation and over
5,000 times per million words in academic prose.
Similarly, Erman and Warren (2000), who analyzed seven extracts from the London-Lund
Corpus of Spoken English, revealed that FSs accounted for 58.6% of the corpus. Examining
FSs in actual usage, Van Lancker-Sidtis and Rallon (2004) found that FSs accounted for
almost 25% of the phrases in a classic screenplay. In spontaneous telephone conversations,
they reported that FSs comprised 48% and 24% of the utterances between two friends and
two business persons respectively. FSs are also prevalent in spoken university registers (e.g.,
Biber, 2006; Biber & Barbieri, 2007). Taken together, the evidence from corpus research
clearly indicates the pervasiveness of FSs in speech.
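
To make the frequency-based counts reported in this section more concrete, the following minimal Python sketch shows one way recurrent word-combinations might be extracted from a transcript and normalized per million words. It is an illustration only: the function names (tokenize, recurrent_ngrams, per_million), the minimum-frequency threshold, and the toy transcript are assumptions made for this example, and the sketch does not reproduce the procedures used by Altenberg (1998) or Biber et al. (1999).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (punctuation is dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def recurrent_ngrams(tokens, n, min_freq=2):
    """Return the n-grams occurring at least min_freq times, with their counts.

    Because punctuation is dropped, n-grams may span utterance boundaries;
    a real study would normally segment utterances first.
    """
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: c for gram, c in counts.items() if c >= min_freq}


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


# Toy transcript, invented for illustration (not taken from any corpus).
transcript = (
    "as far as i know he is back . as far as i know she is not . "
    "i do not know what you mean . you know what i mean"
)
tokens = tokenize(transcript)
bundles = recurrent_ngrams(tokens, n=4)

for gram, count in sorted(bundles.items()):
    print(" ".join(gram), count, round(per_million(count, len(tokens))))
```

Normalizing to a common base is what makes corpora of different sizes comparable. With a sample this small the normalized values are meaningless in themselves; the point is only the procedure of counting recurrent strings and then scaling the counts.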
The relationship between FSs and speaking proficiency

The relationship between FSs and oral proficiency has received considerable attention in L2
research (e.g., Boers et al., 2006; Kyle & Crossley, 2015; Lundell et al., 2014; Saito, 2020;
Stengers et al., 2011). Growing evidence suggests that L2 learners’ use of FSs in their speech
has a strong impact on perceived oral proficiency. An early study addressing this issue is
Boers et al. (2006), who explored whether learners’ use of FSs is related to perceived speaking
proficiency. In this longitudinal study, learners in the experimental and control groups en-
gaged with authentic listening and reading materials, but the teacher directed learners’ at-
tention to the FSs only in the experimental group, not in the control group. At the end of the
course, participants’ oral proficiency was measured in interviews. The results showed that the
experimental group tended to use more FSs. Furthermore, an increased use of FSs was
positively correlated with learners’ perceived oral proficiency as well as fluency. This finding
was corroborated by Stengers et al. (2011).
Research also indicates that it is possible to distinguish proficiency levels by analyzing
learners’ use of FSs. More advanced learners tend to use more FSs, even though their for-
mulaic repertoire differs from that of native speakers (Lundell et al., 2014). It seems “difficult
for L2 learners to attain absolute nativelikeness” (Lundell et al., 2014, p.274). Differences
between advanced L2 learners and native speakers depend somewhat on the type of FSs, as
highly advanced L2 speakers are comparable to native speakers when social routines (e.g., no
problem, take care) are concerned, but not in the case of collocations (Erman, Denke, Fant,
& Lundell, 2015).
In a series of studies, Crossley and colleagues (Crossley, Salsbury & McNamara, 2015; Kyle
& Crossley, 2015) analyzed the predictive value of several lexical indices (e.g., word frequency,
word range, trigrams, hypernymy, concreteness) for holistic ratings of L2 speech samples.
Their findings indicated that FSs play a key role in oral lexical proficiency. The more trigrams
learners used in their speech, the higher the holistic ratings (Kyle & Crossley, 2015). Further,
they found that collocation accuracy was the best predictor of speaking, as it accounted for the
most variance in the holistic judgments of oral lexical proficiency (Crossley et al., 2015).
A recent study on the role of FSs in L2 speech (Saito, 2020) focused on L1 raters’ judgements
of comprehensibility and lexical appropriateness in speech by Japanese learners of
English. Collocation measures (high- and low-frequency bigrams and trigrams) were compared
with single-word depth (meaningfulness and hypernymy) and breadth (frequency and
range) measures. The findings showed that learners’ use of collocations is a determining
factor for comprehensibility and lexical appropriateness. The use of low-frequency collocations
(collocations with a high mutual information score) was strongly related to perceived
L2 oral proficiency. The studies surveyed here all show that L2 learners’ oral proficiency is
strongly affected by their use of FSs, and of low-frequency FSs in particular.
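
The mutual information (MI) score mentioned above can be illustrated with a short Python sketch. It assumes the formulation commonly used in corpus linguistics, MI = log2(observed / expected), where the expected frequency of a word pair is the product of the two word frequencies divided by corpus size. The function name mutual_information, the use of adjacent word pairs only, and the toy token list are assumptions made for illustration; the sketch is not the computation used in Saito (2020) or in any particular tool.

```python
import math
from collections import Counter


def mutual_information(tokens, w1, w2):
    """Return log2(observed / expected) for the adjacent pair (w1, w2).

    expected = freq(w1) * freq(w2) / corpus size. Returns None when the
    pair never occurs, since MI is undefined in that case.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    observed = bigrams[(w1, w2)]
    if observed == 0:
        return None
    expected = unigrams[w1] * unigrams[w2] / n
    return math.log2(observed / expected)


# Toy sample of 16 tokens, invented for illustration.
sample = (
    "strong coffee or the tea the coffee hot "
    "strong coffee and the coffee and the milk"
).split()

mi_strong_coffee = mutual_information(sample, "strong", "coffee")
mi_the_coffee = mutual_information(sample, "the", "coffee")
print(mi_strong_coffee, mi_the_coffee)
```

In this sample both pairs occur equally often, but strong coffee receives the higher score because strong is rarer than the and appears only with coffee. This is why a high MI score tends to single out combinations of lower-frequency words that are strongly associated with each other.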
The role of FSs in oral fluency

In L2 speech research, oral fluency has gained traction (e.g., Tavakoli & Uchihara, 2019;
Wood, 2010; Yan, 2020). Oral fluency refers to speech rate, pauses, and length of fluent runs of
speech between pauses (Wood, 2001). Although fluency is an important indicator of L2 pro-
ficiency, it is often neglected in L2 classrooms (Rossiter, Derwing, Manimtim & Thomson,
2010). To explore the connection between the use of FSs and speech fluency development in
adult English learners, Wood (2010) collected speech samples via a narrative retell task from
eleven intermediate L2 English learners from three study-abroad L1 groups (Spanish,
Japanese, Chinese). Temporal variables (phonation/time ratio, speech rate, articulation rate,
mean length of run) and a variable related to FS use (formula/run ratio) were analyzed. The
results showed that the learners’ fluency significantly improved and they used more FSs.
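
The temporal variables listed above can be sketched in Python as follows. The definitions are common working definitions and are assumptions here rather than Wood's (2010) exact operationalizations: speech rate is taken as syllables per minute of total time, articulation rate as syllables per minute of speaking time, phonation/time ratio as speaking time divided by total time, mean length of run as syllables per run between pauses, and formula/run ratio as the number of FSs divided by the number of runs. The Run class, the function name, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One fluent run of speech between two pauses."""
    syllables: int    # syllables produced in the run
    duration: float   # speaking time of the run, in seconds
    formulas: int     # FSs identified in the run (by hand or by tool)


def fluency_measures(runs, pauses):
    """Compute temporal fluency measures from runs and pause durations (seconds)."""
    phonation = sum(r.duration for r in runs)
    total = phonation + sum(pauses)
    syllables = sum(r.syllables for r in runs)
    return {
        "speech_rate": syllables / total * 60,            # syllables per minute
        "articulation_rate": syllables / phonation * 60,  # pauses excluded
        "phonation_time_ratio": phonation / total,
        "mean_length_of_run": syllables / len(runs),
        "formula_run_ratio": sum(r.formulas for r in runs) / len(runs),
    }


# Invented sample: three runs separated by two one-second pauses.
runs = [Run(12, 3.0, 1), Run(8, 2.0, 0), Run(20, 5.0, 2)]
pauses = [1.0, 1.0]
measures = fluency_measures(runs, pauses)

for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Keeping speech rate and articulation rate apart shows whether a gain in fluency comes from less pausing or from faster articulation, while the formula/run ratio links those temporal gains to FS use.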
Other scholars have examined fluency in more depth. Tavakoli (2011) studied the dif-
ferences in pausing patterns between L2 learners and native speakers through four oral
narrative tasks (picture stories). Temporal measures of fluency, including the number of
pauses and the total amount of silence (in seconds) in the middle and at the end of clauses,
were statistically analyzed for each task. The results indicated that L2 learners and native
speakers differ primarily in the positions of pauses rather than the number of pauses or the
amount of silence. For instance, L2 learners produced significantly more pauses and silences
in the middle of clauses than native speakers, while they paused less frequently and did not
differ significantly in the amount of silence at the end of clauses. Further analyses revealed
that both L2 learners and native speakers hardly ever pause in the middle of FSs, which
suggests that FSs facilitate fluency. Going beyond pausing, Yan (2020) investigated the in-
fluence of FSs on both speech rate and pausing in L1 and L2 speakers of intermediate and
advanced proficiency levels in elicited imitation tasks. The findings revealed that FSs create a
processing advantage and also facilitate oral fluency for both L1 and L2 speakers. FSs had a
significant effect on pausing at the sentence level, but the effect on speech rate of sentence
repetition was non-significant.
Tavakoli and Uchihara (2019) investigated the relationship between oral fluency and use
of FSs across levels of proficiency. Fluency was measured in terms of speed, breakdown, and
repair while FSs were measured through n-grams (proportion, frequency, and strength of
association). The findings showed that: 1) there was a linear relationship between oral
proficiency level and many n-gram measures; 2) there were significant positive correlations
between fluency and high-frequency n-grams. Based on the findings, Tavakoli and Uchihara
(2019) explain that FSs facilitate oral fluency in both the formulation stage, where learners
with a large repertoire of FSs retrieve them holistically and process them as single words, and
the articulation stage, where FSs help phraseologically proficient learners speak faster given
their access to information about phonetic reduction of FSs in their lexicon. In fact, the
assumption of holistic retrieval that Tavakoli and Uchihara (2019) suggest for FSs is not
uncommon (e.g., Erman, 2007; Kecskes, 2010; Wray, 2002). However, this notion is con-
troversial (see Siyanova-Chanturia, 2015a); some psycholinguistic evidence counters it (e.g.,
Sprenger, Levelt & Kempen, 2006).
What makes FSs difficult

FSs is an umbrella term for different types of conventional word strings, which means that
FSs differ in terms of compositionality, imageability, transparency, or congruency with the
L1, and consequently can differ in terms of learning difficulty. Learner corpus research has
shown that FSs, collocations in particular, are challenging for L2 learners.
Usage-based accounts of language learning could explain this phenomenon. Usage-based
theories hold that language learning is an experience-driven process and that input frequency
is an important condition for learning (N. Ellis, 2002; Wulff, 2019). Learners need many
encounters with lexical items for acquisition to take place, but in spite of the ubiquity of FSs
in spoken language, specific FSs do not occur very often (Cobb, 2019). Additionally, many
L2 learners lack significant exposure and interaction, making it difficult to develop an ac-
curate sense of preferred lexical combinations.
Usage-based theories of language learning also hold that learning is determined by an item’s
salience: low-salience items are more difficult than high-salience ones. Thus, in addition to not
being frequent, many FSs are not salient in the input, that is, they do not draw L2 learners’
attention because they consist of high-frequency words (Boers, 2020). Many collocations, for
instance, do not cause problems at the level of comprehension, but at the level of production.
For example, the collocation make an effort is easy to understand for Dutch-speaking learners
of English, but difficult to produce because the Dutch collocation is do an effort. Consequently,
on encountering this collocation in speech, Dutch-speaking L2 learners may not notice that
English uses a different verb. Furthermore, many collocations contain delexicalized verbs
(make, do, have), which add little to the meaning of the collocation.
Another explanation for some FSs’ difficulty is a lack of semantic transparency. Unlike
collocations, idioms are semantically opaque; their meaning cannot be easily derived from
individual constituents. Take once in a blue moon for instance. It is not easy to figure out the
meaning of “very rarely” by means of the words in the idiom. Finally, L2 learners’ L1 exerts
a strong influence on their production of FSs, rendering their speech odd and non-nativelike
(e.g., Laufer & Waldman, 2011). Additionally, most adult L2 learners might take an analytic
approach to learning a new language. Even though more attention is now paid to FSs in L2
textbooks, there is still considerable reliance on single words.

Considering many of the studies above, several trends appear in current research on FSs in
speech. First, it is clear that investigations of the relationship between FSs and oral
proficiency are increasing remarkably (e.g., Boers et al., 2006; Kyle & Crossley, 2015;
Lundell et al., 2014; Saito, 2020; Stengers et al., 2011). Researchers have shifted focus
from the role of FSs in speech in general to more specific aspects of speech, such as
comprehensibility and lexical appropriateness (Saito, 2020) or fluency (e.g., Tavakoli &
Uchihara, 2019; Wood, 2010; Yan, 2020). These studies contribute to our understanding
of the significant role of FSs in speech and consequently highlight the importance of FSs
for L2 teaching.
Several scholars advocate a multivariate approach to studying vocabulary use. Recent
research into the role of FSs in learners’ proficiency has broadened the scope to several
lexical indices (range, frequency, mutual information scores) to model lexical proficiency or
speaking proficiency (e.g., Crossley et al., 2015; Granger & Bestgen, 2014; Kyle & Crossley,
2015; Saito, 2020). The availability of corpora and the use of multivariate statistical analyses
have moved this work forward.
Recent research also highlights the importance of L2 input (hours of instruction, out-
of-school exposure to L2) for developing formulaic competence; more input is positively
related to learners’ use (Saito & Hanzawa, 2018) and accuracy of FSs (Crossley &
Salsbury, 2011).
Most research on FSs concerns English. Nevertheless, other languages have been studied,
including French (Erman et al., 2015; Myles et al., 1998) and Swedish (Bolander, 1989).

Research methods have largely been discussed above. The most common approaches to
studying FSs in speech are learner corpus research, correlational research, (quasi-)experimental
research, and case studies.
Spoken learner corpora are collections of language learners’ elicited speech samples. Most
corpora focus on English, but spoken learner corpora are available for other languages. For
an overview, see the Centre for English Corpus Linguistics (2020): Learner Corpora around
the World. Louvain-la-Neuve: Université catholique de Louvain.
https://uclouvain.be/en/research-institutes/ilc/cecl/learner-corpora-around-the-world.html.
A frequently used approach to analyzing learner corpora is contrastive interlanguage analysis, in which the
quantity and distribution of FSs in learner output are compared with a control corpus of L1 or
expert speech (see also Granger, 2020).
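
As a rough illustration of the comparison at the heart of contrastive interlanguage analysis, the Python sketch below normalizes the frequency of one FS in a learner corpus and a reference corpus and labels the difference as underuse or overuse. The log-likelihood statistic is included as one test often used in corpus comparison; its use here is an assumption, not a method attributed to the studies cited in this chapter. All function names and counts are invented for the example.

```python
import math


def relative_frequency(count, corpus_size, per=1_000_000):
    """Occurrences per million words (or per another base)."""
    return count * per / corpus_size


def log_likelihood(count_a, size_a, count_b, size_b):
    """Log-likelihood statistic for one item's frequency in two corpora."""
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def compare(count_learner, size_learner, count_ref, size_ref):
    """Label learner use relative to the reference corpus and return the statistic."""
    learner = relative_frequency(count_learner, size_learner)
    ref = relative_frequency(count_ref, size_ref)
    if learner < ref:
        direction = "underuse"
    elif learner > ref:
        direction = "overuse"
    else:
        direction = "same"
    return direction, log_likelihood(count_learner, size_learner, count_ref, size_ref)


# Invented counts: a vagueness marker in two corpora of 100,000 words each.
direction, ll = compare(30, 100_000, 90, 100_000)
print(direction, round(ll, 2))
```

A value above 3.84 is conventionally read as significant at p < .05 (one degree of freedom). As noted elsewhere in this chapter, pooled counts of this kind hide individual variation, so per-learner figures would ideally be computed alongside them.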
Recently, the number of correlational studies has increased; they focus on the relationship
between learners’ use of FSs in elicited speech samples (e.g., spontaneous speech, picture prompts,
elicited imitation tasks, role play) and variables of interest, like measures of fluency, comprehen-
sibility, or (lexical) proficiency (e.g., Saito, 2020; Tavakoli & Uchihara, 2019). Some studies have
also used learner corpora and automatic tools, such as TAALES (Kyle & Crossley, 2015).
Experimental and quasi-experimental research methods are used to study the effect of
pedagogical interventions on the learning of FSs (e.g., Boers et al., 2006). Studies also exist
on incidental FS learning from meaningful input versus deliberate, intentional FS learning.
Finally, the study of FSs in L2 speech includes several case studies, which document in
detail how learners’ use of FSs changes over time (e.g., Wood, 2009).

As argued by Pellicer-Sánchez and Boers (2019), “attention to formulaic language (FL)
should be an integral part of language pedagogy” (p. 154). Given that FSs can be difficult to
notice and specific FSs do not occur frequently, they deserve explicit attention in the
classroom. A balanced approach to teaching FSs consists of incidental learning, deliberate
learning, and output and fluency activities. Below we provide some suggestions for teaching
FSs, but we emphasize that this is not an exhaustive list.
Provide authentic input

L2 learners should engage with meaningful input to pick up FSs incidentally while reading,
listening, or watching TV. Exposure to spoken input is particularly important, as it fosters
spoken fluency more effectively (Wood, 2001). Lin (2019) argues that because FSs are
prosodically salient in speech, they might be easier to learn from speech than from written
input. Since prosody helps L2 learners identify the boundaries of FSs, it can increase the
likelihood of learning them holistically. Lin and Siyanova-Chanturia (2015) highlight the
benefits of extensive TV viewing for learning FSs because it helps learners develop “an
accurate sense of the frequency of use of collocations and phrasal expressions” (p.151).
Recent research suggests that learners can gain knowledge of the form and meaning of FSs
while watching (captioned) TV (Puimège & Peters, 2019; Puimège & Peters, 2020).
Audiovisual input might also be beneficial for learning the pronunciation of FSs. Wisniewska
and Mora’s (2020) longitudinal study suggests a positive effect of (captioned) audiovisual
input on L2 learners’ pronunciation. Their work did not specifically target FSs, but clearly
audiovisual input has potential for learning spoken FSs.
Teach FSs explicitly

Given that automatic transfer from input to output is unlikely (Hoang & Boers, 2016),
teachers should utilize some of the following techniques to increase L2 learners’ noticing,
learning, and use of FSs.
Lewis’s lexical approach includes raising learners’ awareness of the pervasiveness of FSs
in speech, for instance, by asking learners to identify FSs in texts. Textual enhancement (e.g.,
underlining, or bold typeface) is an effective way to draw learners’ attention to specific
FSs in the input (Pellicer-Sánchez & Boers, 2019). L2 learners could also use CALL appli-
cations, like concordances, to study the use of FSs in spoken registers. Concordances have
the advantage that they present FSs in context (Cobb, 2019; Meunier, 2020).
The memorability of FSs can be increased by pointing out sound patterns in FSs, like
alliteration (e.g., play a part), assonance (e.g., turn a blind eye), or rhyme (e.g., wear and tear)
(Boers & Lindstromberg, 2012). Given that learners’ L1 exerts a strong influence on their
production of FSs, a contrastive approach (e.g., translation activities) might help raise their
awareness of differences between the L1 and L2 (Laufer & Girsai, 2008; Webb &
Kagimoto, 2011).
To enlarge L2 learners’ formulaic repertoire and foster automatic retrieval, they should
practice FSs, as memorization and repetition enhance the learning of FSs (Wood, 2009;
Fitzpatrick & Wray, 2006). However, practice should involve retrieval (see also Pellicer-
Sánchez & Boers, 2019). Studies have shown that merely copying (Webb & Kagimoto,
2011) or orally repeating FSs (Alali & Schmitt, 2012) is not as effective as activities that
prompt learners to actually retrieve FSs from memory, like gap-filling or contrastive ac-
tivities. Importantly, learners should first be presented with FSs as a whole to prevent
them from making erroneous combinations, which are difficult to “unlearn” (Strong &
Boers, 2019).
Output activities and fluency training

To foster automatic retrieval of FSs, learners should be prompted to use specific FSs in their
spoken output. Webb and Nation (2017) advocate the use of written worksheets containing
target vocabulary to increase the chances that L2 learners use specific FSs during task
performance. The “disappearing text” is another activity which prompts learners (Nation &
Newton, 2009; Rossiter et al., 2010). The teacher reads a text aloud, then deletes a number of
FSs, which learners have to provide.
For fluency activities, Webb and Nation (2017) suggest four criteria: (1) the focus is on
the content, (2) the linguistic forms are familiar, (3) there is sufficient practice, and (4) there is time
pressure. Webb and Nation argue that even beginners should practice FSs like greetings
or expressions of politeness. One activity proposed to enhance fluency is the 4/3/2-activity
(Nation, 2013; Wood, 2009, 2010), as discussed above. Repeating a monologue under
increased time pressure fosters learners’ fluency and speech rate, albeit at the expense of
accuracy (Boers, 2014; Thai & Boers, 2016). To reduce errors in task repetition, Boers
(2014) proposes the following: (1) providing pre-task planning time, (2) modeling input
before the first performance of the task, and/or (3) giving feedback after the first per-
formance of the task. If fluency is not the sole aim, the 4/3/2-activity might be less
promising, as it can result in the proceduralization of incorrect word combinations.
Finally, poster presentations (Rossiter et al., 2010) are suggested as a repetition activity
to foster fluency because they allow L2 learners to repeat their poster pitch for several
listeners.
Wood (2001) proposes a fluency program consisting of three stages to enhance automatic
processing and retrieval of FSs: (1) input, (2) automatization, and (3) practice or production.
In the input stage, learners are exposed to authentic speech and their attention is subse-
quently drawn to FSs, pauses, and hesitations in speech. In the second stage, learners re-
hearse FSs in controlled settings, e.g., imitating FSs in a language lab, or reconstructing a
spoken text and spotting differences in the use of FSs between the spoken text and the
reconstructed text (dictogloss). Finally, the production stage comprises the 4/3/2-activity and
free talk.
7 Future Directions
The number of studies on the role of FSs in speech is growing. From the research discussed here, it is clear
that English is the most studied language. Only limited research on FSs has addressed other
languages (e.g., Erman et al., 2015; Lundell et al., 2014). This is an area warranting further
investigation if the findings of existing studies are to be generalized.
Another issue hampering the generalizability of research findings is the sampling bias in SLA;
most studies are conducted with university students and advanced learners of English (see Myles
et al., 1998, for an exception). Our understanding of formulaic competence in speech would be
enhanced if more studies included younger and beginner L2 learners. An interesting example is
Siyanova-Chanturia’s (2015b) work mapping the development of formulaic competence in the
writing of beginner learners of Italian. Similar studies could be conducted for speaking. It would
be worthwhile to establish research collaborations with teachers in primary and secondary
schools to further the study of FSs in an ecologically valid way. The field would also benefit from
more longitudinal research investigating how FSs develop over time.
Empirical pedagogical studies of FSs and speech are still limited. More research into
pedagogical interventions to fuel the learning and use of FSs in speaking tasks is warranted.
Given that L2 learners’ vocabulary is somewhat task dependent (Eguchi & Kyle, 2020), more
research into how tasks and prompts affect learners’ use of FSs is needed.
As for corpus-based studies, researchers often combine data from different learners and
analyze them together. This focus on averages hides individual variation. Granger (2020)
argues that learner language has a high level of variability, so both group and individual
scores should be computed in corpus-driven research to resolve this issue. More research into
learners’ individual development of FSs in speech would enhance our understanding of
learning trajectories (Granger, 2020).
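The contrast between pooled and individual scores can be illustrated with a minimal Python sketch. The learner labels, the counts, and the per-100-words normalization are hypothetical choices made for illustration only; they are not data or procedures taken from Granger (2020) or from any study cited in this chapter.

```python
from statistics import mean, stdev

# Hypothetical data: (number of FSs identified, total words spoken) per learner.
learners = {
    "learner_A": (12, 400),
    "learner_B": (3, 380),
    "learner_C": (25, 420),
}

def fs_per_100_words(fs_count, word_count):
    """Normalized FS frequency: FSs per 100 words of speech."""
    return 100 * fs_count / word_count

# Individual scores: one normalized rate per learner.
individual = {
    name: fs_per_100_words(fs, words) for name, (fs, words) in learners.items()
}

# Pooled group score: all learners' data combined and analyzed together.
total_fs = sum(fs for fs, _ in learners.values())
total_words = sum(words for _, words in learners.values())
pooled = fs_per_100_words(total_fs, total_words)

# Mean and spread of the individual scores; the pooled figure alone hides the spread.
mean_individual = mean(individual.values())
spread = stdev(list(individual.values()))

for name, score in individual.items():
    print(f"{name}: {score:.2f} FSs per 100 words")
print(f"pooled: {pooled:.2f}, mean of individuals: {mean_individual:.2f}, SD: {spread:.2f}")
```

In this invented sample the pooled rate sits between a very low and a very high individual rate, which is the kind of variation that reporting both score types is meant to expose.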
Further Reading
Annual Review of Applied Linguistics (2012). Volume 32. Topics in formulaic language.
Language Teaching Research (2017). Volume 21, Issue 3. Special issue. Multi-word expressions.
Siyanova-Chanturia, A. & Pellicer-Sánchez, A. (Eds.). (2019). Understanding formulaic language: A
second language acquisition perspective. London/New York: Routledge.
References
Alali, F. A., & Schmitt, N. (2012). Teaching formulaic sequences: The same as or different from
teaching single words? TESOL Journal, 3(2), 153–180.
Altenberg, B. (1998). On the phraseology of spoken English: The evidence of recurrent word-
combination. In A. Cowie (Ed.), Phraseology: Theory, analysis and applications (pp. 101–122).
Oxford: Oxford University Press.
Bardovi-Harlig, K. (2006). On the role of formulas in the acquisition of L2 pragmatics. In K. Bardovi-
Harlig, C. Felix-Brasdefer, & A. S. Omar (Eds.), Pragmatics and language learning: Vol. 11
(pp. 1–28). Honolulu, HI: University of Hawaii, National Foreign Language Resource Center.
Bardovi-Harlig, K. (2009). Conventional expressions as a pragmalinguistic resource: Recognition and
production of conventional expressions in L2 pragmatics. Language Learning, 59(4), 755–795.
Bardovi-Harlig, K. (2019). Formulaic language in second language pragmatics research. In A.
Siyanova-Chanturia & A. Pellicer-Sánchez (Eds.), Understanding formulaic language: A second
language acquisition perspective (pp. 97–114). London/New York: Routledge.
Barfield, A., & Gyllstad, H. (2009). Introduction: Researching L2 collocation knowledge and devel-
opment. In A. Barfield & H. Gyllstad (Eds.), Researching collocations in another language: Multiple
interpretations (pp. 1–18). Basingstoke, UK: Palgrave Macmillan.
Biber, D. (2006). University language: A corpus-based study of spoken and written registers. Amsterdam:
John Benjamins.
Biber, D., & Barbieri, F. (2007). Lexical bundles in university spoken and written registers. English for
Specific Purposes, 26(3), 263–286.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and
written English. Harlow: Longman.
Boers, F. (2014). A reappraisal of the 4/3/2 activity. RELC Journal, 45(3), 221–235.
Boers, F. (2020). Factors affecting the learning of multiword items. In S. Webb (Ed.), The Routledge
handbook of vocabulary studies (pp. 143–157). London/New York: Routledge.
Boers, F., Eyckmans, J., Kappel, J., Stengers, H., & Demecheleer, M. (2006). Formulaic sequences and
perceived oral proficiency: Putting a lexical approach to the test. Language Teaching Research, 10(3),
245–261.
Boers, F., & Lindstromberg, S. (2012). Experimental and intervention studies on formulaic sequences in
a second language. Annual Review of Applied Linguistics, 32, 83–110.
Bolander, M. (1989). Prefabs, patterns and rules in interaction? Formulaic speech in adult learners’ L2
Swedish. In K. Hyltenstam & L. Obler (Eds.), Bilingualism across the lifespan: Aspects of acquisition,
maturity, and loss (pp. 73–86). Cambridge: Cambridge University Press.
Cobb, T. (2019). From corpus to CALL: The use of technology in teaching and learning formulaic
language. In A. Siyanova-Chanturia & A. Pellicer-Sánchez (Eds.), Understanding formulaic
language: A second language acquisition perspective (pp. 192–210). London/New York:
Routledge.
Columbus, G. (2013). In support of multiword unit classifications: Corpus and human rating data
validate phraseological classifications of three different multiword unit types. Yearbook of
Phraseology, 4(1), 23–44.
Council of Europe. (2018). Common European framework of reference for languages: Learning, teaching,
assessment - Companion volume with new descriptors. Retrieved 24 April, 2020 from http://rm.coe.
int/cefr-companion-volume-with-new-descriptors-2018/1680787989.
Cowie, A. P. (1992). Multiword lexical units and communicative language teaching. In P. Arnaud
& H. Bejoint (Eds.), Vocabulary and applied linguistics (pp. 1–12). Basingstoke, England:
Macmillan.
Crossley, S. A., Salsbury, T., & McNamara, D. S. (2015). Assessing lexical proficiency using analytic
ratings: A case for collocation accuracy. Applied Linguistics, 36(5), 570–590.
Crossley, S., & Salsbury, T. L. (2011). The development of lexical bundle accuracy and production in
English second language speakers. IRAL-International Review of Applied Linguistics in Language
Teaching, 49(1), 1–26.
De Cock, S. (2004). Preferred sequences of words in NS and NNS speech. Belgian Journal of English
Language and Literatures, 2, 225–246.
Dörnyei, Z. (2009). Communicative language teaching in the 21st century: The ‘principled commu-
nicative approach’. Perspectives, 36, 33–43.
Eguchi, M., & Kyle, K. (2020). Continuing to explore the multidimensional nature of lexical so-
phistication: The case of oral proficiency interviews. The Modern Language Journal, 104(2),
381–400.
Ellis, N. C. (2002). Frequency effects in language processing: A review with implications for theories of
implicit and explicit language acquisition. Studies in Second Language Acquisition, 24(2), 143–188.
Ellis, R. (1984). Formulaic speech in early classroom second language development. In J. Handscombe,
R. A. Orem, & B. P. Taylor (Eds.), On TESOL ’83 (pp. 53–65). Washington, DC: TESOL.
Erman, B. (2007). Cognitive processes as evidence of the idiom principle. International Journal of
Corpus Linguistics, 12(1), 25–53.
Erman, B., Denke, A., Fant, L., & Lundell, F. F. (2015). Nativelike expression in the speech of long-
residency L2 users: A study of multiword structures in L2 English, French and Spanish.
International Journal of Applied Linguistics, 25(2), 160–182.
Erman, B., & Warren, B. (2000). The idiom principle and the open choice principle. Text, 20(1), 29–62.
Eskildsen, S. W., & Cadierno, T. (2007). Are recurring multi-word expressions really syntactic freezes?
Second language acquisition from the perspective of usage-based linguistics. In M. Nenonen & S.
Niemi (Eds.), Collocations and idioms 1: Papers from the first Nordic conference on syntactic freezes
(pp. 86–99). Joensuu, Finland: University of Joensuu.
Fitzpatrick, T., & Wray, A. (2006). Breaking up is not so hard to do: Individual differences in L2
memorization. Canadian Modern Language Review, 63(1), 35–57.
Granger, S. (2020). Learner corpora. In C. A. Chapelle (Ed.), The concise encyclopedia of applied
linguistics (pp. 681–688). Hoboken, NJ: John Wiley & Sons.
Granger, S., & Bestgen, Y. (2014). The use of collocations by intermediate vs. advanced non-native
writers: A bigram-based study. International Review of Applied Linguistics in Language Teaching, 52,
229–252.
Granger, S., & Paquot, M. (2008). Disentangling the phraseological web. In S. Granger & F. Meunier
(Eds.), Phraseology: An interdisciplinary perspective (pp. 27–49). Amsterdam, the Netherlands: John
Benjamins.
Hakuta, K. (1974). Prefabricated patterns and the emergence of structure in second language acqui-
sition. Language Learning, 24(2), 287–297.
Hakuta, K. (1976). A case study of a Japanese child learning English as a second language. Language
Learning, 26(2), 321–351.
Hanania, E. A., & Gradman, H. L. (1977). Acquisition of English structures: A case study of an adult
native speaker of Arabic in an English-speaking environment. Language Learning, 27, 75–91.
Hoang, H., & Boers, F. (2016). Re-telling a story in a second language: How well do adult learners mine
an input text for multiword expressions? Studies in Second Language Learning and Teaching, 6(3),
513–535.
House, J. (1996). Developing pragmatic fluency in English as a foreign language: Routines and me-
tapragmatic awareness. Studies in Second Language Acquisition, 18(2), 225–252.
Kremmel, B., Brunfaut, T., & Alderson, J. C. (2017). Exploring the role of phraseological knowledge in
foreign language reading. Applied Linguistics, 38(6), 848–870.
Kecskes, I. (2010). Situation-bound utterances as pragmatic acts. Journal of Pragmatics, 42(11),
2889–2897.
Kuiper, K. (1996). Smooth talkers: The linguistic performance of auctioneers and sportscasters. Mahwah,
NJ: Erlbaum.
Kyle, K., & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools,
findings, and application. TESOL Quarterly, 49, 757–786.
Laufer, B., & Girsai, N. (2008). Form-focused instruction in second language vocabulary learning: A
case for contrastive analysis and translation. Applied Linguistics, 29(4), 694–716.
Laufer, B., & Waldman, T. (2011). Verb-noun collocations in second language writing: A corpus
analysis of learners’ English. Language Learning, 61(2), 647–672.
Lewis, M. (1993). The lexical approach. Hove, UK: Language Teaching Publications.
Lewis, M. (1997). Implementing the Lexical Approach. Hove, UK: Language Teaching Publications.
Lin, P. (2019). Formulaic language and speech prosody. In A. Siyanova-Chanturia & A. Pellicer-
Sánchez (Eds.), Understanding formulaic language: A second language acquisition perspective
(pp. 78–94). London/New York: Routledge.
Lin, P. M. S., & Siyanova-Chanturia, A. (2015). Internet television for L2 vocabulary learning. In D.
Nunan & J. C. Richards (Eds.), Language learning beyond the classroom (pp. 149–158). London:
Routledge.
Lundell, F. F., Bartning, I., Engel, H., Gudmundson, A., Hancock, V., & Lindqvist, C. (2014). Beyond
advanced stages in high-level spoken L2 French. Journal of French Language Studies, 24(2),
255–280.
Meunier, F. (2020). Resources for learning multiword items. In S. Webb (Ed.), The Routledge handbook
of vocabulary studies (pp. 336–350). London/New York: Routledge.
Myles, F., Hooper, J., & Mitchell, R. (1998). Rote or rule? Exploring the role of formulaic language in
classroom foreign language learning. Language Learning, 48(3), 323–363.
Nation, I. S. P. (2013). Learning vocabulary in another language (2nd ed.). Cambridge: Cambridge
University Press.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications
for teaching. Applied Linguistics, 24(2), 223–242.
Nation, I. S. P. & Newton, J. M. (2009). Teaching ESL/EFL listening and speaking. New York:
Routledge.
Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Native-like selection and native-
like fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and communication (pp. 191–226).
New York: Longman.
Pellicer-Sánchez, A., & Boers, F. (2019). Pedagogical approaches to the teaching and learning of for-
mulaic language. In A. Siyanova-Chanturia & A. Pellicer-Sánchez (Eds.), Understanding formulaic
language: A second language acquisition perspective (pp. 153–173). London/New York: Routledge.
Peters, E., & Pauwels, P. (2015). Learning academic formulaic sequences. Journal of English for
Academic Purposes, 20, 28–39.
Puimège, E., & Peters, E. (2019). Learning L2 vocabulary from audiovisual input: an exploratory study
into incidental learning of single words and formulaic sequences. The Language Learning Journal,
47(4), 424–438.
Puimège, E., & Peters, E. (2020). Learning formulaic sequences through viewing L2 television and
factors that affect learning. Studies in Second Language Acquisition, 42(3), 525–549.
Richards, J. C., & Rodgers, T. S. (2001). Approaches and methods in language teaching. Cambridge:
Rossiter, M. J., Derwing, T. M., Manimtim, L. G., & Thomson, R. I. (2010). Oral fluency: The ne-
glected component in the communicative language classroom. Canadian Modern Language Review,
66(4), 583–606.
Saito, K. (2020). Multi‐or single‐word units? The role of collocation use in comprehensible and con-
textually appropriate second language speech. Language Learning, 70(2), 548–588.
Saito, K., & Hanzawa, K. (2018). The role of input in second language oral ability development
in foreign language classrooms: A longitudinal study. Language Teaching Research, 22(4),
398–417.
Scarcella, R. C. (1979). Watch up!: A study of verbal routines in adult second language performance.
Working Papers on Bilingualism Toronto, (19), 79–90.
Schmidt, R. (1983). Interaction, acculturation, and the acquisition of communicative competence: A
case study of an adult. In N. Wolfson & E. Judd (Eds.), Sociolinguistics and language acquisition
(pp. 137–174). Rowley, MA: Newbury House.
Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.
Sprenger, S. A., Levelt, W. J., & Kempen, G. (2006). Lexical access during the production of idiomatic
phrases. Journal of Memory and Language, 54(2), 161–184.
Stengers, H., Boers, F., Housen, A., & Eyckmans, J. (2011). Formulaic sequences and L2 oral profi-
ciency: Does the type of target language influence the association?. IRAL-International Review of
Applied Linguistics in Language Teaching, 49(4), 321–343.
Siyanova-Chanturia, A. (2015a). On the ‘holistic’ nature of formulaic language. Corpus Linguistics and
Linguistic Theory, 11(2), 285–301.
Siyanova-Chanturia, A. (2015b). Collocation in beginner learner writing: A longitudinal study. System,
53, 148–160.
Siyanova-Chanturia, A., & Pellicer-Sánchez, A. (2019). Formulaic language: Setting the scene. In A.
Siyanova-Chanturia & A. Pellicer-Sánchez (Eds.), Understanding formulaic language: A second
language acquisition perspective (pp. 1–15). London/New York: Routledge.
Strong, B., & Boers, F. (2019). The error in trial and error: Exercises on phrasal verbs. TESOL
Quarterly, 53(2), 289–319.
Tavakoli, P. (2011). Pausing patterns: Differences between L2 learners and native speakers. ELT
Journal, 65, 71–79.
Tavakoli, P., & Uchihara, T. (2019). To what extent are multiword sequences associated with oral
fluency? Language Learning, 70(2), 506–547.
Thai, C., & Boers, F. (2016). Repeating a monologue under increasing time pressure: Effects on fluency,
complexity, and accuracy. TESOL Quarterly, 50(2), 369–393.
Van Lancker-Sidtis, D., & Rallon, G. (2004). Tracking the incidence of formulaic expressions in ev-
eryday speech: Methods for classification and verification. Language & Communication, 24(3),
207–240.
Webb, S., & Kagimoto, E. (2011). Learning collocations: Do the number of collocates, position of the
node word, and synonymy affect learning? Applied Linguistics, 32, 259–276.
Webb, S., & Nation, P. (2017). How vocabulary is learned. Oxford: Oxford University Press.
Widdowson, H. G. (1989). Knowledge of language and ability for use. Applied Linguistics, 10, 128–137.
Wisniewska, N., & Mora, J. C. (2020). Can captioned video benefit second language pronunciation?.
Wood, D. (2001). In search of fluency: What is it and how can we teach it?. Canadian Modern Language
Review, 57(4), 573–589.
Wood, D. (2009). Effects of focused instruction of formulaic sequences on fluent expression in second
language narratives: A case study. Canadian Journal of Applied Linguistics, 12, 39–57.
Wood, D. (2010). Formulaic language and second language speech fluency: Background, evidence and
classroom applications. London/New York: Continuum.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge University Press.
Wulff, S. (2008). Rethinking idiomaticity: A usage-based approach. London/New York: Continuum.
Wulff, S. (2019). Acquisition of formulaic language from a usage-based perspective. In A. Siyanova-
Chanturia & A. Pellicer-Sánchez (Eds.), Understanding formulaic language: A second language
acquisition perspective (pp. 19–37). London/New York: Routledge.
Yan, X. (2020). Unpacking the relationship between formulaic sequences and speech fluency on elicited
imitation tasks: Proficiency level, sentence length, and fluency dimensions. TESOL Quarterly, 54(2),
460–487.
21
TECHNOLOGY FOR SPEAKING
DEVELOPMENT
Walcir Cardoso
In Spike Jonze’s movie Her (2013), Theodore (played by Joaquin Phoenix) develops a re-
lationship with an operating system’s voice, Samantha. Samantha is an artificially intelligent
virtual assistant personified via a female voice (Scarlett Johansson), who uses and processes
language just like human beings to communicate effectively: it is (artificially) intelligent,
creative, interactive, and sensitive to pragmatics (see Figure 21.1):
With these human attributes, Samantha seems like the ideal language learning partner
and/or tutor. In addition to the human qualities described earlier, the “synthesized” voice
also excels in many of the features offered by computer-assisted language learning (CALL is
a cover term for any type of computer-based technology, including computers, online re-
sources, and mobile devices; see Levy & Hubbard, 2005 for the rationale): it encourages
practice and repetition, provides multiple and varied modalities for practice, promotes
learner autonomy in an anxiety-free environment, accommodates different learning styles,
has the ability to provide immediate feedback, fosters exploratory learning with wide access
to information, and is motivating and fun (see Egbert & Shahrokni, 2018 for a discussion of
these CALL affordances). In addition, Samantha fulfills Chapelle’s (2001) criteria for se-
lecting CALL tasks; namely, there is potential for a positive impact on language learning
through the (interactive) feedback provided, opportunities to engage with language and
consequently personalize the experience, attention to both form and meaning (attention, a
central concept in cognitive L2 acquisition, can be promoted via input enhancements such as
repetition and multimodal exposure; it is considered one of the main advantages of CALL
vis-à-vis classrooms – Chapelle, 2003), authentic interactions reflecting the out-of-class
world, and practicality (the system is accessible and easy to use).
However, we are still light years away from the type of full artificial intelligence depicted in the
conversation between Samantha and Theodore, despite attempts to design dialogue systems that
engage users in reasonably natural interactions with humans (e.g., the Virtual Language Patient,
an application to train healthcare professionals in interactions with patients; Walker et al., 2011).
First, current technology is not capable of affording the earlier-mentioned attributes that
characterize effective human communication. Second, it is ineffective in providing opportunities
for collaborative interaction, including negotiation of meaning (Dickinson et al., 2013), shown
when Samantha explains why she believes she is being challenged by Theodore. The main reason
DOI: 10.4324/9781003022497-26

Theodore: Do you know what I’m thinking right now?

Samantha: Well, I take it from your tone that you’re challenging me. Maybe
because you’re curious how I work? Do you wanna know how I work?
Theodore: Actually, how do you work? [...]
Samantha: What makes me me is my ability to grow through my experiences. So
basically, in every moment, I’m evolving, just like you. You think I’m
weird?
Theodore: Kind of.
Samantha: Why?
Theodore: Well, you seem like a person but you’re just a voice in a computer.
Samantha: I can understand how the limited perspective of an unartificial mind
might perceive it that way. You’ll get used to it.
Figure 21.1 Dialogue between Theodore and Samantha (a virtual assistant)
why the conversation between Samantha and Theodore works is that it is a collaborative
event in which the speaker and his interlocutor interact in ways where each understands the
other. This might explain why speaking is considered the most difficult skill to teach via tech-
nology (Levy & Stockwell, 2006; Sevilla-Pavón et al., 2011).
Despite these limitations, there are many ways in which language teachers can use CALL
tools to enhance their students’ speaking abilities. To promote learner speaking as part of
social interaction in a CALL setting, Egbert and Shahrokni (2018) propose three “task
structures” for the design of technology-enhanced activities, in which students can learn
around the computer (to internalize information to support or complement learning; e.g.,
listening to a podcast), through the computer (e.g., to communicate with others via video-
conferencing tools), and with the computer (e.g., to interact with a synthesized voice, as
illustrated earlier); see also Chapelle (2003), for whom these task structures are broken into
interactions that can be intrapersonal (within a learner’s mind) or interpersonal (i.e., involving
learner–computer or learner–learner interactions). Here, this tripartite task structure model
provides a basic framework for describing and analyzing the types of speaking interactions
that learners can engage in via technology.
Normally, second/foreign language (L2) learning occurs within the classroom, facilitated
by a teacher. Sometimes, however, learning can also take place informally, outside of the
classroom, either extramurally (under the aegis of a school – Sundqvist & Sylvén, 2016), or in
the digital wilds (independent of formal instructional contexts, in which the impetus to learn
originates from the learner, not the school – Sauro & Zourou, 2019). Examples of the latter
include user-selected digital games, music streaming, and social networking. An interesting
implication of this observation is that it recognizes that L2 learning occurs within and
outside of the classroom, opening a world of pedagogical opportunities for extending the
reach of the classroom and consequently minimizing one of the challenges that afflicts in-
class teaching: time (Bione & Cardoso, 2020). This is particularly important for speaking, an
L2 skill that requires a substantial amount of output practice (Everly, 2018) so that learners
can test their hypotheses about what they are learning and consequently automatize their
abilities (Grimshaw & Cardoso, 2018). By extending the reach of the classroom, teachers can
allocate their in-class time resources to activities that may necessitate human support (e.g.,
personalized quality feedback). In this chapter, the degree of connection of these learning
settings to the classroom is assumed to vary within an overlapping continuum that views
CALL-assisted speaking taking place inside and/or outside of the classroom (i.e., informally
in the wild, or to extend the reach of the classroom), depending on teachers’ and learners’
needs, interests, and comfort with technology.
The goals of this chapter are to introduce the field of technology for L2 speaking development,
review some of the relevant literature in the field, and propose recommendations for practice.
Following the tripartite conceptual framework for describing computer–learner interactions,
the history of CALL in speaking development can be subsumed into two general categories:
those that promote interactions around and with the computer, as illustrated in the dialogue
between Samantha and Theodore, and those that target interactions through the computer
(Egbert & Shahrokni, 2018).
A computer system with all of Samantha’s speaking abilities is not yet possible and it is
unlikely to exist in the foreseeable future (it has abilities beyond those of human beings!). For
over 50 years, computer scientists have attempted to create interactive conversation systems
resembling Samantha. The first attempt was ELIZA, a natural language conversation program
developed by Weizenbaum (1966). It applied pattern-matching rules that mapped user
prompts to scripted replies, simulating a conversation between a human user and a virtual
therapist (ELIZA). A therapist was chosen as interlocutor to simplify programming:
therapists often ask open-ended questions and are not required to give advice or accurate
information. Unlike Samantha, ELIZA never initiated a conversation and was not capable
of learning new sentence patterns or words through interaction. Figure 21.2 shows a sample
dialogue between a user and ELIZA, using a JavaScript emulator of the original program
(https://www.masswerk.at/elizabot/eliza.html).
Figure 21.2 Sample dialogue between ELIZA and a user. Generated via Norbert Landsteiner’s
implementation at Masswerk (www.masswerk.at/elizabot/eliza.html) and published with
permission
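The pattern-matching approach described for ELIZA can be sketched in a few lines of Python. This is only an illustration of the general idea of mapping user prompts to scripted replies; the three rules and the default reply are hypothetical and are not Weizenbaum’s original script, which was considerably more elaborate.

```python
import re

# Hypothetical rules for illustration; these are not Weizenbaum's original script.
# Each rule pairs a pattern with a reply template that reuses the captured text.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]

# Open-ended fallback used when no rule matches, in the style of a therapist.
DEFAULT_REPLY = "Please go on."

def reply(user_input):
    """Return the scripted reply for the first rule whose pattern matches."""
    text = user_input.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT_REPLY

print(reply("I need a holiday."))
print(reply("It rained all day."))
```

As in the description above, a program of this kind only reacts to what the user types: it never initiates a turn, and its rule list stays fixed unless a programmer edits it.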
Since then, there have been many efforts to advance Weizenbaum’s ground-breaking in-
vention, ranging from specialized use of related technologies such as speech recognition software
and text-to-speech synthesizers, to intelligent personal assistants (IPAs). Representing the most
recent materialization of ELIZA, and a conceivable prototype for a Samantha-type of personal
assistance in the future, IPAs are voice-controlled services connected to smart speakers (e.g.,
Amazon Echo) that interact with users by answering (and sometimes asking) questions, and
performing tasks such as telling jokes, summarizing the news, and playing music (see Moussalli
& Cardoso, 2019). The first studies to examine IPAs’ potential for L2 oral skill development were
conducted in the late 2010s, emphasizing users’ perceptions of the technology and the IPAs’
ability to understand and be understood by L2 learners (Dizon, 2017, 2020; Moussalli &
Cardoso, 2019).
To function properly, IPAs use a combination of two relatively more “established” CALL
technologies for speaking development: Automatic speech recognition (to convert human
speech into text for searching purposes) and text-to-speech synthesizers (to output the results
of the search). Automatic Speech Recognition (ASR) refers to the ability of an application to
identify spoken words and either convert them into text (e.g., Dragon Naturally Speaking,
Office Dictation) or respond to them (e.g., play a song following a request). The first ASR system,
the Audrey System, was designed in 1952 by Bell Laboratories to recognize numbers. By the
1960s, ASRs (e.g., IBM’s Shoebox) became capable of identifying and responding to 16 words
in English, a number that increased to 1,000 words in the 1970s, but with a high degree of
inaccurate recognition. In the early 2000s, recognition accuracy reached 80%; it recently reached
its highest rate, over 95%, with Google holding the speech accuracy title (Globalme
Language & Technology, 2019). The advances in speech recognition observed in the late 1990s
triggered a number of studies examining ASRs’ potential for L2 pedagogy, particularly for the
teaching of oral skills (Coniam, 1999; Derwing et al., 2000).
Text-to-speech synthesizers (TTS, or text readers), on the other hand, are computer ap-
plications that convert text into speech. This feature is found on most modern computers and
mobile devices that “speak” to users after searching for the information requested in a da-
tabase. Applications that employ TTS synthesis include GPSs, voice assistants (e.g., Alexa,
Siri), and online or dedicated tools such as VoiceMaker and NaturalReader. Despite its
novelty appeal, TTS is considered the oldest speech technology, with its origin dating back to
the early 1000s AD when “machines” were built to imitate human speech to answer yes/no
questions (Hyman, 2011), receiving major improvements in the 1930s with the development
of the Vocoder (Bell Laboratories) and The Voder, both highly limited in terms of intellig-
ibility. Over the past 20 years, the quality of synthesized speech has been steadily improving,
with some speech features indistinguishable from actual human speech (Bione & Cardoso,
2020), thus calling researchers’ and pedagogues’ attention to TTS’ potential for the teaching
of L2 speaking.
The advent of the social, participatory Web 2.0 in the early 2000s brought us improve-
ments in internet technologies that allowed L2 researchers and pedagogues to embrace tools
with potential to promote interactions through the computer, via computer-mediated com-
munication. The two most frequently used internet technologies are videoconferencing (e.g.,
Skype and Zoom) and social media applications such as Facebook, Twitter, and social
messaging (e.g., Messenger and WhatsApp). Using these technologies, learners can interact
with one another through their devices using text, voice-only or videoconferencing. Real-time
web-based chatting appeared in the mid-1990s and instantly drew the attention of scholars,
who began to explore their potential for L2 speaking pedagogy. Some of this research in-
cludes works that emphasize the pedagogical use of Skype (Mullen et al., 2009), Twitter
(Mompean & Fouz-González, 2016), audioblogs (Hsu et al., 2008), digital games (Grimshaw
& Cardoso, 2018), and virtual learning environments such as Moodle, which combine many
of the through-the-computer tools available (Barcomb & Cardoso, 2020).
Finally, the new millennium has experienced great advances in virtual reality (VR), a tech-
nology with roots in Morton Heilig’s approach to moviemaking (“Experience Theatre”, pa-
tented in 1962), which attempted to incorporate multiple senses (e.g., sound, smell, touch, and
sight) into the event. VR is a simulated, multimodal experience that can be similar to or com-
pletely different from the real world. Although the use of specialized headsets is not a require-
ment for VR implementation (e.g., Second Life – Cooke-Plagwitz, 2008), currently, standard VR
systems use either dedicated headsets or multi-projected environments to create an immersive,
multimodal experience, which can be used for entertainment or educational purposes (e.g., to
create an interactive learning environment that promotes learning through or with a computer).
Given the accessibility and the interactive essence of VR technology, it is not surprising that it
has sparked great interest among educators who seek to provide an immersive L2 learning experience
to their students (see Marcel, 2020 for a review of the literature).
Because of the recent advances in computer technology and the available research high-
lighting CALL’s pedagogical potential, effective speaking lessons (e.g., those that adhere to
Chapelle’s 2001 criteria for adopting effective CALL tasks, discussed earlier) can be designed
and delivered to students extensively and inexpensively, providing more interactionally au-
thentic practice opportunities than is possible in the traditional (i.e., not CALL-assisted) L2
classroom.

In a review of the use of technology in L2 pedagogy in the 20th century, Salaberry (2001)
pessimistically concluded by stating that, “[w]hereas most ‘new technologies’ […] may have been
revolutionary in the overall context of human interaction, it is not clear whether they have
achieved parallel degrees of pedagogical benefit in the realm of L2 teaching” (p. 52). He con-
structs his arguments by questioning the efficacy of CALL implementation in L2 pedagogy,
based on technology’s assumed pedagogical effectiveness, its potential for integration into the
curriculum, and its ability to provide efficient use of human and material resources. While these
questions do not particularly target speaking development, they epitomize some of the issues
raised by L2 researchers and pedagogues, who often (and justifiably) question the efficacy of new
technologies, particularly when used for their own sake, not as an enabler and enhancer. While
research has shown that technology has the potential to empower learners and instructors and
consequently enhance the learning environment (Golonka et al., 2014), some of the issues and
arguments raised by Salaberry in 2001 remain unresolved.
Another critical issue concerns feedback, a key factor for the success of instruction, which
allows learners to notice the discrepancies between their output and the L2 input and, as a result,
adjust their errors when they occur (Thomson, 2011). Pennington (1999) recognizes that CALL
has the potential to offer feedback that is immediate, repeatable, and reliable (e.g., it is always the
same), and not affected by limitations of judgement or patience. However, the problem with
CALL-based pedagogy is that the available applications that promote speaking are often not
equipped to provide quality feedback, i.e., feedback that is accurate, relevant to the learners’ needs,
and easily understood (Chapelle & Jamieson, 2007). In addition, most of the
available CALL applications for speaking deliver feedback that targets only specific aspects of L2
speaking (e.g., pronunciation; see Thomson, 2011 for vowels, and Anderson-Hsieh, 2013 for the
use of electronic visual feedback for suprasegmentals), or offer it implicitly, which makes it
hard to understand (e.g., the use of ASR to determine whether one’s intended pronunciation is
accurately transcribed, via orthography). Attempts have been made to reach an accepted level of
quality feedback, including Hassani et al.’s (2016) proposal for a computational model in
an intelligent virtual environment, which assesses learners’ speaking skills, estimates their
conversational abilities, and adjusts the level of communication complexity accordingly,
with the goal of improving students’ oral communicative skills.
However, this type of customizable, intelligent feedback is still uncommon in most ap-
plications that target speaking, leading many students to abandon them, particularly
when used autonomously (Tuncay, 2020).
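To illustrate how the implicit, orthography-based feedback described above might be made more explicit, the following is a minimal sketch in Python. It assumes that an ASR system has already returned a transcript as plain text (no actual speech recognizer is called), and the function names `tokenize` and `transcription_feedback` are hypothetical, introduced here only for illustration. The sketch aligns the sentence the learner intended to say with the transcript and reports the words that differ.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def transcription_feedback(target, transcript):
    """Compare the intended sentence with an ASR transcript (assumed to be given)
    and return a list of explicit, word-level feedback messages."""
    target_words = tokenize(target)
    heard_words = tokenize(transcript)
    matcher = difflib.SequenceMatcher(a=target_words, b=heard_words, autojunk=False)
    feedback = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        intended = " ".join(target_words[i1:i2])
        heard = " ".join(heard_words[j1:j2])
        if tag == "replace":
            feedback.append(f"'{intended}' was recognized as '{heard}'")
        elif tag == "delete":
            feedback.append(f"'{intended}' was not recognized")
        elif tag == "insert":
            feedback.append(f"extra word(s) recognized: '{heard}'")
    return feedback


# Hypothetical example: the transcript is typed in by hand, not produced by an ASR system.
for message in transcription_feedback("I walked to the ship", "I walk to the sheep"):
    print(message)
```

Under these assumptions, a learner who intends to say “I walked to the ship” but is transcribed as “I walk to the sheep” would be told that “walked” was recognized as “walk” and “ship” as “sheep”. A mismatch of this kind only flags where a problem may lie; it does not establish whether the cause is the learner’s pronunciation or a recognition error.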
Finally, one of the main obstacles to using speech technologies for teaching oral skills used to
be that text synthesizers and speech recognizers were not always deemed appropriate or beneficial
for pedagogical purposes: synthesized voices were not as accurate, natural, or intelligible
as human speech (e.g., Stevens et al., 2005), and ASR applications performed less
favourably than humans in terms of accuracy and their ability to identify pronunciation errors
(e.g., Derwing et al., 2000). Fortunately, these two technologies have improved significantly over
the last decade, with research indicating that the accuracy of voice recognition systems is now nearly on par
with that of humans (Globalme Language & Technology, 2019); see also Moussalli and Cardoso
(2019) for an experiment contrasting IPAs and humans in their ability to understand L2 speech.
Differences between human and synthesized voices are negligible in measures of understanding
and phonological processing (i.e., comprehensibility, intelligibility, and the aural identification of
English regular past forms; Bione & Cardoso, 2020).

Despite Levy’s assertion in 2009 that “Of the language areas and skills, […] the oral skill
has perhaps attracted the most diverse range of CALL technologies and approaches”
(p. 775), research in computer-assisted speaking is still characterized by a lack of re-
presentativeness (Beatty, 2010; Levy & Stockwell, 2006), possibly due to the ephemeral
nature of speech (vis-à-vis literacy skills), or because it is difficult to assess in comparison
with other skills (Sevilla-Pavón et al., 2011). As such, speaking is often regarded as a
difficult skill to teach through the computer (Levy & Stockwell, 2006) and, as a con-
sequence, the majority of studies emphasize one aspect of speaking, the teaching of
phonological skills (e.g., Liakin et al., 2015, 2017; Thomson, 2011), with only a handful
addressing other essential skills for effective oral communication (i.e., speech functions,
interactional skills, and extended discourse skills; Bohlke, 2014). Accordingly, this review of
current contributions and research reflects this limitation of the field.
In addition, because research in CALL-based speaking is greatly influenced by advances
in technology and its accessibility, this part is organized in terms of four recent technology-
driven trends shown to positively affect L2 speaking (see Shadiev & Yang, 2020 for a sys-
tematic review of CALL research over the past 6 years, where these technologies are high-
lighted as popular targets of investigation): speech technologies (e.g., TTS, ASR, IPAs),
computer-mediated communication (Web 2.0 applications; e.g., audioblogs, videoconferen-
cing), digital games (e.g., cooperative, multiplayer games), and virtual reality (e.g., Second
Life). Following the tripartite task structure framework established earlier, this part’s or-
ganization also categorizes the target technologies according to how they promote L2
speaking: while speech technologies foster interactions around and with the computer,
computer-mediated communication is better equipped to promote interactions through the
computer. Digital games and VR, on the other hand, blur the boundaries of this classifi-
cation because of their flexible, open attributes, which can offer learners opportunities to
interact around, through, and/or with the computer, depending on how they are designed and
implemented.
Speaking and Speech Technologies

For speaking development, an area that has received considerable attention recently is the
pedagogical use of speech technologies, particularly TTS and ASR. These technologies are
attractive because they can create an environment for enhanced exposure to the L2 input
(Liakin et al., 2017) and, at the same time, provide multiple opportunities for self‐directed
speech practice in perception (e.g., via listening to texts, phrases or words in isolation to
notice certain aspects of the L2) and production (e.g., via the oral production of hard-to-
pronounce words). The main pedagogical goal of much of this research is to enable L2
learners to interact around and with their devices by speaking and/or listening to them in
much the same way as one would with another person. This type of interaction has been
shown to have a positive effect on the development of many oral skills, including learners’
phonological awareness (via TTS; Cardoso, 2018), perception of phonetic contrasts (Qian
et al., 2018), ability to intelligibly produce speech (via ASR and TTS; de Vries et al., 2015 and
Liakin et al., 2017, respectively), and the autonomous learning of pronunciation and speaking
(via ASR; McCrocklin, 2016; Golonka et al., 2014).
Two recent technologies incorporate both ASR and TTS into single applications: IPAs
(voice-controlled services connected to smart speakers; discussed earlier), and translation
tools such as Google Translate. Although, like most CALL applications (Hanson-Smith, 2003),
they were not developed for language learning, these tools have attracted the attention of educators and
researchers for their potential to promote learning, particularly in pronunciation and
speaking. For instance, in addition to fulfilling Chapelle’s (2001) and Chapelle and
Jamieson’s (2007) criteria as pedagogically suitable (see earlier discussions), these technol-
ogies provide learners with multiple opportunities for practice in listening (input, percep-
tion), speaking (output, production), and developing prediction skills (i.e., establishing
spelling-to-sound rules for pronunciation via orthography; see Dickerson, 2015). Because of
their recency, studies in these areas are still scarce, but the existing research indicates that
they have great pedagogical potential inasmuch as they can lead to speaking development
(Dizon, 2020; van Lieshout & Cardoso, 2022) and create an environment that fosters
autonomous learning (Moussalli & Cardoso, 2019; van Lieshout & Cardoso, 2022).
Speaking and Computer-Mediated Communication

Computer-mediated communication (CMC) is a term that refers to the types of person-to-
person interactions conducted through the computer, in which users can exchange audio,
video, pictures, and texts with one another. From a speaking perspective, the two most
explored CMC technologies are videoconferencing (e.g., Google Hangouts, Zoom) and
network-based services or social media (e.g., Facebook, Twitter). Findings from studies that
examine the pedagogical potential of CMC for promoting L2 speaking have been over-
whelmingly optimistic. For instance, Saito and Akiyama (2017) found that Japanese learners
improved in their oral fluency and comprehensibility in L2 English when engaged in peer
interactions using Google Hangouts over a 12-week instruction period.
An example of a social media implementation in the development of other aspects of
speaking can be found in a study by Mompean and Fouz-González (2016), who examined
the use of Twitter as a tool for the teaching of a set of hard-to-pronounce English words. By
receiving daily tweets with sound awareness-raising activities involving explanations and
links to videos and audio recordings, participants significantly improved in their oral pro-
duction of the target words. An important generalization from studies that examine the
pedagogical use of CMC is that they demonstrate that computer-mediated interactions have
the potential to reduce learners’ communication anxiety and, as a result, increase their
willingness to communicate and overall motivation to learn (Grimshaw & Cardoso, 2018).
As acknowledged by a reviewer, this generalization is particularly interesting in the context
of the COVID-19 crisis, when many students had no choice but to engage in CMC. The
short- and long-term impact that this experience will have on the use of CMC for language
learning remains unknown.
Speaking and Digital Games

There are many reasons to justify the use of digital games in L2 education. Some of these
reasons include learners’ increased motivation to learn, enjoyment, and reduced anxiety
(Grimshaw & Cardoso, 2018; Li, 2020), which are all assumed to contribute to learning
(Kukulska-Hulme, 2016). Although the use of games in L2 teaching goes back many decades
(e.g., Lee, 1979), it has only recently become the most targeted technology in CALL research
(Shadiev & Yang, 2020). The argument used to explain the popularity and perceived benefits
of games in education is that they create an immersive, anxiety-reducing environment that
motivates interaction and risk-taking – ideal conditions for practicing and learning. In the
context of L2 speaking development, however, the pedagogical potential of digital games has
not received the same level of consideration as other language skills (Dehghanzadeh
et al., 2019).
The handful of studies that have examined the teaching of L2 speaking via games in-
dicates that their use enhances learning and, more importantly, contributes to improvements
in aspects of the target L2 phonology. These include the participants’ ability to produce
meaningful sentences and produce them accurately and comprehensibly (Hwang et al., 2016,
via an interactive jigsaw game), or the development of oral fluency, promoted by the cooperative
and multiplayer nature of the game (Grimshaw & Cardoso, 2018, via Spaceteam
ESL). These two studies constitute examples of game-based learning because they employed
games developed specifically for L2 learning. Another way to benefit from the affordances
of gaming without using games per se is through game-informed learning (or
gamification), which can be described as the pedagogical use of elements of games (e.g., the
use of points, competition, time pressure) to make L2 learning more motivating, more enjoyable, and
less anxiety-inducing. Barcomb and Cardoso (2020) illustrate the use of a gamified online environment
for the teaching of English pronunciation, where Japanese participants improved their
oral production of two segments, /r/ and /l/, and, as other game studies have found, perceived their
experience as enjoyable, anxiety-reducing, and pedagogically useful (Dehghanzadeh et al.,
2019; Li, 2020).
Speaking and Virtual Reality

Virtual reality (VR; used here as a cover term for both augmented and mixed real-virtual
reality) is one of the most innovative technologies for learning currently available, and a new
pedagogical frontier for L2 teachers and researchers (popular software for creating VR
content includes Second Life, OpenSimulator, and Google Street View). Just like digital
games, VR is capable of creating an immersive, anxiety-reducing environment that motivates
interaction and risk-taking. Unlike games, these VR environments provide learners with a
stronger feeling of presence in “real life” situations through simulations.
In a systematic review of VR in language learning, Parmaxi (2020) found that the largest share
of the studies (31%) conducted over the past 4 years examined L2 speaking, most revealing
that the pedagogical use of VR leads to a significant improvement in oral skills. Consider
Chien et al. (2020), for example, who adopted spherical video-based virtual reality with 360-
degree videos and photos for emulating virtual environments, viewed through a head-
mounted display. The authors found that the VR system contributed not only to the participants’
improvement in speaking performance, but also to their motivation and critical thinking.
Positive outcomes were also observed in a study using mixed (real and virtual) reality, in
which Marcel (2020) confirmed that the proposed customized VR environment had a positive
effect on the oral production of English vocabulary. Despite these observed pedagogical
benefits, VR research remains in the realm of exploratory, not experimental, research.
An important benefit of the four technologies discussed in this part is that they engage
learners in interactions around, through and with computers. This type of computer-mediated
interaction has a number of advantages from a pedagogical standpoint, mostly because it
does not require anxiety-inducing face-to-face exchanges: in addition to increasing learners’
opportunities to practice in an interactive and stress-free environment, these technologies
have the potential to enhance learning, as demonstrated earlier.

As is the case in applied linguistics (Mackey & Gass, 2011), CALL-based speaking research
draws its methodology and tools from diverse fields such as computer sciences, education,
linguistics, psychology, and second language acquisition. For this reason, research methods
in the field are heavily influenced by theory, technology, and the research questions that they
aim to investigate, including quantitative (e.g., empirical pretest–posttest designs, surveys,
and meta-analyses), qualitative (e.g., interviews, focus groups, and observations), and mixed
methods (see Beatty, 2010, for an overview of these methods in CALL). In many studies,
technology use is compared against a “traditional” learning environment, usually led by a
teacher (e.g., Liakin et al., 2017), or it is subjected to an in-depth examination of its learning
potential and how users perceive its pedagogical use (e.g., Dizon, 2020). Notwithstanding the
appeals for researchers to move away from comparing the effects of classroom instruction
versus CALL (e.g., Beatty, 2010; Chapelle, 2010), possibly because technology is now mostly
used as a complement to classroom teaching or to assist informal or autonomous learning in
the wild, this tendency persists.
The discipline, however, differs from other fields of study within applied linguistics be-
cause of its nature: it involves technologies and tools that are constantly being updated or
replaced by new ones. These changes and innovations often influence theories, research and,
consequently, teaching practices (Beatty, 2010). Another related characteristic of the field is
that, whenever a new technology with pedagogical potential becomes available, researchers
sometimes follow a chronological framework that encompasses at least four levels of
examination:
1. Development. This level focuses on the development of a new tool, usually based on
insights from second language acquisition, computer science, and CALL. For an example,
see Sundberg and Cardoso (2018), a development study that introduces a music
app for the teaching of L2 oral skills and vocabulary, and provides the theoretical and
pedagogical rationale behind its design.
2. Exploration. This second stage involves an examination of the pedagogical affordances
and potential of a novel technology. This stage may serve as a follow-up to a tool’s
development, illustrated earlier, or it can be used to investigate an existing technology
(see van Lieshout & Cardoso, 2022, for an example exploring Google Translate’s speech
capabilities for the autonomous learning of a set of spoken phrases in L2 Dutch).
3. Assessing suitability. This stage involves research on usability, acceptance, and learners’
attitudes towards a technology. Once a new technology or tool is deemed pedagogically
appropriate, one possibility a CALL researcher might entertain is to investigate its
pedagogical suitability via quantitative (e.g., surveys), qualitative (e.g., interviews, focus
groups), or mixed methods. See Walker et al. (2011) for an example of a feasibility study
evaluating a computer-based L2 training module for healthcare professionals – the
Virtual Language Patient.
4. Assessing pedagogical effectiveness. Finally, the last stage in this framework consists of
assessing a target technology’s pedagogical effectiveness. These studies often employ
quantitative or mixed methods, with a pretest–posttest research design to investigate the
effects of the chosen technology on learning. Chiu et al. (2007) describe a study in which
participants improved their speaking abilities using CandleTalk, an ASR-equipped ap-
plication that promotes simulated conversations.
In practice, depending on the research questions being addressed, many studies combine two
or more of these levels.

Technology has offered many tools to support the development of L2 speaking by providing
ample opportunities for learner–machine and learner–learner interactions around, through,
and with the computer, without the limitations that the “traditional” classroom environment
imposes (e.g., insufficient time and limited technical resources). However, technology is
constantly changing and L2 instructors are not always equipped to deal with these changes
(Torsani, 2016), particularly when they involve L2 speaking, an understudied skill (Beatty, 2010;
Sevilla-Pavón et al., 2011). Based on the findings discussed here, and assuming that the primary goal
of teacher education is to develop awareness of the technological options available and
combine it with the instructor’s own knowledge or philosophy of L2 pedagogy (Torsani,
2016), a number of recommendations for teaching CALL-enhanced speaking can be put
forward:
• Before adopting a technology, consider using Chapelle and Jamieson’s (2007) criteria to
assess its pedagogical suitability by asking the following questions (simplified): Is there
potential for learning aspects of L2 speaking (e.g., phonological, interactional skills)? Is
it accessible and user-friendly? Are there opportunities for: feedback, engagement with
the L2, focus on form and meaning, and authentic interactions?
• If the above scrutiny is persuasive, reflect on Salaberry’s (2001) concerns about CALL:
Will the selected technology be pedagogically effective? Can it easily be integrated into
the curriculum without major disruptions? Will it be efficient in terms of human and
material resources?
• Decide on the role that the adopted technology will play in teaching: Is it to diagnose
problems or to improve or assess speaking skills? What aspects of speaking will it ad-
dress: pronunciation, interactional skills, speaking functions, or oral fluency? These
objectives can be easily integrated into computer-assisted curricula, but they may require
adaptations.
• Following insights from Sundqvist and Sylvén (2016), determine how technology will be
implemented: will it be used in in-class or out-of-class contexts? If the latter, will it be
used as an extension of the classroom (e.g., to complement in-class discussions), or in-
formally to promote autonomous learning (e.g., for strategy development – Chapelle
& Jamieson, 2007)? Before deciding, teachers should ensure that all students have access
to technology outside of school: technology is not always a great equalizer.
• Based on learners’ needs and interests, as well as the insights from Egbert and Shahrokni
(2018) to promote speaking as part of social interaction, decide on how the
learner–computer interaction will occur: around, through, or with the computer. The main
advantage of through- and with-computer applications (e.g., synchronous videoconferencing
via Skype or Zoom; IPAs) is that they require spontaneous speech, while around-computer technologies
(e.g., videoblogging) compel planning: two authentic environments that characterize the
act of speaking. Research has shown that when learners plan and produce speech, their
utterances are generally more accurate and complex (Sotillo, 2000).
As L2 educators, teachers need to be forward-thinking and adaptable in selecting, constructing,
and/or implementing CALL materials, and engage in innovative practices to enhance
their own and their students’ pedagogical experience (Chapelle, 2003).
7 Future Directions
This chapter has introduced L2 speaking development from a CALL perspective, reviewed
some of the relevant literature on the subject, and highlighted the pedagogical implications of
what is known about the art of teaching L2 speaking with technology. One of the important
generalizations we can make about the field is that it has often taken advantage of advances
in computer technology, and that it has greatly contributed to the enhancement of L2
teaching and learning. Despite these optimistic remarks, there remain some issues that need
to be investigated and/or corroborated in future research. For the sake of brevity, only three
of these directions will be discussed.
First, L2 speaking researchers should expand their focus of analysis to include other
aspects of what it means to speak an L2. As indicated earlier, most of the research in CALL-based
speaking targets pronunciation, with the majority covering segments (e.g., Cardoso,
2018; Thomson, 2011; but see Anderson-Hsieh, 2013 for a suprasegmental focus). This is
possibly because pronunciation is an area that benefits from technology in a way that cannot
be replicated in face-to-face interactions in the classroom (e.g., it requires decontextualized
articulation practice – Pennington, 1999). Others use unclear definitions for what constitutes
speaking (e.g., Golonka et al., 2014; Mompean & Fouz-González, 2016). One way of
characterizing speaking was proposed by Bohlke (2014), for whom speaking competence can
be broken down into four skill areas: phonological, interactional, extended discourse, and
speech functions. How would the pedagogical use of certain technologies contribute to the
development of these subcomponents of speaking?
Another topic in need of further investigation is methodological in nature. Although there
have been many methodological advances in CALL research, particularly in speaking, as
implied in earlier discussions, this skill has not received the attention it deserves from em-
pirical, experimental perspectives. For instance, it could be argued that very few available
studies could be classified as “assessing pedagogical effectiveness,” considering the four-level
hierarchy established earlier to describe the main research methods utilized in the field.
Including pretest–posttest designs with control groups would strengthen the internal validity
of findings and provide researchers with a high level of control over the experiment (e.g., to
isolate specific variables). Inspired by Chapelle (2003), another interesting direction would be
to investigate learners’ behaviour on the computer (e.g., how they negotiate meaning, how
they manage communication breakdowns) to establish a cause-and-effect relationship between
that behaviour and the potential acquisition of the target L2 form.
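The logic of a pretest–posttest design with a control group can be sketched as a comparison of mean gains. The following Python example is only an illustration: the function names `mean_gain` and `gain_difference` are hypothetical, the scores are invented for demonstration and do not come from any study cited here, and a real analysis would also require an appropriate inferential test.

```python
from statistics import mean


def mean_gain(pretest, posttest):
    """Mean pretest-to-posttest gain for one group (scores paired by learner)."""
    if len(pretest) != len(posttest):
        raise ValueError("each learner needs both a pretest and a posttest score")
    return mean(post - pre for pre, post in zip(pretest, posttest))


def gain_difference(treatment, control):
    """Mean gain of the treatment group minus mean gain of the control group.

    Each argument is a (pretest_scores, posttest_scores) pair.
    """
    return mean_gain(*treatment) - mean_gain(*control)


# Invented speaking scores for illustration only (not real data).
call_group = ([10, 12, 11, 9], [14, 15, 13, 12])     # practised with the technology
control_group = ([10, 11, 12, 9], [11, 12, 12, 10])  # did not use the technology

print(gain_difference(call_group, control_group))
```

A positive difference would suggest that the group using the technology improved more than the control group did over the same period. Without a control group, the same gain could not be separated from the effect of time or of repeated testing.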
Finally, future research could explore the development of a computer-supported collaborative
tutor or assistant to address one of the problems that affect learner–computer interactions,
as discussed earlier: the lack of meaningful interactions, particularly in situations
that require negotiation of meaning (Dickinson et al., 2013). Although attempts have already
been made to implement these types of interactions in CALL (e.g., Rueb et al.’s 2018 game,
pret-à-négocier), they are still rudimentary. With recent advances in artificial intelligence
(AI), the possibility of providing learners with a Samantha-like interlocutor is becoming
increasingly conceivable (see Baker, 2016 for an AI model for teaching–learning dialogues).
The idea of having an AI assistant as a learning companion will make many feel uncomfortable,
similar to the way that many rejected the use of synthesizers in music in the
1970s. Nowadays, synthesizers are utilized not as a replacement for acoustic instruments, but
as legitimate instruments. This analogy could apply to Samantha, who is likely to be viewed
not as a replacement for teachers, but as a legitimate companion to enhance the development
of L2 speaking.
Further Reading
Blake, R. (2017). Technologies for teaching and learning L2 speaking. In C. Chapelle & S. Sauro (Eds.),
The handbook of technology and second language teaching and learning (pp. 107–117). London:
Wiley-Blackwell.
An SLA-informed introduction to CALL tasks and tools that can be used to promote L2 speaking. It
focuses on tutorial exercises (to engage students in self‐directed speech practice) and computer-
mediated communication (e.g., videoconferencing).
Chapelle, C., & Jamieson, J. (2007). Tips for teaching with CALL: Practical approaches to computer-assisted
language learning. London: Pearson.
The L2 instructor is introduced to the art of teaching with technology. Two chapters are relevant: One
dedicated to speaking (Chapter 6) and another to computer-mediated communication (Chapter 7).
These chapters constitute a good basis for teachers who would like to learn more about the art of
teaching speaking with CALL.
Sundqvist, P., & Sylvén, L. (2016). Extramural English in teaching and learning: From theory and research
to practice. London: Palgrave Macmillan.
An examination of theory, research, and practice of learning L2 English in the digital wilds. The book
explores how this environment affects learning, and describes tools that teachers can use to develop
their students’ language skills, including speaking. Although the focus is on English, the ideas can be
implemented in the teaching of any L2.
References
Anderson-Hsieh, J. (2013). Interpreting visual feedback on English suprasegmentals in computer as-
sisted pronunciation instruction. CALICO Journal, 11(4), 5–22.
Baker, M. (2016). The negotiation of meaning in epistemic situations. International Journal of Artificial
Intelligence in Education, 26, 133–149.
Barcomb, M., & Cardoso, W. (2020). Rock or lock? Gamifying an online course management system
for pronunciation instruction: Focus on English /r/ and /l/. CALICO Journal, 37(2), 127–147.
Beatty, K. (2010). Teaching and researching computer‐assisted language learning. Harlow, U.K:
Longman.
Bione, T., & Cardoso, W. (2020). Synthetic voices in the foreign language context. Language Learning
& Technology, 24(1), 169–186.
Bohlke, D. (2014). Fluency-oriented second language teaching. In M. Celce-Murcia, D. Brinton, & M.
Snow (Eds.), Teaching English as a second or foreign language (pp. 121–135). Boston: National
Geographic Learning/Heinle Cengage.
Cardoso, W. (2018). Learning L2 pronunciation with a text‑to-speech synthesizer. In P. Taalas, J.
Jalkanen, L. Bradley, & S. Thouësny (Eds.), Papers from EUROCALL (pp. 1–6). Research-
publishing.net.
Chapelle, C. (2003). English language learning and technology. Amsterdam, Netherlands: John
Benjamins.
Chapelle, C. (2010). Evaluating computer technology for language learning. TESOL Ontario Contact,
36(2), 36–55.
Chapelle, C., & Jamieson, J. (2007). Tips for teaching with CALL: Practical approaches to computer-
assisted language learning. Harlow: Pearson-Longman.
Chien, S., Hwang, G., & Jong, M. (2020). Effects of peer assessment within the context of spherical
video-based virtual reality on EFL students’ English-speaking performance and learning percep-
tions. Computers & Education, 146, 1–20.
Chiu, T.-L., Liou, H.-C., & Yeh, Y. (2007). A study of web-based oral activities enhanced by au-
tomatic speech recognition for EFL college learning. Computer Assisted Language Learning, 20,
209–233.
Coniam, D. (1999). Voice recognition software accuracy with second language speakers of English.
System, 27(1), 49–64.
Cooke-Plagwitz, J. (2008). New directions in CALL: An objective introduction to Second Life.
CALICO Journal, 25(3), 547–557.
Dehghanzadeh, H., Fardanesh, H., Hatami, J., Talaee, E., & Noroozi, O. (2019). Using gamification to
support learning English as a second language: A systematic review. Computer Assisted Language
Learning. doi: 10.1080/09588221.2019.1648298.
de Vries, B., Cucchiarini, C., Bodnar, S., Strik, H., & van Hout, R. (2015). Spoken grammar practice and
feedback in an ASR-based CALL system. Computer Assisted Language Learning, 28(6), 550–576.
Derwing, T., Munro, M., & Carbonaro, M. (2000). Does popular speech recognition software work
with ESL speech? TESOL Quarterly, 34, 592–603.
Dickerson, W. (2015). Using orthography to teach pronunciation. In M. Reed & J. Levis (Eds.), The
handbook of English pronunciation (pp. 488–503). London: Wiley Blackwell.
Dickinson, M., Brew, C., & Meurers, D. (2013). Language and computers. London: Wiley-Blackwell.
Dizon, G. (2017). Using intelligent personal assistants for second language learning: A case study of
Alexa. TESOL Journal, 8(4), 811–830.
Dizon, G. (2020). Evaluating intelligent personal assistants for L2 listening and speaking development.
Language Learning & Technology, 24(1), 16–26.
Egbert, J., & Shahrokni, S. (2018). CALL principles and practices. Open educational resources (OER).
https://opentext.wsu.edu/call.
Enge, E. (2019). Rating the smarts of the digital personal assistants in 2019. Retrieved from https://
www.perficient.com/insights/research-hub/digital-personal-assistants-study#which_smartest
Everly, P. (2018). Expanding pronunciation instructional time beyond the classroom: Microsoft Office
2016 OneNote Class Notebook as an interactive delivery platform. TESOL Journal, 10(2).
doi: 10.1002/tesj.421
Globalme Language & Technology (2019, July). Speech recognition technology overview. Retrieved
from https://www.globalme.net/blog/the-present-future-of-speech-recognition
Golonka, E., Bowles, A., Frank, V., Richardson, D., & Freynik, S. (2014). Technologies for foreign
language learning: A review of technology types and their effectiveness. Computer Assisted Language
Learning, 27(1), 70–105.
Grimshaw, J., & Cardoso, W. (2018). Activate space rats! Fluency development in a mobile game-
assisted environment. Language Learning & Technology, 22(3), 159–175.
Hanson-Smith, E. (2003). A brief history of CALL theory. CATESOL Journal, 15(1), 21–30.
Hassani, K., Nahvi, A., & Ahmadi, A. (2016). Design and implementation of an intelligent virtual
environment for improving speaking and listening skills. Interactive Learning Environments, 24(1),
252–271.
Hsu, H.-Y., Wang, S.-K., & Comac, L. (2008). Using audioblogs to assist English language learning:
An investigation into student perception. Computer Assisted Language Learning, 21, 181–198.
Hwang, W.-Y., Shih, T., Ma, Z.-H., Shadiev, R., & Chen, S.-Y. (2016). Evaluating listening and
speaking skills in a mobile game-based learning environment with situational contexts. Computer
Assisted Language Learning, 29(4), 639–657.
Hyman, W. (2011). The automaton in English renaissance literature, literary and scientific studies of early
modernity. Milton Park: Routledge.
Kukulska-Hulme, A. (2016). Personalization of language learning through mobile technologies.
Lee, W. (1979). Language teaching games and contests. Oxford: Oxford University Press.
311
Walcir Cardoso
Levy, M., & Hubbard, P. (2005). Why call CALL “CALL”? Computer Assisted Language Learning,
18(3), 143–149.
Levy, L., & Stockwell, G. (2006). CALL dimension: Options and issues in computer-assisted language
learning. Mahwah: Lawrence Erlbaum.
Levy, L. (2009). Technologies in use for second language learning. The Modern Language Journal, 93,
769–782.
Li, J. (2020). A systematic review of video games for second language acquisition. In P. Sullivan, J.
Lantz, & B. Sullivan (Eds.), Handbook of research on integrating digital technology with literacy
pedagogies (pp. 472–499). IGI Global.
Liakin, D., Cardoso, W., & Liakina, N. (2015). Learning L2 pronunciation with a mobile speech re-
cognizer: French /y/. CALICO Journal, 32(1), 1–25.
Liakin, D., Cardoso, W., & Liakina, N. (2017). The pedagogical use of mobile speech synthesis: Focus
on French liaison. Computer Assisted Language Learning, 30(3-4), 348–365.
Mackey, A., & Gass, S.M. (2011). Research methods in second language acquisition. Hoboken, NJ:Wiley‐
Blackwell.
Marcel, F. (2020). Mobile mixed reality technologies for language teaching and learning [Unpublished
doctoral dissertation]. University of Toronto.
McCrocklin, S. (2016). Pronunciation learner autonomy: The potential of automatic speech recogni-
tion. System, 57, 25–42.
Mompean, J., & Fouz-González, J. (2016). Twitter-based EFL pronunciation instruction. Language
Learning & Technology, 20(1), 166–190.
Moussalli, S., & Cardoso, W. (2019). Intelligent personal assistants: Can they understand and be un-
derstood by accented L2 learners? Computer Assisted Language Learning. doi: 10.1080/09588221.201
9.1595664
Mullen, T., Appel, C., & Shanklin, T. (2009). Skype-based tandem language learning and web 2.0. In
M. Thomas (Ed.), Handbook of research on Web 2.0 and second language learning (pp. 101–118).
Hershey, PA: IGI Global.
Parmaxi, A. (2020). Virtual reality in language learning: A systematic review and implications for
research and practice. Interactive Learning Environments. doi: 10.1016/j.procs.2021.08.141.
Pennington, M. C. (1999). Computer‐aided pronunciation pedagogy: Promise, limitations, directions.
Computer Assisted Language Learning, 12(5), 427–440.
Qian, M., Chukharev-Hudilainen, E., & Levis, J. (2018). A system for adaptive high variability seg-
mental perceptual training: Implementation, effectiveness, transfer. Language Learning &
Technology, 22(1), 69–96.
Rueb, A., Cardoso, W., & Grimshaw, J. (2018). The acquisition of French vocabulary in an interactive
digital gaming context. In P. Taalas, L. Bradley, & S. Thouësny (Eds.), Language learning as ex-
ploration and encounters (pp. 272–277). Research-publishing.net.
Saito, K., & Akiyama, Y. (2017). Video-based interaction, negotiation for comprehensibility, and
second language speech learning: A longitudinal study. Language Learning, 67(1), 43–74.
Salaberry, M. R. (2001). The use of technology for second language learning and teaching: A
retrospective.The Modern Language Journal, 85, 39-56.
Sauro, S., & Zourou, K. (2019). What are the digital wilds? Language Learning & Technology,
23(1), 1–7.
Sevilla-Pavón, A., Martínez-Sáez, A., & Macario de Siqueira, J. (2011). Self-assessment and tutor as-
sessment in online language learning materials: InGenio FCE Online Course and Tester. In S.
Thouësny & L. Bradley (Eds.), Second language teaching and learning with technology (pp. 45–69).
Research-publishing.net.
Shadiev, R., & Yang, M. (2020). Review of studies on technology-enhanced language learning and
teaching. Sustainability, 12(2). doi: 10.3390/su12020524
Sotillo, S. (2000). Discourse functions and syntactic complexity in synchronous and asynchronous
communication. Language Learning & Technology, 4, 82–119.
Stevens, C., Lees, N., Vonwiller, J., & Burnham, D. (2005). On-line experimental methods to evaluate
text-to-speech (TTS) synthesis: Effects of voice gender and signal quality on intelligibility, natur-
alness and preference. Computer Speech & Language, 19(2), 129–146.
Sundberg, R., & Cardoso, W. (2019). Learning French through music: The development of the Bande à
Part app. Computer Assisted Language Learning, 32(1-2), 49–70
Sundqvist, P., & Sylvén, L. (2016). Extramural English in teaching and learning: From theory and re-
search to practice. London: Palgrave McMillan.
312
perception improves pronunciation. CALICO Journal, 28, 744–765.
Torsani, S. (2016). CALL teacher education. Rotterdam: Sense Publishers.
Tuncay, H. (2020). App attrition in computer-assisted language learning: Focus on Duolingo
[Unpublished master’s thesis]. McGill University.
van Lieshout, C., & Cardoso, W. (2022). Google Translate as a tool for self-directed language learning.
Language Learning & Technology, 26(1), XX–XX.
Walker, N., Trofimovich, P., Cedergren, H., & Gatbonton, E. (2011). Using ASR technology in lan-
guage training for specific purposes. CALICO Journal, 28(3), 721–743.
Weizenbaum, J. (1966). ELIZA – a computer program for the study of natural language commu-
nication between man and machine. Computational Linguistics, 9(1), 36–45.
313
22
CURRICULUM ISSUES IN
TEACHING L2 SPEAKING
Jonathan Newton, Trang Le Diem Bui, Bao Trang Thi Nguyen,
and Thi Phuong Thao Tran
This chapter brings a classroom-based, teacher-oriented perspective to the topic of teaching
second language (L2) speaking. It explores the practical classroom realities, contingencies
and challenges experienced by teachers, and draws on research related to these issues. This
approach reveals the maze of deep-seated educational, sociocultural, and learner-internal
issues that, in many educational contexts, make L2 speaking uniquely challenging
to teach. Giving credence to the practical factors teachers face in teaching L2 speaking is a
crucial step in building bridges between the lived experience of teachers and the concerns of
second language acquisition (SLA) theory-building and research. A step in bridge-building,
as illustrated in this chapter, is teacher research, which takes as its starting point real world
issues experienced in specific educational ecologies, and which offers tangible, workable
innovations with transformational potential.
The chapter is something of a three-act play. In the first act (Into the maze), we present a
fictional account of a speaking lesson in an English as a foreign language (EFL) classroom
somewhere in Asia. While fictional, the account brings together our years of experience
teaching and observing teachers in similar classrooms. The scenario is related to other studies
revealing similar themes. The second act (Through the maze) consists of three case studies of
research conducted in teaching L2 speaking in Vietnamese EFL classrooms. Each study
focuses on a different education sector: primary school, high school, and university. The
three studies share a commitment to understanding the school and classroom ecology in
which the research is situated before introducing a pedagogic innovation to address the
specific challenges faced in this setting. In the third and final act (Above the maze), we tease
out themes and offer implications for moving the teaching of L2 speaking forward.
Throughout, we seek to place teachers’ concerns and realities front and centre to emphasize
the value of research informed by practice – a “researched pedagogy” for teaching L2
speaking (Samuda et al., 2018).
Starting with real L2 classrooms raises two questions. First, which L2? As our expertise
and experience is with (and as) EFL teachers, this will be our primary focus, although we
expect the issues raised will resonate with teachers of other L2s. Second, which classrooms?
L2 speaking is taught in a huge range of circumstances and learned for a myriad of purposes.
Consider the differences between teaching L2 speaking in an ESP (English for Specific
Purposes) class for air traffic controllers in the UAE and teaching English to a class of 40 or
so lower primary school students in a rural school in Cambodia. A single chapter will
struggle to do justice to this diversity, so we have chosen to showcase issues faced by teachers
in contexts we are familiar with, mostly English as a compulsory foreign language in Asian
schools and universities. We anticipate many of these issues will resonate with readers from
different contexts.
DOI: 10.4324/9781003022497-27
2 Act 1: Into the Maze – the Story of a Speaking Lesson

As language teachers and teacher educators, we, the authors, have seen inspirational
teaching of L2 speaking, characterized by qualities such as genuine dialogic interaction between
teacher and learners and between learners in pair and group interaction, teacher
scaffolding and encouragement, and skilful balancing of form and meaning, and communication
and controlled skill practice. Nevertheless, we are equally aware of thousands of
teachers who, despite best intentions, struggle to be effective teachers of L2 speaking. It is
these teachers who we seek to represent in this story of a speaking lesson. The classroom
depicted is deliberately composite, since we want to avoid attributing the classroom to a
particular country. Though situated in Asia, it will have parallels in many other parts of the
world.
The school is a semi-rural co-ed secondary school in Southeast Asia with a student population
of 2,000. All students are required to learn English across all high school
years. English is not the official medium of instruction, and English is rarely used in the
school outside of the English classroom despite increasing government pressure for teachers
in all curriculum areas to use some English in communication around the school.
The classroom we focus on is on the middle floor of a three-storey concrete building
containing 15 classrooms. The furnishings are basic – wooden desks, chairs, a teacher’s desk,
and some storage cubbyholes at the back. The floor, ceiling, and walls are concrete, with
windows along both side walls, and pin boards on the rear wall which are mostly empty
except for official notices and one or two old posters. The classroom fits around 35 students,
the normal class size. As the Grade 10 students stream in for their late morning class, two
ceiling fans hum noisily and the windows on both sides are open, since the temperature is
about 34°C (93°F). Noise levels rise as students jostle and chat and laugh loudly in
this short break from the monotony of the lessons.
The students have just come from a civics class, and before that a maths class. In both,
and as has been their experience throughout much of their education, teacher instruction,
recitation, and textbook-based study predominate. The students study English as a compulsory
subject without much sense of its relevance to their lives, having never actually met
or talked to a native English speaker or even having used the language outside the classroom,
except when exposed through the media. Nevertheless, to progress successfully to senior high
school, they must do well on the English exam at the end of the year, an exam focusing on
reading comprehension and writing, with no oral proficiency component.
The classroom is full today. Thankfully, harvest season is over and so is the absenteeism
that plagues the school when the students are called to help their family with the rice harvest.
The teacher, Ann, arrives at the classroom shortly after the students. She learned English in
this same system and obtained her teaching qualification at a local institute of education.
Ann has had few opportunities to travel beyond her country, and those trips she has had
were to neighbouring countries a short flight or drive away. Like the students, Ann rarely
uses English beyond the classroom, although she enjoys subtitled English movies and is
active on social media, sometimes in English. Ann recently sat an English proficiency exam
required by the ministry of education, and achieved a disappointing but not surprising
CEFR level B1 (Threshold independent user).
Today’s speaking class is her least favourite class. It confronts her with her own sense of
inadequacy as a proficient communicator in spoken English. To make matters worse, she has
trouble controlling the students in these speaking lessons, which always seem to be on the
edge of chaos. At least with the reading and writing class she can maintain order [“Switch to
writing if you find them turning restless,” was the advice provided by the UK Department of
Education and Science (1979), as reported by Alexander (2020, p. 91)]. The scheduled
textbook lesson begins with a vocabulary exercise followed by listening to a recording of a
conversation and then practice of a set dialogue. The unit concludes with a communicative
activity in which learners are expected to circulate around the class to survey other students
about their favourite foods. As has been her usual strategy, Ann spends as much of the lesson
as possible on the earlier exercises in the unit, where she uses her mother tongue to explain
points of grammar and aspects of word meaning and use. In the second half, she leads the
whole class in choral practice of the textbook dialogue before learners work in pairs to
practise the same dialogue.
The lesson concludes with the communication activity, but she keeps it brief. There is little
space to move in the packed room; students simply turn around and pair up with the person
behind them. They are not clear about the point of the activity and spend most of it talking in
their mother tongue. Off-task talk is frequent. They have done this kind of activity enough
times and with the same partner to be rather uninterested in it by now. Nevertheless, they
enjoy the chance to talk freely without close scrutiny, something frowned on in other classes.
The cramped classroom prevents Ann from moving around the groups to provide guidance
and monitoring; she usually focuses on the learners at the front. Consequently, most
students get little if any help or attention from the teacher, which contributes to their
half-hearted approach to the task. Noise levels rise dramatically and Ann has to bring the activity
to a premature close out of concern for complaints from the teachers on either side. Two
pairs are asked to stand and perform the task again, and do so with much laughter from the
class. The school bell rings for lunch just as Ann sums up key learning points. The lesson
is over.
Making Sense of the Scenario

To check the fidelity of the scenario to the real world, we sent it out to teachers and
teacher trainers in four countries to ask them whether it reflected their own classroom
realities. All confirmed that this was the case. For example, one said, “This is such an accurate
picture of many classrooms in [country],” and another commented that “The scenario
is quite similar in our country and we really have that kind of situation even with the new
communicative curriculum.” She elaborated by noting that, “we also have some teachers
who ask the students to copy the exercises in the textbook to make the students stay quiet in
the class and have more time for them to do other tasks. Sometimes they even ask the
students to copy the exercises three times.”
This rather bleak picture is confirmed across many recent studies conducted in different
L2 learning contexts. Wingate (2018) investigated modern foreign language teaching in 15
secondary schools in England, focusing on how communicative language teaching (CLT),
which was promoted in the National Curriculum, was being implemented. She found similar
patterns across all the classes observed of limited pair or group work and speaking time.
When students spoke, it was typically in response to teacher questions and often involved
short or one-word answers (see also King, 2013), thus reflecting the persistence in many
classrooms of Initiate-Respond-Follow-up (IRF) sequences (Le & Renandya, 2017; Sinclair
& Coulthard, 1975). Games were frequent, but were communicatively impoverished, often
requiring one-word answers or for learners to simply recognize words. As Wingate (2018,
p. 452) concludes, “across the 15 lessons, there was little opportunity for pupils to say more
than individual words in the target language, and there was not one chance to say something
that was of interest or relevance to them.”
In Japan, Humphries and Burns (2015) investigated the impact of the introduction of a
communicative curriculum on the teaching of four Japanese English language teachers in an
engineering college (kosen). Data from classroom observations and interviews with the
teachers showed that despite the new curriculum, lessons continued to be teacher-centred,
highly structured, with the main focus being on learning language structures. There were few
opportunities to produce English, and these mostly involved recitation and written gap-fill
exercises. Most lessons were conducted in Japanese. In conclusion the authors identified
three main barriers to change: teachers’ beliefs, understanding of the new approach, and lack
of ongoing support. As they conclude,
The teachers avoided the CLT approach of the new textbook. They were uncertain
how to implement this approach and so, drawing on their ‘apprenticeship of observation’,
reverted to teacher-centred approaches using familiar and comfortable
approaches…Class time was mostly devoted to teacher talk
(Humphries & Burns, 2015, p. 246).
Carless (2003, 2007, 2015; Deng & Carless, 2009) and others (e.g., Chen & Wright, 2017)
have reported similar findings from research on the introduction of CLT and task-based
language teaching (TBLT) in primary and secondary schools in China and Hong Kong.
Chen and Wright, for example, found that even in a secondary school in mainland China
with a track record of CLT and TBLT, teachers adopted a “weak” version of CLT in which
the so-called “tasks” in the teachers’ lesson plans were often little more than “end-of-class
and add-on activities for practicing oral skills” (p. 532). Behind this practice was the teachers’
persistent belief that practising communicative English compromises the accuracy of the
learners’ speaking.
An even bleaker picture is painted by King (2013) in a large multi-site observational study
of the classroom behaviour of 924 English language learners across nine universities in
Japan. Findings showed that students were responsible for less than 1% of class talk and that
more than a fifth of class time involved no oral participation by either students or teachers.
King adopts a dynamic systems perspective (Cameron & Larsen-Freeman, 2007) to explain
the complex interaction of learner-internal and sociocultural factors leading to a strong
dispreference for talk in these classrooms. As King argues, issues such as lack of L2 ability,
unfamiliarity with topics/tasks, and problems with the delivery of the teacher’s talk all
converge to make “many learners…simply unwilling to engage in the potentially embarrassing
behaviour of active oral participation for fear of being negatively judged by their
peers” (p. 339).
Even in the more cosmopolitan high school classrooms of Singapore, Aw (2017) found
considerable resistance to dialogic approaches to teaching. Teachers failed to engage students
in the kind of oracy work through which speaking mediates thinking, and notably higher
order thinking skills (Alexander, 2020). Teachers expressed insecurity about their expertise in
“teaching” speaking, and, along with senior school management, parents and students, cited
high stakes written examinations as a major impediment to being able or willing to see a
broader role for speaking in thinking and learning.
These illustrative studies are sufficient to establish that, for many teachers and even more
learners, teaching and learning L2 speaking can be uniquely challenging. Dörnyei and
Ushioda (2013, p. 339) refer to the “sociocultural maze” of issues that affect learner motivation,
and we think the phrase applies equally to the challenge of teaching L2 speaking.
Nevertheless, this maze has escape routes, which we now turn to.
3 Act 2: Out of the Maze – Teacher Research on L2 Speaking

Earlier, we referred to the term “researched pedagogy”, that is, studies that treat the actual
challenges teachers face in their day-to-day teaching as the starting point for research. We
now introduce three cases of researched pedagogy related to the teaching of L2 speaking.
Each was carried out by a teacher/teacher trainer in Vietnam, in the primary, secondary, and
tertiary education sectors. All three adopt a mixed methods design, involving combinations
of an ethnographic approach, multiple case studies, action research and participatory action
research, and quasi-experimental research. The studies are presented sequentially and then
drawn upon in the concluding part where themes are pulled together.
Study 1: Teaching L2 Speaking in a Vietnamese Primary School

A recently introduced textbook series used in primary schools in Vietnam adopts the
presentation–practice–production (PPP) approach for teaching speaking skills (although the
textbook itself is broadly project or task-based). In her work as a teacher trainer, the second
author saw how students showed little interest in the repetitive dialogue practice built into
the PPP approach, and how little time was allocated to the final communicative production
phase of the lessons. Furthermore, when performing the production phase communicative
activities, students seemed to be focused on practising the target structures, rather than
engaging in genuine message-focused communicative practice. To test these perceptions, she
observed and recorded seven teachers from six primary schools implementing 11 speaking
lessons. Among the findings was clear evidence of the production phase being consistently
marginalized: time allocated in the 40-minute lessons ranged from 2.5 to 10 minutes, with a
mean time of 6.7 minutes (Bui & Newton, forthcoming). In subsequent interviews with the
teachers, some spoke of the importance of repetition and the value of the predictable lesson
structure provided by the PPP approach. But three were quite outspoken about the limitations
of teaching L2 speaking in this way. A comment by one of the teachers is worth quoting
at length.
I think the steps are so fixed. It’s like we arrange and assign things for students. We
show them this is what they should say. Then students just have to follow the
structural patterns we have taught them. This fails to enhance students’ ability to
use the language…It is like the learning process is very theoretical;…we have to
provide students something in advance and they have to follow. We provide the
theory for students before we get them to practice. I think this cannot enhance
students’ ability to use English language. It is like we force them to do what we want
them to do, speak what we want them to speak
(Bui & Newton, forthcoming).
These views from the teachers justified the initiation of a second phase of the research in
which task-based versions of two PPP lessons were designed and implemented by three
teachers. These redesigned lessons were intended to reduce mechanical pattern practice and
emphasize meaningful language use from the beginning. Specifically, the teacher presentation
phase was replaced by an input-based task (e.g., listen to a dialogue about the school
timetable and fill in a timetable); the practice phase was replaced with an information gap
task; and the production phase was replaced with teacher-led discussion of public performance
of two or three pairs of students, and related language analysis activities which
allowed for a more deliberate focus on the target structures.
As reported in Bui (2019), in all performances of the main tasks in both lessons, learners
successfully completed the information gap tasks, and did so without the often lengthy
teacher presentation of language structures common in the PPP lessons. In performing the
information gap tasks, the learners consistently co-constructed utterances, self-corrected
errors, corrected each other’s errors, and negotiated for meaning to resolve comprehension
difficulties. They also frequently drew on their L1 to resolve problems and fill gaps in expression.
While some teachers might baulk at the use of L1 by learners in an L2 speaking
class, the learners usually used it to manage the task performance and seek/provide assistance,
as in the following example.
Example 1
P1: What subject do you have on Friday?
P2: It…I have Art and Vietnamese and Science…Vietnamese and Science
P1: Vietnamese hả? (Is it Vietnamese?) Đọc lại cho tui nghe coi. (Say it again)
P2: I have Art and Vietnamese and Science
P1: Friday phải không? (Is it Friday?)
P2: Friday
(Bui, 2019, p. 200)
In interviews, both teachers and pupils reported uniformly positive experiences in these
lessons. One teacher said the following:
The pupils could learn better when the two speaking lessons were taught this way. It
may be because they were cognitively engaged during the lesson. They had to think
to work out the language to speak. They had to manage their talk by themselves.
Previously the pupils did not have such experiences. Their learning was controlled.
They just followed the teacher
(Bui, 2019, p. 156).
A learner expressed a remarkably similar view:
I like to exchange information about the timetable with my friend. I tried to help my
friend understand using the language I knew. This helps me speak English more
naturally
(Newton & Bui, 2020, p. 43).
In summary, the study showed how teaching L2 speaking for young beginner-level EFL
learners could be transitioned from a heavy emphasis on drilling and memorizing
language structures towards a more communicative, task-based approach. The success of
the task-based lessons here was associated with the use of input-based tasks, pairing
stronger and weaker learners, and allowing L1 use by the learners. We concur with Pinter
(2007, p. 202) that it is “possible to introduce fluency tasks in primary English classrooms
early on to let children experience communication that is more ‘real’ than drilling and
pattern practice.”
Study 2: Teaching L2 Speaking in a Vietnamese High School

This study involved two phases; different aspects of the study have been reported in Newton
and Nguyen (2019) and Nguyen et al. (2018). The first phase of the study was an exploratory
investigation into (a) how the Vietnamese EFL teachers at a high school in central Vietnam
implemented speaking tasks provided in the mandated textbook; (b) what cognition lay
behind their practices; and (c) what students thought of their experience of these lessons and
tasks. The data were collected over two and a half months through classroom observations
of 45 classes taught by nine teachers, stimulated recalls and in-depth interviews with teachers
and students. The results showed the teachers strongly preferred to adapt or replace the
closed convergent tasks typical of the textbook (e.g., decide on a seating plan on a ferry that
suits the given profiles of a group of people) with open divergent discussions (e.g., discuss
your plans for the upcoming holidays). The teachers also opted for topics that were not just
“real world,” but “real” to the lives of the students. Teacher task choices were guided by their
task experimentation, by clearly articulated beliefs about teaching speaking, and by a strong
orientation to learner engagement.
In terms of organization, the speaking lessons for all nine teachers involved learners
rehearsing the set task in pairs or groups, followed by pairs or groups performing the same
task publicly in front of the class, and concluding with teacher feedback. We use the terms
rehearsal and performance because they capture the teachers’ and students’ orientation as
observed in the lessons and explained in the interviews. Both the teachers and students valued
the notion of public performance (i.e., tasks performed in front of the class) as a driving
force for the use of English and as a social classroom event to engage students in task work.
The centrality of public performance in these EFL classrooms, and a lack of research evidence
about its impact in task-based learning, motivated Phase 2 of the study.
Phase 2 was a quasi-experimental follow-up investigating the occurrence of language-
related episodes (LREs) in task rehearsal and subsequent uptake of language items targeted
in LREs in public performance of the same task. The effect of task design (debate vs.
problem-solving tasks) and learner proficiency (learner dyads were low–low, low–high, or
high–high proficiency) were also investigated. Forty-eight learners (24 dyads) participated in
the study, and all performed both a debate and problem-solving task. In brief, results showed
that the learners engaged in a substantial amount of attention to language, with 648 LREs
recorded across the 48 task rehearsals. The majority of these LREs (76%) were successfully
resolved by the learners without teacher assistance. In subsequent task performances, learners
drew on content from 428 (66%) of the rehearsal LREs. Of these, 333 involved successful
transfer of correct LRE resolutions. Although there were also 68 instances of transfer
of incorrect resolutions, the overall results show convincing evidence of learning from
classroom speaking tasks (Newton & Nguyen, 2019). Both task type and learner proficiency
pairings had significant impact on the resolution and uptake in performance of LREs
(Nguyen & Newton, 2019). Following is an example of an LRE in the problem-solving task
in which students had to decide which two out of five charity options they would spend 500
million VND (Vietnamese dong) on.
Example 2
S1: Ê, they are poor hay they poor thôi hè? (Hey, they are poor or just they poor?)
S2: Er they are poor. Poor nớ tính từ mà, phải có động từ! (That poor is an adjective, it
needs a verb!)
S1: They are poor. They are poor.
(Nguyen & Newton, 2019, p. 272)
In this example, the meaning-making required by the task pushes S1 to attend to the
grammatical issue of whether to use “they are poor” or “they poor.” S2 provides an answer
“they are poor,” with a meta-linguistic explanation that poor is an adjective and so needs a verb. In
simple terms, S1 notices a gap in her knowledge, collaborates with a partner to fill the gap,
and puts the new knowledge to immediate use.
Also evident in this example is extensive use of L1 in rehearsal to resolve language gaps.
Crucially, because the learners anticipated the possibility of being called on to perform the
task publicly in English, their use of L1 in rehearsal overwhelmingly functioned to resource
the upcoming L2 performance, not to replace or avoid it (Seals et al., 2020; Skehan
& Foster, 2005, 2016). The following example of rehearsal and performance illustrates this
point.
Example 3
Rehearsal
S1: I’m erm mình nói kinh doanh have business à? (I want to say “do business”. Should it be “have business”?)
S2: I do business thôi! (I do business!)
S1: I do business and erm I gain kiếm được…kiếm được là chi? (earn…how to say “earn money”?)
S2: raise (.) uhm kiếm được là chi hè (how to say “earn”) (.) earn (.)
S1: earn! and I earn a lot of money
Public Performance (PP)
S1: Hi Linh. How are you doing?
S2: I’m fine. And what’s your job?
S1: I do business and I earn a lot of money and I want to take uhm part in volunteer work
S2: Ok. That’s a good idea and erm what are you going to do with this money?
(Newton & Nguyen, 2019, p. 49)
In summary, the research shows how teachers in this school adopted an approach to
teaching L2 speaking that successfully addressed local contingencies and, in Phase 2, showed
tangible evidence of language learning through communicative speaking and task-based
learning.
Study 3: Teaching L2 Speaking in a Vietnamese University

The third study is in an EFL programme for English majors at a Vietnamese university. The
research sought to identify affordances for adopting an intercultural stance in a listening and
speaking skills course, and to investigate the viability of adopting intercultural Communicative
Language Teaching (iCLT) (Newton, 2016). First, an interpretive, qualitative multiple case study
investigated the teaching of culture by three of the teachers who taught the course. Second, the
teachers participated in a Participatory Action Research (PAR) project in which they worked
with the researcher to adapt their lessons to reflect the principles of iCLT. In both phases, data
were collected from classroom observations, teacher interviews, and focus groups with students.
Results from the first phase showed that the teaching of culture was intermittent and
unplanned, and that the teachers held a static view of culture with little awareness of in-
tercultural language teaching. Consequently, when cultural content did appear, it was treated
by the teachers as knowledge of facts to transmit to the students. For example, a close
analysis of three lessons taught by one of the teachers found only three instances where
vaguely cultural content was present: sharing about activities people do at parties; asking
students about the Vietnamese national exam; telling students about school terms in Vietnam
and America.
Interestingly, data from three student focus groups revealed a strong preference for richer
cultural content, as we see in the following extract from one of the groups (translated from
Vietnamese).
Example 4
RESEARCHER: In your opinion, what culture content should be taught in class?
NHU: I think, teachers should teach ways of behaviour. If we understand better about
ways of behaviour, we can avoid cultural shock in future work and study in many
places.
DIEM: I see learning about culture influences my maturity a lot. The more countries’
culture you understand about the better, for future work. For example, I’m not
sure I’ll work in America. I may work in Thailand instead. In that case, I need to
master how Thai people communicate.
DUY: I agree with Diem…not necessary to focus on a certain country. If you know
culture of more countries, you can work with diversity more easily.
(Tran, 2020, p. 163).
Drawing on these findings, the second phase sought to develop a more principled en-
gagement with culture by involving the case study teachers in two PAR workshop cycles. The
teachers were introduced to principles and practical examples of intercultural language
teaching and subsequently, in their classroom teaching, implemented the redesigned
intercultural-oriented lessons. Materials for the workshops included lessons from their
textbook redesigned to reflect the following core principles of iCLT:
1. Mine the input for its cultural content.
2. Encourage an exploratory, experiential, comparative, and reflective approach to
culture in which the learners (guided by the teacher) construct intercultural
perspectives.
3. Develop awareness of how culture shapes one’s assumptions, beliefs, and
behaviour.
4. Value intercultural learning goals alongside linguistic and communicative goals.
(Newton, 2016, p. 165).
The redesigned iCLT lessons were structured as follows: (a) The students were given a
scenario from the relevant textbook unit (e.g., how to ask a teacher to write you a letter of
recommendation) and asked to make and discuss hypotheses about cultural dimensions of
the communication to be mindful of; (b) The students created role plays for this scenario in
Vietnamese and/or English; (c) The students performed their role plays and discussed dif-
ferences between them; (d) They listened to and analysed the textbook dialogue for the same
scenario, comparing it to their role plays and to their original hypotheses about cultural
dimensions of the scenario.
Space does not allow a detailed description, but two overall findings warrant comment.
First, classroom observations of the revised lessons revealed a marked increase in student
engagement and interaction with each other and the teacher. Second, in interviews, both
teachers and learners expressed unanimously positive views of the experience of intercultural
teaching and learning. Two comments from one of the teachers illustrate this point:
They [the students] analysed conversations, they get involved, they relate,…they
revise, they get involved…all steps they have to get involved.
[In the future] I will analyse the lesson plans more in detail…, and reflecting, relating
because that means you bring real life in your teaching. That is the part I am very
pleased with. I think the reasons why students cannot speak fluently in real situation
because they only learnt from book, they don’t bring real life in class.
(Tran, 2020, p. 191).
Two comments from students in the focus groups reveal complementary student perceptions
of the experience:
It was interesting as I understood that the ways Vietnamese and foreigners express
opinions are greatly different. For example, foreigners pay attention to phrases that
indicate politeness such as Can I, Could I, Will you. When they give advice, they
include reason to explain it to convince listeners. When disagreeing, they use phrases
that minimize conflicts such as I see what you mean but…I see I better understand
ways of speaking and using English to speak with foreigners.
I like sharing experience most because it is followed by a practical situation for us to
apply language structures to talk about it…It is more interesting to listen to friends’
sharing experience, more lessons and comments on one issue.
(Tran, 2020, p. 195).
In summary, this research showed how the EFL teachers at a Vietnamese university were
guided to successfully adopt an intercultural stance in their teaching of L2 speaking.
Furthermore, they achieved this in a setting with no previous experience or expertise in
intercultural teaching, and where the curriculum was devoid of intercultural content.
Importantly for other teachers who work from a similarly prescribed curriculum, these
outcomes were accomplished through adapting rather than replacing the set textbook.
4 Act 3: Above the Maze: The Way Forward

The two previous acts juxtaposed two pictures: one, an unpromising picture of the challenges
of teaching L2 speaking faced by teachers in many classrooms, and the other, a more hopeful
picture of three studies in which teacher–researchers successfully innovated to address the
challenges they faced in teaching L2 speaking. The three studies highlight the transformational
potential of teacher research, which Borg (2010, p. 395) defines as
systematic inquiry, qualitative and/or quantitative, conducted by teachers in their
own professional contexts, individually or collaboratively (with other teachers and/or
external collaborators), which aims to enhance teachers’ understanding of different
aspects of their work, is made public, has the potential to contribute to better
quality teaching and learning in individual classrooms, and which may also inform
institutional improvement and educational policy more broadly.
Six key features of teacher research are modelled in the three studies described earlier: the
goal is to understand a classroom issue; teachers as researchers; a subjective orientation;
context specificity; a flexible, open-ended process; and the centrality of teacher knowledge
(Borg, 2006).
Studies 1 and 3 also offer models of how to build teacher professional development
(TPD) into teacher research, a crucial factor for strengthening sustainability. They do
this by modelling the five principles of effective TPD
proposed by Desimone (2009, p. 184): (1) a focus on subject-matter content and how
students learn that content; (2) opportunities for active teacher learning; (3) effort made
to build links to teachers’ knowledge and beliefs, and to local educational policies; (4)
teacher learning over a time span beyond a “one shot” workshop; and (5) collective
participation and dialogic learning.
These alignments with TPD can be contrasted with the conclusions reached by Humphries
and Burns (2015) discussed earlier, in which failure to adopt a more communicative pedagogy
in a college in Japan was attributed to three features – teachers’ beliefs, understanding
of the new approach, and lack of ongoing support. These three problems mirror the absence
of TPD principles (3), (1), and (5). The Carless studies cited earlier similarly attribute the
failure of communicative reforms in Hong Kong to factors such as lack of teacher knowledge
and poor professional support.
Now, we return to our story of the lesson taught by Ann in Act 1. This story points to
other issues critical to the effective teaching of L2 speaking such as effective management
of first language (L1) use (Seals et al., 2020), teacher expertise (Tsui, 2012) and teacher
language proficiency (Le & Renandya, 2017). With respect to the latter, Alexander (2020,
p. 21) offers the following insight into the critical role of the teacher’s speaking skills
across the curriculum, a point which has even stronger resonance with respect to teaching
L2 speaking:
In reading and writing, the student’s skills are influenced more by the teacher’s skills
as a teacher of reading and writing, than by how well the teacher herself reads and
writes. Not so with talk. Its essentially interactive nature means…that the teacher’s
own competence as a speaker and listener contributes significantly to the developing
oral competence of the student. (italics in original).
Another issue highlighted in Ann’s lesson is the classroom as a physical and acoustic space.
Many teachers teach in classrooms in which ambient noise and the acoustic properties of the
classroom make clear voice projection particularly challenging when it comes to teaching
speaking. Group work in such classroom spaces is an exercise in din reduction. And for
learners, as if talking in L2 is not challenging enough, in pair and group work they have the
additional stress of having to raise their voices to be heard. Even an enthusiastic teacher with
expertise in cooperative learning will find silent book work an attractive option in such
conditions. To the extent that classroom researchers interested in L2 speaking gloss over such
front-of-mind issues for L2 speaking teachers, they risk reinforcing perceptions of their own
irrelevance. Interestingly, Alexander (2020, p. 135) identifies “space” as the first of five key
framing elements in a proposed generic framework for investigating talk in the classroom
(the other four being student organization; time; curriculum; and routine, rule, and ritual).
There is hope.
This classroom space-noise issue is symptomatic of a much broader challenge for the L2
speaking teacher, and that is a traditional and widespread discounting of the value of talk –
of oracy – across the curriculum. Speaking in formal education very often plays second fiddle
to literacy, if it is allowed to play at all. Consequently, students’ experience of talk for
learning and thinking across the curriculum may be confined to slotting answers into IRF
sequences framed on either side by teacher talk. Talk – genuine, dialogic, exploratory learning
talk – may already be “foreign” before the learner experiences it in the foreign language
classroom.
We conclude on a more promising note. National education policies and curricula across
the globe have, over recent decades, sought to implement models of 21st-century skills and
learning (Fullan & Scott, 2014; Scott, 2015). As (or if) the principles of collaboration,
communication and creativity common to these models are worked out in classrooms, they
require a fundamental rethink of how talk is valued and what roles talk needs to play in
learning, a point long argued by the influential British educationalist Alexander (2020). For
contexts such as the one Ann struggles to teach in, the aspiration to teach for 21st-century
learning may be an uphill climb for years to come. And yet, it is an aspiration that offers just
the kind of educational environment in which truly communicative teaching of L2 speaking
is likely to thrive, as well as offering the L2 speaking teacher a leading role in bringing these
aspirations to life.
Further Reading
Alexander, R. (2020). A dialogic teaching companion. London: Routledge.
This overview of the neglect of talk in education in the UK makes a compelling case for the positive
impact of dialogic teaching on student engagement and learning. A framework for pedagogic teaching
and a professional development strategy for schools are provided.
King, J. (2013). Silence in the second language classrooms of Japanese universities. Applied Linguistics,
34(3), 325–343.
A nuanced discussion of the complex sociocultural and learner-internal factors accounting for will-
ingness and reluctance to speak in the L2 classroom.
Newton, J., & Nation, I. S. P. (2020). Teaching ESL/EFL listening and speaking (2nd edn). London:
Routledge.
This book outlines the four-strands framework, offering teachers a practical and principled basis for
planning a speaking curriculum. The strands are exemplified with a wide range of teaching techniques
and activities.
References
Alexander, R. (2020). A dialogic teaching companion. London: Routledge.
Aw, H. T. (2017). Speaking in the secondary English language classroom: Teachers’ beliefs, strategies and
use of talk. Singapore: National Institute of Education, Nanyang Technological University.
Borg, S. (2006). Teacher cognition and language education: Research and practice. London: Continuum.
Borg, S. (2010). Language teacher research engagement. Language Teaching, 43(4), 391–429. doi:
10.1017/S0261444810000170
Bui, T. (2019). The implementation of task-based language teaching in EFL primary school classrooms: A
case study in Vietnam. Wellington: Victoria University of Wellington.
Bui, T., & Newton, J. (2021). A critical account of PPP: Insights from Vietnamese primary school EFL
classrooms. Language Teaching for Young Learners, 3(1), 93–116.
https://doi.org/10.1075/ltyl.19015.bui
Cameron, L., & Larsen-Freeman, D. (2007). Complex systems and applied linguistics. International
Journal of Applied Linguistics, 17(2), 226–239.
Carless, D. R. (2003). Factors in the implementation of task-based teaching in primary schools. System,
31(4), 485–500.
Carless, D. R. (2007). The suitability of task-based approaches for secondary schools: Perspectives from
Hong Kong. System, 35(4), 595–608. doi: 10.1016/j.system.2007.09.003
Carless, D. R. (2015). Teachers’ adaptations of TBLT: The Hong Kong story. In M. Thomas & H.
Reinders (Eds.), Contemporary task-based language teaching in Asia (pp. 366–380). London:
Bloomsbury.
Chen, Q., & Wright, C. (2017). Contextualization and authenticity in TBLT: Voices from Chinese
classrooms. Language Teaching Research, 21(4), 517–538. doi: 10.1177/1362168816639985
Deng, C., & Carless, D. R. (2009). The communicativeness of activities in a task-based innovation in
Guangdong, China. Asian Journal of English Language Teaching, 19, 113–134.
Desimone, L. M. (2009). Improving impact studies of teachers’ professional development: Toward
better conceptualizations and measures. Educational Researcher, 38(3), 181–199. doi:
10.3102/0013189X08331140
Dörnyei, Z., & Ushioda, E. (2013). Teaching and researching: Motivation. London: Routledge.
Fullan, M., & Scott, G. (2014). Education plus, new pedagogies for deep learning. Collaborative impact
SPC, Washington, DC.
http://www.michaelfullan.ca/wp-content/uploads/2014/09/Education-Plus-A-Whitepaper-July-2014-1.pdf
Gałajda, D. (2017). Communicative behaviour of a language learner: Exploring willingness to communicate.
New York: Springer.
Humphries, S., & Burns, A. (2015). ‘In reality it’s almost impossible’: CLT-oriented curriculum change.
ELT Journal, 69(3), 239–248.
King, J. (2013). Silence in the second language classrooms of Japanese universities. Applied Linguistics,
34(3), 325–343. doi: 10.1093/applin/ams043
Le, V. C., & Renandya, W. A. (2017). Teachers’ English proficiency and classroom language use: A
conversation analysis study. RELC Journal, 48(1), 67–81. doi: 10.1177/0033688217690935
Newton, J. (2016). Teaching English for intercultural spoken communication. In W. A. Renandya & H.
P. Widodo (Eds.), English language teaching today (pp. 161–177). New York: Springer.
Newton, J., & Bui, T. (2020). Low-proficiency learners and task-based language teaching. In C. P.
Lambert & R. Oliver (Eds.), Using tasks in diverse contexts (pp. 28–40). Bristol: Multilingual
Matters.
Newton, J., & Nguyen, B. T. T. (2019). Task repetition and the public performance of speaking tasks in
EFL classes at a Vietnamese high school. Language Teaching for Young Learners, 1(1), 34–56. doi:
10.1075/ltyl.00004.new
Nguyen, B. T. T., & Newton, J. (2019). Learner proficiency and EFL learning through task rehearsal
and performance. Language Teaching Research. doi: 10.1177/1362168818819021
Nguyen, B. T. T., Newton, J., & Crabbe, D. (2018). Teacher transformation of oral textbook tasks in
Vietnamese EFL high school classrooms. In V. Samuda, K. Van den Branden, & M. Bygate (Eds.),
TBLT as a researched pedagogy (pp. 52–70). Amsterdam: John Benjamins.
Pinter, A. (2007). Some benefits of peer–peer interaction: 10-year-old children practising with a communication
task. Language Teaching Research, 11(2), 189–207. https://doi.org/10.1177/1362168807074604
Samuda, V., Van den Branden, K., & Bygate, M. (Eds.). (2018). TBLT as a researched pedagogy.
Amsterdam: John Benjamins.
Scott, C. L. (2015). The futures of learning 2: What kind of learning for the 21st century. Education
Research and Foresight Working Papers, 3.
Seals, C. A., Newton, J., Ash, M., & Nguyen, T. B. T. (2020). Translanguaging and TBLT: Cross-overs
and challenges. In Z. Tian, L. Aghai, P. Sayer, & J. Schissel (Eds.), Envisioning TESOL through a
translanguaging lens – Global perspectives. New York: Springer.
Sinclair, J. M., & Coulthard, M. (1975). Towards an analysis of discourse: The English used by teachers
and pupils. Oxford: Oxford University Press.
Skehan, P., & Foster, P. (2005). Strategic and on-line planning: The influence of surprise information
and task time on second language performance. In R. Ellis (Ed.), Planning and task performance in a
second language (pp. 193–216). Amsterdam: John Benjamins Publishing Company.
Skehan, P., & Foster, P. (2016). Task type and task processing conditions as influences on foreign
language performance. Language Teaching Research, 1(3), 185–211. doi:
10.1177/136216889700100302
Tran, T. P. T. (2020). Intercultural language teaching in Vietnamese tertiary EFL classes: A participatory
action research study. Wellington: Victoria University of Wellington.
Tsui, A. B. M. (2012). The dialectics of theory and practice in teacher knowledge development. In J.
Hüttner, B. Mehlmauer-Larcher, S. Reichl, & B. Schiftner (Eds.), Theory and practice in EFL teacher
education: Bridging the gap (pp. 16–37). Bristol: Multilingual Matters.
Wingate, U. (2018). Lots of games and little challenge – a snapshot of modern foreign language
teaching in English secondary schools. The Language Learning Journal, 46(4), 442–455. doi:
10.1080/09571736.2016.1161061
23
ORAL LANGUAGE DEVELOPMENT
IN IMMERSION AND DUAL
LANGUAGE CLASSROOMS
DOI: 10.4324/9781003022497-28
School-based additive bilingual programmes that teach an additional language through
subject-matter instruction permeate a wide range of international contexts and instructional
settings. Such programmes come in many shapes and sizes with names including immersion
and dual language (ImDL) education, content and language integrated learning (CLIL),
content-based instruction (CBI), and content-based language teaching (CBLT). They share
an instructional approach in which non-linguistic curricular content such as geography,
history, or science is taught to students through the medium of a language they are learning
as an additional language. One of the most attractive features of these programmes is the
increased exposure to and engagement with the target language via subject-matter instruc-
tion, which provides a motivational basis for purposeful communication and a cognitive
basis for language learning. Often overlooked, however, is the research evidence demon-
strating that, for these programmes to be effective, they need to integrate a systematic focus
on the target language that encourages shifts in learners’ attentional focus between language
and content.
Given the diverse range of such programmes and the wide variety of research associated
with each, this chapter focuses specifically on immersion and dual language (ImDL) pro-
grammes in Canada and the United States, first identifying and explaining some limitations
in terms of students’ oral production abilities and then proposing instructional solutions for
enhancing students’ oral language development through opportunities for focused and
contextualized practice.
What Is ImDL Education?

The two main variants of ImDL programmes are one-way immersion and two-way im-
mersion, both of which normally begin when children enter Kindergarten at around age five
(i.e., “early” immersion).
One-way immersion is for students whose primary language is the dominant or official
language and who are learning a second, foreign, heritage, regional, or Indigenous lan-
guage as an additional language (e.g., French immersion for English-speaking students in
Canada, Portuguese immersion for English-speaking students in the United States, or
Swedish immersion for Finnish-speaking students in Finland). One-way immersion programmes
normally offer at least half the curriculum in the immersion (or target) language
and the other half in the societal majority language, the first language (L1) or primary
language of most students. Some programmes begin with an equal percentage distribution
(50:50) of the two languages across the curriculum, whereas others begin with 100% im-
mersion in the target language (referred to as “total” immersion), then introduce language
arts instruction in the majority language by Grades 2 or 3 (when children are 7–8 years
old), and reach a 50:50 distribution around Grades 5 or 6 (10–11 years of age). One-way
immersion programmes have been adopted around the world not only to support foreign
languages such as Mandarin and German in the United States, second co-official lan-
guages such as French in Canada or Swedish in Finland, but also to support regional
languages such as Breton and Occitan in France, or Indigenous languages such as Māori
in New Zealand, Hawaiian in the state of Hawaii, and Cherokee in the state of Oklahoma.
English is the immersion language in various countries such as Japan and Brazil because
of its role as a global language.
Two-way immersion, offered primarily in the United States, and in other contexts such as
Estonia (Estonian/Russian programmes), provides curriculum and instruction in the majority
language and a minority language, ideally to a similar number of students who are native
speakers of each language or simultaneous bilinguals. Most frequently, the minority language in
two-way immersion programmes in the United States is Spanish, but programmes are also
available in Chinese, Vietnamese, Russian, Japanese, French, German, and Italian, among
others. Two-way programmes typically begin with either equal time in each language (50:50) or
90% in the minority language and 10% in English (90:10). By Grade 5 (10 years of age), most
two-way immersion programmes are 50:50.
2 Historical Perspectives: Research on Oral Language Development

Decades of research have shown that, in the long term, both majority- and minority-
language ImDL students in all types of programmes perform on a par with or above students
schooled only through English on standardized achievement measures administered in
English. Similarly, research findings have also shown that, in the long term, both majority-
and minority-language ImDL students’ English language development ranges from
equivalent to superior to that of non-immersion students irrespective of programme type.
Studies comparing ImDL students’ proficiency in the minority language to that of native
speakers of the minority language have also revealed positive outcomes, as well as some that
fall short of expectations. For example, in comparison to native speakers of French of the
same age, research has characterized French immersion students’ proficiency in French in
two ways (Harley et al., 1990):
1. high levels of comprehension abilities as measured by tests of listening and reading
comprehension;
2. high levels of communicative ability, but with lower-than-expected production skills in
terms of grammatical accuracy, lexical variety, and sociolinguistic appropriateness.
More specifically, still in comparison to native speakers of French of the same age, Harley
et al. found that French immersion students performed similarly on measures of discourse
competence, operationalized as the ability to process language coherently and cohesively. An
example of coherence in discourse is the accurate use of pronouns to refer to characters,
objects, and locations when telling a story. Examples of discourse cohesiveness include the
accurate use of conjunctions and adverbs to make logical connections – such as temporal
sequencing and cause-effect relationships – between clauses and sentences.
In contrast to their native-like levels of discourse competence, French immersion students
were much less proficient on most grammar aspects, which included verb and preposition
usage, and also fell short of native speakers on measures of sociolinguistic competence, which
is the ability to vary one’s language according to social context. Harley (1994) summed up
French immersion students’ oral production as containing phonologically salient items that
are easy to acquire from the stream of speech, along with high-frequency lexical items and
syntactic patterns similar to those of English. Missing from their oral production are less
salient morphosyntactic features that differ from English or are not crucial for getting one’s
meaning across. For example, French immersion students underuse conditional verb forms
to express hypothetical meaning or uncertainty, instead using lexical items such as peut-être
(maybe). These are useful communication strategies but do not lead students towards higher
levels of academic language production.
Like their Canadian counterparts, majority-language students in US one-way
immersion programmes produce spoken and written language lacking grammatical
accuracy. Moreover, their language is typically not sociolinguistically appropriate (see
Tedick & Wesely, 2015, for a review), and their vocabulary can be underdeveloped (e.g.,
Fortune & Tedick, 2015). Most US studies on the language development of one-way im-
mersion students have been done in Spanish or French programmes, and increasingly
Mandarin Chinese. Nevertheless, teachers representing programmes in a wide range of
languages, including Indigenous languages, share anecdotally that immersion students’
minority-language development is far from optimal.
Research on two-way immersion programmes has shown that English L1 students con-
tinue to perform better in English than in Spanish, while Spanish L1 students tend to develop
more balanced oral and written proficiencies in both languages (e.g., Lindholm-Leary, 2001).
However, some Spanish L1 students become dominant in English (Fortune, 2001) and de-
velop certain grammatical inaccuracies in Spanish, their home language (e.g., Potowski,
2007; Tedick & Young, 2016). Most of the research has been done in Spanish/English
programmes, but anecdotal evidence indicates that L1 speakers of other minority languages
in two-way immersion, such as Hmong, also develop some grammatical inaccuracies in their
home language.
Findings related to the shortcomings in ImDL students’ production abilities led to de-
scriptive research to better understand the shortcomings. Early classroom observations re-
vealed that subject-matter instruction did not necessarily invite much oral production by
students. For example, Swain (1988) reported findings from an observational study in which
only 14% of the turns produced by Grade 6 French immersion students were more than
one clause in length. She argued that exposure to input, but with minimal opportunities for
production, engages comprehension strategies enabling students to comprehend content by
drawing on pragmatic and situational cues, real-world knowledge, and inference, without
processing structural elements in the language. Bypassing language structure in this way,
however, is harder to do when producing the language. Swain thus argued in favour of more
opportunities for student output and the provision of feedback that would push students to
express themselves more precisely and appropriately. This came to be known as “pushed
output.” This line of research also revealed that, contrary to expectations, subject-matter
instruction exposes students to a limited range of language forms and functions (e.g., Swain,
1988); in addition, a tendency for ImDL teachers to keep language instruction and content
instruction separate was also observed (Allen et al., 1990). Overall, this research suggested
that, for students to improve their production abilities, they need to engage with the target
language in its full functional range and teachers must engage with a range of instructional
practices considered effective for integrating content and language.
3 Critical Issues: Oracy Underpins Academic Literacy

Research on ImDL students’ oral production abilities in the minority language has con-
tributed substantially to our understanding of target-language development in ImDL set-
tings, helping to set realistic expectations and to envision instructional approaches designed
to ensure continued growth in the minority language; we return to such approaches later.
The research comparing majority-language students’ oral proficiency with that of native
speakers may seem questionable, given that the goal of ImDL programmes is not
native-like proficiency in all aspects of both target languages. Instead, the call to improve
ImDL students’ oral abilities in the minority language (be they L2 learners of that language,
L1 speakers, or bilinguals) is intended to help them engage more effectively with the type of
complex language key to school success. Although it is well-established that immersion
students achieve academically on a par with or above students schooled only in English, it
should be noted that assessment of academic achievement typically occurs in English and
involves multiple-choice measures of reading comprehension and language arts, mathe-
matics, and (occasionally) science. Such assessments fail to assess students’ ability to produce
extended discourse in the minority language that is characteristic of the academic literacy of
which oracy is an integral part.
Oracy, as explained by Escamilla et al. (2014), “is an aspect of oral language, but it
includes a more specific subset of skills and strategies within oral language that more closely
relates to literacy objectives in academic settings” (p. 21). Oracy entails the development of
linguistic accuracy, syntactic complexity, and academic vocabulary in students’ speech, in-
cluding the language functions they need to understand and communicate in a specific
content area. The development of oracy also involves a range of instructional strategies
employed by teachers to support students through scaffolded oral interaction including ef-
fective questions and strategic feedback, which together help students to understand and
engage with language and content at levels higher than what they could reach on their own.
Oracy is not only an indispensable source of ongoing support for literacy development but
also its essential starting point. ImDL teachers thus have the
added challenge of ensuring a consistent and systematic emphasis on developing and
maintaining advanced oral abilities to optimize the pivotal role of oracy in literacy
development.
4 Research on Oral Language Production in ImDL Classrooms

We summarize recent research on ImDL students’ oral language development, followed by
research on target language use in ImDL classrooms, including choice of language by both
students and teachers. We then examine a cycle of research that first identified some limitations
of teaching language through content and subsequently proposed pedagogical solutions
that were put to the test.
Recent Assessments of Oral Language Development

Several US studies have utilized standardized tools to measure ImDL students’ oral profi-
ciency with scales based on the ACTFL Proficiency Guidelines. In the form of holistic
rubrics, these guidelines describe in a very general way what individuals can do with language
in all four modalities – listening, speaking, reading, and writing – in unrehearsed “real-
world” situations. Levels range from Novice to Distinguished, with High, Mid, and Low
sublevels for the Novice, Intermediate, and Advanced ranges (ACTFL, 2012). This scale is
similar to the Common European Framework of Reference for Languages (CEFR), developed
by the Council of Europe (2001). The CEFR includes levels ranging from A1 (Novice) to C2
(Distinguished). These are broad-brush scales in that they describe proficiency generally,
with little attention to details.
Drawing on the ACTFL Guidelines, the Center for Applied Linguistics (CAL) developed
a rating scale for use with their oral proficiency assessment tools for young learners (Grades
K–8): the Student Oral Proficiency Assessment (SOPA), used with students learning addi-
tional languages, and the CAL Oral Proficiency Exam (COPE), designed for ImDL students
in Grades 5–8 (Thompson et al., 2006). These tools utilize the same rating scale, which has
nine levels (from Junior Novice-Low to Junior Advanced-High) across the domains of oral
fluency, grammar, vocabulary, and listening comprehension. Another tool is the Standards-
Based Measurement of Proficiency (STAMP) (Avant Assessment, 2015). The web-based,
computer adaptive STAMP assesses proficiency in all four skill areas and uses a scale aligned
with the ACTFL levels.
Two large-scale studies were conducted using STAMP, each assessing over 1,000 ImDL
students (most enrolled in 50:50 programmes). Burkhauser et al. (2016) reported that Grade 8
ImDL students performed in the Intermediate-Mid range for speaking (Chinese and Spanish)
and between Intermediate-Low and -Mid in Japanese (between A2 and B1 according to the
CEFR [ACTFL, n.d.]). The Center for Applied Second Language Studies (CASLS, 2013)
found that 41% performed at the Intermediate-Low level in speaking by Grade 6 and 97% by
Grade 12, with 3% rated in the Intermediate-High range. The study included students in
Chinese, French, Japanese, and Spanish programmes, but results were not disaggregated by
language. Also relying on the STAMP, Fortune and Song (2016) reported that the speaking
proficiency of 70% of early, total one-way Mandarin immersion Grade 5 students (n = 80)
was in the Intermediate-Low range, pointing to the superiority of the early, total programme
over the 50:50 model.
Fortune and Tedick (2015) used CAL’s SOPA and COPE assessments in a cross-sectional
study of the oral language of early, total Spanish immersion students (n = 218) across Grades
K, 2, 5, and 8. Findings showed statistically significant differences between students in
Grades K, 2, and 5 across all domains, with Grade 5 students’ median proficiency score rated
at the Junior Advanced-Low level in oral fluency, grammar, and vocabulary. However, a
plateau effect emerged. Grade 8 students’ performance was not significantly better than ei-
ther Grade 5 or Grade 2 students’ in the speaking domains. Fortune and Tedick speculated
that this plateau may have occurred because Grade 8 students received only 25% of in-
struction in Spanish, whereas Grade 5 students received about 70%. They also questioned
whether the broad-brush rating scale was nuanced enough to detect differences in higher
levels of proficiency.
Following a similar cross-sectional design and using the same tools developed by CAL,
Fortune and Ju (2017) assessed the oral proficiency of early, total Mandarin immersion
students in Grades K, 2, and 5. They found significant differences in all domains between
Grades K and 2 but none between Grades 2 and 5. Both Grade 2 and 5 students were rated at
the Junior Intermediate-Mid level in the speaking domains. Also concerned about the in-
ability of the rating scale to detect differences, Fortune and Ju conducted a fine-grained,
follow-up linguistic complexity analysis of three representative speech samples (one from
each grade). They found steadily increasing grammatical complexity from one grade to the
next (K, 2, and 5). Lexical complexity, however, increased between Grades K and 2, but not
between Grades 2 and 5. This finding suggests the need for more robust vocabulary devel-
opment in ImDL classrooms, also recommended by Fortune and Tedick (2015) based on
their study results.
Xu et al. (2015) used the STAMP to examine proficiency in two-way Mandarin immersion
students. About 71% of Grade 5 students met or exceeded the Intermediate-Low level in
speaking. Interestingly, the heritage learners (i.e., L1 speakers of the minority language) and
English-speaking Mandarin L2 learners scored about the same. This was perhaps because the
heritage learners were defined based on the presence of Mandarin in the home, rather than
on the requirement that Mandarin be their L1. Potowski (2007), in contrast, defined heritage
and non-heritage learners based on students’ L1s and found stark differences in proficiency
between the two groups at Grade 8, with heritage (Spanish L1) speakers being assessed
significantly higher in oral language proficiency than English L1 speakers.
In summary, English L1 students in ImDL programmes tend to perform orally in the
Intermediate-Low to Advanced-Low ranges, and students in early, total (and 90:10) pro-
grammes achieve higher ratings in oral proficiency than students in 50:50 programmes.
Students in Spanish early, total programmes have higher levels of oral proficiency than
students in Mandarin early, total programmes. Moreover, heritage learners in two-way
programmes tend to outperform English L1 speakers and to develop more balanced bi-
lingualism, especially in Spanish/English programmes. Broad-brush measures may not be
nuanced enough to detect growth within the higher Intermediate to lower Advanced ranges
where students’ oral language seemingly plateaus. In addition, oral language assessments
should be developed specifically for immersion contexts to gain access to students’ ability to
use complex language indicative of academic literacy.
Oral Language Use in ImDL Classrooms

Questions regarding which languages ImDL students use in the classroom, when, with
whom, and for what purposes, have been the focus of several studies that were motivated by
Tarone and Swain’s (1995) call for research on ImDL student language use practices. Their
call was prompted by observations in US one-way classrooms revealing that students used
English rather than the minority language to engage with peers.
Three case studies involved direct observation and analysis of focal student oral language
production in Grade 5 one-way and two-way Spanish ImDL classrooms (Broner, 2001;
Fortune, 2001; Potowski, 2007). Using sociocultural and sociolinguistic perspectives, they
focused on factors that affected students’ language use patterns. The three researchers re-
ported that students used a great deal of English during instructional time in Spanish. In
general, students used Spanish with the teacher but were more likely to use English with each
other. In addition, they used English more for social functions and “off-task” interactions
and Spanish for most academic functions and “on-task” interactions. Exploring language use
practices in Grades 1, 3, and 8 in a 50:50 two-way programme, Ballinger and Lyster (2011)
found that, irrespective of home language background, all students preferred English. They
noted that teacher expectations impacted students’ choice of language. Two studies reported
that Spanish L1 speakers tended to reserve their Spanish for other Spanish L1 students and
used English with English L1 peers (Ballinger & Lyster, 2011; Fortune, 2001). In a more
recent qualitative study involving classroom observations focused on students’ language use
practices in two 90:10 two-way programmes in California, Hernández (2015) found that the
low status of Spanish relative to English resulted in all students preferring English, English
L1 students dominating small group interaction, and Spanish L1 students conforming to
English when English L1 students switched to English during oral exchanges.
Other research on classroom interaction has investigated students’ varied use of both
instructional languages in ImDL classrooms. This work has emerged against the backdrop of
translanguaging theory, which argues that an individual’s entire linguistic repertoire func-
tions as one integrated system (e.g., García, 2009; Otheguy et al., 2015). In practice, trans-
languaging entails the use of two or more languages to make meaning, form experiences, and
cultivate knowledge and understandings (García, 2009). As translanguaging advocates in-
creasingly promote the use of translanguaging pedagogies, ImDL studies have followed,
particularly in two-way contexts. Several have reported that students move in and out of
different roles as they engage with each other and use both languages, often concurrently,
which, researchers have argued, affords opportunities for metalinguistic analysis, the med-
iation of understanding, and other learning (e.g., García, 2011; Hamman, 2018). At the same
time, some of this research has revealed linguistic imbalances. For example, Hamman (2018)
observed far more instances of students using English during instructional time in Spanish
than vice versa. She noted:
the practice of engaging in translanguaging was not equally distributed across
students or languages. This unequal distribution impacted how students were positioned
in the classroom, particularly in relation to their academic and linguistic
expertise, and generated a more English-centered classroom. (p. 37)
Despite the ubiquity of and overwhelming support for translanguaging in the literature, a
growing number of scholars have questioned whether translanguaging pedagogies and
practices in ImDL classrooms are warranted, in particular when it comes to development of
the minority language in contexts where English is the societal majority language (e.g.,
Ballinger et al., 2017; Fortune & Tedick, 2019; Lyster, 2019a; Tedick & Lyster, 2020).
Indeed, minority language development in such ImDL settings continues to be of great
concern, as evidenced in the next part.
Quasi-Experimental Research
In response to the observed shortcomings in ImDL students’ oral language development
along with the observations of the limitations in ImDL classroom discourse, Harley and
Swain (1984) proposed, nearly 40 years ago, a twofold instructional sequence to improve
students’ proficiency in the target language:
1. …more focused L2 input which provides the learners with ample opportunity to observe
the formal and semantic contrasts involved in the relevant target subsystem (this does
not necessarily involve explicit grammar teaching); and
2. …increased opportunity for students to be involved in activities requiring the productive
use of such forms in meaningful situations. (p. 310)
This proposal laid the groundwork for a series of quasi-experimental studies conducted in
French immersion classrooms to enhance students’ awareness of target features while providing
opportunities for their productive use in meaningful contexts with a content or thematic
focus (Day & Shapson, 1991; Harley, 1989, 1998; Lyster, 1994, 2004a; Wright, 1996).
Implemented in Grade 2–8 classrooms, the instructional treatments were designed in ac-
cordance with the instructed second language acquisition (SLA) construct of form-focused
instruction (FFI; Spada, 1997), which is designed to draw learners’ attention to target fea-
tures “as they are experiencing a communicative need” (Loewen, 2011, p. 582) and thus
differs considerably from decontextualized language instruction. Taken together, the results
of these studies showed that, in more than 75% of the 40 tests given either as immediate or
delayed posttests to assess both knowledge and productive use of the target features, students
participating in the FFI improved more than students left to their own devices to “pick up”
the target forms from the regular curriculum (Lyster, 2016).
Zooming in specifically on the results of the oral production measures, however, we find
that the oral outcomes varied across these studies. Specifically, the instructional treatment
targeting the functional distinctions between perfect and imperfect past tenses (i.e., passé
composé vs. imparfait) in Harley’s (1989) study yielded significant short-term improvement
on a cloze test and written production task, but no significant improvement in oral pro-
duction either in the short or long term. Instruction on the conditional mood in Day and
Shapson’s (1991) study yielded short- and long-term significant improvement in written
production, but none in oral production. In contrast, Wright’s (1996) study targeting verbs
of motion and the studies targeting second-person pronouns and grammatical gender by
Lyster (1994, 2004a) all showed long-term significant improvement in oral production. At the
same time, Harley’s (1998) study targeting grammatical gender showed significant im-
provement in oral production on a picture description task but none on an oral task eliciting
only unfamiliar nouns.
The different outcomes in oral production across these studies can be attributed to the
differential emphases in their instructional treatments (Lyster, 2004b). To explain why, we refer
to skill acquisition theory and its distinction between declarative and procedural knowl-
edge (e.g., DeKeyser, 2007). Proponents of skill acquisition theory propose that L2 de-
velopment entails a gradual transition from effortful use that relies on declarative
knowledge (knowing about) to more automatic use of the target language that relies on
procedural knowledge (knowing how), brought about through practice and feedback in
meaningful contexts. In this view, effective instruction needs to target both types of
knowledge: (a) through noticing and awareness activities designed to increase the saliency
and frequency of the forms and functions of target features to facilitate their intake in
declarative form; and (b) through opportunities for production practice that allow stu-
dents to proceduralize more target-like representations in contextualized and mean-
ingful ways.
The two studies with no long-term effects on oral production arguably overemphasized
production activities at the expense of activities promoting noticing and metalinguistic
awareness. For example, the main thematic activities in Harley (1989) and Day and Shapson
(1991) – the creation of childhood albums and the design of futuristic space colonies, re-
spectively – succeeded in engaging students in meaningful interaction and motivating con-
tent, but may not have drawn their attention to linguistic accuracy any more than is typically
the case and, as noted in both studies, fell short of pushing students to actually use the target
forms in oral production.
In contrast, of considerable importance in the treatments targeting verbs of motion,
second-person pronouns, and gender were the awareness tasks that first helped students to
consciously notice the target features through typographical enhancement and increased
frequency, and then helped them to develop analyzable representations of the target features
through a range of consciousness-raising tasks. In addition, the production activities in these
four studies were limited to role-plays, games, riddles, rhymes, and songs, giving more em-
phasis to guided practice than to autonomous practice and thereby clearing the way for
teachers to provide corrective feedback more strategically.
These are important observations because they suggest that oral abilities, at least with
respect to difficult target features and recalcitrant interlanguage forms, do not develop
only from speaking but are further enhanced by opportunities to develop metalinguistic
awareness. This underscores the importance of preceding practice activities with rich
input-driven tasks that provide students with useful models while drawing their attention
to the target language patterns they will need for successful completion of the production
tasks.

A wide range of possibilities emerge when considering ways to improve both the quantity
and quality of ImDL students’ oral production. Consequently, we will narrow our focus to
two overarching instructional options that have driven much of our work in ImDL teacher
education (Tedick & Lyster, 2020): namely, (a) teacher scaffolding and (b) an instructional
intervention known as CAPA (an acronym derived from its constituent phases: con-
textualization, awareness, practice, and autonomy).
Scaffolding
Scaffolding was initially invoked as a means to characterize parent–child interaction and was
qualified as that which “enables a child or novice to solve a problem, carry out a task or
achieve a goal which would be beyond his unassisted efforts” (Wood et al., 1976, p. 90). The
notion of scaffolding has since been aptly applied to teacher–student interaction and is
considered to encapsulate effective teaching. Scaffolding provides ImDL teachers with the
means to structure classroom discourse in ways that make oral interaction a key source of
learning. Scaffolding is what makes ImDL work, because it enables students to engage with
content in a language they know only partially as they draw on the contextual clues provided
in the scaffolding while also drawing on prior knowledge. Scaffolding students’ oral pro-
duction is necessary for assisting them in producing a language they are still learning or
helping them to produce language that is more academically sophisticated, complex, and
precise. At the same time, as students engage with subject-matter content, teacher support in
the form of scaffolding for comprehension is equally important. In this sense, scaffolding for
comprehension and scaffolding for production can be seen as interrelated in actual classroom
practice.
Drawing on earlier work by Echevarría et al. (2008), Tedick and Lyster (2020) describe
three types of scaffolding. Verbal scaffolding for comprehension involves linguistic re-
dundancy whereby teachers express the same message in a variety of ways. Verbal scaffolding
for production aims to facilitate student language use during classroom interaction through
questioning techniques and follow-up moves. Procedural scaffolding encompasses activity
frames and routines that teachers use to facilitate comprehension or to create multiple op-
portunities for students to use the language independently. Instructional scaffolding refers to
various tools or print and multimedia resources embedded in instructional activities to
promote comprehension and support production. Table 23.1 provides examples of these
three types.
As a metaphor borrowed from the construction industry, scaffolding has often been
considered a temporary support (e.g., Cazden, 1983). However, in ImDL classrooms, teacher
scaffolding is a key instructional strategy throughout the entire programme from beginning
to end. Although the nature of the scaffolding changes as students progress and become
more autonomous, the need to provide support for student learning is not any less apparent
in higher grades where academic language and content become increasingly more complex
(Tedick & Lyster, 2020).
Table 23.1 Examples of scaffolding for comprehension and production

Verbal – Scaffolding for Comprehension
• linguistic redundancy – self-repetition, paraphrasing, synonyms, and multiple examples
• teacher talk – modification of speech rate, articulation, and intonation to make their speech comprehensible
• body language – gestures, facial expressions, pantomime, etc. to support comprehension
Verbal – Scaffolding for Production
• balanced use of display and referential questions
• strategically planned follow-up questions to push students to clarify and extend their ideas
• corrective feedback
• sufficient wait time

Procedural – Scaffolding for Comprehension
• predictable routines (e.g., think-pair-share)
• building background knowledge
• modelling
• demonstrations
• pairing/grouping students to facilitate comprehension
Procedural – Scaffolding for Production
• dyads and cooperative learning groups
• think-pair-share
• learning centres
• peer editing, tutoring, feedback
• role-plays, simulations, debates, presentations

Instructional – Scaffolding for Comprehension
• graphic organizers
• manipulatives
• visuals and imagery
• age-appropriate books and other texts
• labels and word walls
• props, graphs, maps
• multimedia resources
Instructional – Scaffolding for Production
• teaching formulaic “chunks” of language
• explicit training in interpersonal strategies needed for collaboration
• sentence starters/frames
• well-scaffolded written and oral tasks that invite extended student discourse
• graphic organizers to elicit specific academic discourse patterns (e.g., compare/contrast)
Contextualization, Awareness, Practice, and Autonomy (CAPA)

The research on ImDL students’ production abilities along with the aforementioned inter-
vention studies eventually led us to propose a four-phase instructional sequence known as
CAPA – an acronym derived from its four constituent phases: contextualization,
awareness, practice, and autonomy. These phases draw on constructs from SLA research,
most notably skill acquisition theory, according to which, as previously mentioned, the
development of oral skills results from multiple opportunities in meaningful contexts to
access declarative representations so they become easier to access and thus proceduralized
(e.g., DeKeyser, 1998; Lyster & Sato, 2013).
What came to be known as the CAPA model was tried and tested during a 3-year Canadian
professional development project reported in Lyster (2019b) and Arshad and Lyster (2021) and
also in a two-way setting in the United States (Tedick & Young, 2016). The CAPA model serves
as a blueprint for designing instructional sequences interweaving both language and content
objectives in the spirit of a counterbalanced approach to ImDL pedagogy (Lyster, 2007, 2016;
Tedick & Lyster, 2020). The four component phases are interrelated by their dual focus on the
same subject-matter content and target language features, although the contextualization and
autonomy phases place primary emphasis on content while the awareness and practice phases
zoom in on language, as illustrated in Figure 23.1.
[Figure 23.1: the four CAPA phases in sequence. The contextualization and autonomy phases focus on content; the awareness and practice phases focus predominantly on language.]
Figure 23.1 Variable emphases on content and language in the CAPA model (Tedick & Lyster, 2020, p. 112)
The contextualization phase establishes a meaningful context related to content, usually
by means of a text in which target features have been contrived to appear more salient (i.e.,
typographical enhancement such as bolding and underlining) or more frequent (i.e., input
flood). The awareness phase then encourages the students to reflect on and manipulate the
target forms in a way that helps them to develop or restructure their explicit knowledge
representations, usually by means of rule-discovery tasks, metalinguistic exercises, and op-
portunities for pattern detection. The practice phase further engages students’ metalinguistic
awareness by pushing them to use the target features in meaningful yet controlled contexts to
develop automaticity and accuracy. The sequence then comes full circle at the autonomy
phase by returning to the content area that served as the starting point. Similar to the
practice phase, the autonomy phase requires the use of the target language features but in a
disciplinary or thematic context with fewer constraints to encourage more autonomous use
of the target language.
As an illustration, we provide a brief description of a sequence designed and im-
plemented by three Grade 4 teachers who integrated a focus on the passé composé in
French, along with its variable uses of avoir and être as auxiliary verbs, within their
history unit on Jacques Cartier and his three voyages to what was known as New France.
(The passé composé is a past tense in French generally used to refer to actions completed in
the past, similar to the preterite in Spanish or the simple past in English. It has two parts:
an auxiliary verb and a past participle. The auxiliary derives either from the verb avoir (to
have) or the verb être (to be) and functions in tandem with the past participle to complete a
compound verb phrase such as Je suis tombé (I fell) or J’ai mangé (I ate). For the most
part, auxiliary forms derive from the verb avoir, but a small set of high-frequency verbs
use être. Understandably, L2 learners of French overgeneralize the use of avoir as an
auxiliary, resulting in frequent errors such as j’ai allé instead of je suis allé for I went.)
Several other teacher-designed instructional sequences based on the CAPA model appear
in Lyster (2018) and Tedick and Lyster (2020).
The sequence began with the contextualization phase during which students watched a
narrated, time-lapsed, animated biographical video of Cartier that the teachers created to
map out key exploits in the explorer’s life. The narrative was replete with instances of the
passé composé using both auxiliary verbs. After watching the video, the content focus stayed
in the foreground as students discussed the main points surrounding Cartier’s voyages. Next,
during the awareness phase, the text of the video’s narration was projected, with instances of
the passé composé in bold. Students were led to identify the tense of the highlighted verbs, to
notice the two different auxiliaries, and to make a list for future reference classifying verbs
according to auxiliary.
Then, during the practice phase, each student received one of five images illustrating an
important event or place related to Cartier and wrote a description using verbs in the passé
composé. They then mingled with other students to find those with the same image, and
together in small groups they synthesized their descriptions to create an historical account,
which they then conveyed orally to the whole class, thus giving the teacher an opportunity to
provide corrective feedback as necessary. Finally, in the autonomy phase, students produced
an illustrated timeline in small groups depicting some of the landmark events in Cartier’s
career, including a legend for each event using the passé composé. As each group presented its
timeline to the class, the teacher had the opportunity to provide feedback on both language
and content.
The CAPA model thus draws on previous ImDL classroom intervention studies by in-
corporating noticing and awareness activities that increase the frequency and saliency of
target features, as well as production practice activities that promote the proceduralization of
the target features. In addition, the CAPA model intertwines language and content objectives
by making autonomous use of the target language the ultimate goal while ensuring that
focused consciousness-raising tasks serve to consolidate students’ metalinguistic awareness in
ways that pave the way for their autonomous use of the target language.
6 Future Directions
It has long been established that ImDL programmes hold much promise for student language
acquisition. Nonetheless, the research reviewed in this chapter reveals that they are falling short
of their potential with respect to oral language development. It is difficult to push students’
language proficiency beyond intermediate levels, and significant shortcomings in grammatical
accuracy, lexical specificity and variety, and sociolinguistic appropriateness persist. Despite early
observations and recommendations regarding the need for ImDL teachers to increase oppor-
tunities for students to produce output (Swain, 1988; Swain & Lapkin, 1986), many teachers
today find themselves still relying on the provision of comprehensible input and providing too
few opportunities for students to engage with the language. When they do provide such op-
portunities, they tend to focus on content learning without systematically offering corrective
feedback to improve students’ language. Thus, recommendations that were proposed nearly four
decades ago remain relevant today. Teachers need to orchestrate classroom interaction and
learning activities to maximize student output. They must have high expectations, scaffold
learning so students can meet those expectations, and provide age-appropriate corrective feed-
back both strategically and selectively to push students’ interlanguage to the next level. More
research on classroom interaction and on interventions to improve students’ oral language de-
velopment is clearly needed.
Given that ImDL teachers are typically prepared as content teachers and rarely have
immersion-specific preparation, they often lack knowledge of the target language themselves
in addition to knowledge of pedagogical approaches and strategies to shift learners’ attention
between content (meaning) and language (form). They need resources along the lines of
Lyster’s (2016) book for French immersion teachers as well as systematic and sustained
professional development opportunities to learn about language and pedagogical approaches
to embed a focus on language in the context of their content instruction, such as the CAPA
model. The field also needs more research on the types of professional development ex-
periences that lead teachers to transform their practices and ultimately improve student
language development.
ImDL programmes need also to make students’ continued development in the minority
language a primary goal rather than limiting assessments of programme effectiveness to students’
academic achievement and English language outcomes. The development of high-quality
immersion-specific oral language assessments and more refined rating scales to capture differ-
ences in oral language development, particularly at High-Intermediate to Pre-Advanced stages,
would greatly benefit the field (Fortune & Tedick, 2015; Fortune & Ju, 2017).
Although translanguaging theory, practice, and pedagogy have caught the attention of
many researchers and teachers who work in ImDL and other bilingual settings, some
translanguaging practices may not be appropriate in certain contexts, especially those in
which English is the majority language (e.g., Lyster, 2019a; Tedick & Lyster, 2020). For
example, in its aim to “use the entire linguistic repertoire of bilingual students” (García,
2013, p. 2), translanguaging practice may be detrimental to the development of the
minority language if English is used to process and engage with increasingly complex
subject matter. Some studies in two-way classrooms have indeed shown that when stu-
dents are free to engage in translanguaging practices in the classroom, they default to
English (e.g., Hamman, 2018). Sustained use of the minority language is arguably more
beneficial for pushing its development forward than recourse to English – given appro-
priate instruction and sufficient scaffolding to sustain use of the minority language.
Translanguaging practices, however, that incorporate cross-linguistic pedagogy as a
means “to teach for two-way cross-lingual transfer” (Cummins, 2007, p. 11) have much
potential to foster biliteracy development by improving students’ morphological aware-
ness (Lyster et al., 2013) and increasing their motivation to read in both languages (Lyster
et al., 2009). More evidence-based research on translanguaging pedagogies and their ef-
fects on minority language development in a range of contexts is needed before endorsing
across-the-board implementation.
Nearly 60 years have passed since the first immersion programmes were established (one-way
in St. Lambert, Montreal and two-way in Miami-Dade County, Florida), with more launched
each year in countries around the world. This form of bilingual education will continue to evolve
and improve as it incorporates relevant research findings about effective instructional practices
that integrate language and content and develops sustained professional development oppor-
tunities for teachers to respond to students’ oral language development needs.
Further Reading
Lyster, R. (2016). Vers une approche intégrée en immersion [Towards an integrated approach in im-
mersion]. Montreal: Les Éditions CEC.
Lyster, R. (2019). Making research on instructed SLA relevant for teachers through professional de-
velopment. Language Teaching Research, 23(4), 494–513.
Tedick, D. J., & Björklund, S. (Eds.) (2014). Language immersion education: A research agenda for
2015 and beyond [Special issue]. Journal of Immersion and Content-Based Language Education, 2(2).
Tedick, D. J., & Lyster, R. (2020). Scaffolding language development in immersion and dual language
classrooms. New York: Routledge.
References
American Council on the Teaching of Foreign Languages (ACTFL). (2012). ACTFL Proficiency
Guidelines 2012. Alexandria, VA: ACTFL. Retrieved from https://www.actfl.org/publications/guidelines-and-manuals/actfl-proficiency-guidelines-2012.
ACTFL. (n.d.). Assigning CEFR ratings to ACTFL assessments. Retrieved from https://www.actfl.org/publications/additional-resources/assigning-cefr-ratings-actfl-assessments
Allen, P., Swain, M., Harley, B., & Cummins, J. (1990). Aspects of classroom treatment: Toward a more
comprehensive view of second language education. In B. Harley, P. Allen, J. Cummins, & M. Swain
(Eds.), The development of second language proficiency (pp. 57–81). Cambridge, UK: Cambridge
University Press.
Arshad & Lyster, R. (2021). Professional development in action: Teachers’ experiences in learning to
bridge language and content. In K. Talbot, S. Mercer, M.-T. Gruber, & R. Nishida (Eds.), The
psychological experience of integrating language and content (pp. 232–249). Bristol, UK: Multilingual
Matters.
Avant Assessment. (2015). Standards-Based Measurement of Proficiency (STAMP). Eugene, OR:
Author. Retrieved from https://avantassessment.com/.
Ballinger, S., & Lyster, R. (2011). Student and teacher language use in a two-way Spanish/English
immersion school. Language Teaching Research, 15, 289–306. doi: 10.1177/1362168811401151
Ballinger, S., Lyster, R., Sterzuk, A., & Genesee, F. (2017). Context-appropriate crosslinguistic
pedagogy: Considering the role of language status in immersion education. Journal of Immersion and
Content-Based Language Education, 5(1), 30–57.
Broner, M. (2001). Impact of interlocutor and task on first and second language use in a Spanish im-
mersion program (CARLA Working Paper #18). Minneapolis: University of Minnesota, Center for
Advanced Research on Language Acquisition. Retrieved from http://www.carla.umn.edu/resources/working-papers/documents/ImpactOfInterlocutorTaskOn1st2ndLanguage.pdf.
Burkhauser, S., Steele, J. L., Li, J., Slater, R. O., Bacon, M., & Miller, T. (2016). Partner-language
learning trajectories in dual-language immersion: Evidence from an urban district. Foreign Language
Annals, 49(3), 415–433.
Cazden, C. B. (1983). Adult assistance to language development: Scaffolds, models, and direct in-
struction. In R. P. Parker & F. A. Davis (Eds.), Developing literacy: Young children’s use of language
(pp. 3–17). Newark, DE: International Reading Association.
Center for Applied Second Language Studies (CASLS). (2013). What levels of proficiency do
immersion students achieve? Eugene, OR: Center for Applied Second Language Studies. Retrieved
from https://casls.uoregon.edu/wp-content/uploads/pdfs/tenquestions/TBQImmersionStudentProficiencyRevised.pdf.
Council of Europe. (2001). Common European Framework of Reference for Languages: Learning,
teaching, assessment (CEFR). Strasbourg: Author. Retrieved from www.coe.int/lang-CEFR.
Cummins, J. (2007). Rethinking monolingual instructional strategies in multilingual classrooms. Canadian
Day, E., & Shapson, S. (1991). Integrating formal and functional approaches to language teaching in
French immersion: An experimental study. Language Learning, 41, 25–58.
DeKeyser, R. (1998). Beyond focus on form: Cognitive perspectives on learning and practicing second
language grammar. In C. Doughty & J. Williams (Eds.), Focus on form in classroom second language
acquisition (pp. 42–63). Cambridge, UK: Cambridge University Press.
DeKeyser, R. (Ed.). (2007). Practice in a second language: Perspectives from applied linguistics and
cognitive psychology. Cambridge, UK: Cambridge University Press.
Echevarría, J., Vogt, M., & Short, D. J. (2008). Making content comprehensible for English learners: The
SIOP® Model (3rd edn). Boston, MA: Pearson Education.
Escamilla, K., Hopewell, S., Butvilofsky, S., Sparrow, W., Soltero-González, L., Ruiz-Figueroa, O., &
Escamilla, M. (2014). Biliteracy from the start: Literacy Squared in action. Philadelphia: Caslon
Publishing.
Fortune, T. W. (2001). Understanding immersion students’ oral language use as a mediator of social
interaction in the classroom. Unpublished doctoral dissertation, University of Minnesota,
Minneapolis, MN.
Fortune, T. W., & Ju, Z. (2017). Assessing and exploring the oral proficiency of young Mandarin
immersion learners. Annual Review of Applied Linguistics, 37, 264–287. doi: 10.1017/S0267190517000150
Fortune, T. W., & Song, W. (2016). Academic achievement and language proficiency in early total
Mandarin immersion education. Journal of Immersion and Content-Based Language Education, 4(2),
168–197.
Fortune, T. W., & Tedick, D. J. (2015). Oral proficiency development of English Proficient K–8 Spanish
immersion students. Modern Language Journal, 99(4), 637–655. doi: 10.1111/modl.12275
Fortune, T. W., & Tedick, D. J. (2019). Context matters: Translanguaging and language immersion
education in the U.S. and Canada. In M. Haneda & H. Nassaji (Eds.), Perspectives on language as
action: Festschrift in honor of Merrill Swain (pp. 27–44). Bristol, UK: Multilingual Matters.
García, O. (2009). Bilingual education in the 21st century: A global perspective. Malden, MA: Wiley-Blackwell.
García, O. (with Makar, C., Starcevic, M., & Terry, A.) (2011). Translanguaging of Latino kinder-
garteners. In K. Potowski & J. Rothman (Eds.), Bilingual youth: Spanish in English-speaking so-
cieties (pp. 33–55). Amsterdam: John Benjamins.
García, O. (2013). Theorizing translanguaging for educators. In C. Celic & K. Seltzer (Eds.),
Translanguaging: A CUNY-NYSIEB guide for educators (pp. 1–6). New York, NY: CUNY-
NYSIEB.
Hamman, L. (2018). Translanguaging and positioning in two-way dual language classrooms: A case for
criticality. Language and Education, 32(1), 21–42. doi: 10.1080/09500782.2017.1384006
Harley, B. (1989). Functional grammar in French immersion: A classroom experiment. Applied
Harley, B. (1994). Appealing to consciousness in the L2 classroom. AILA Review, 11, 57–68.
Harley, B. (1998). The role of form-focused tasks in promoting child L2 acquisition. In C. Doughty & J.
Williams (Eds.), Focus on form in classroom second language acquisition (pp. 156–174). Cambridge,
UK: Cambridge University Press.
Harley, B., Cummins, J., Swain, M., & Allen, P. (1990). The nature of language proficiency. In B.
Harley, P. Allen, J. Cummins & M. Swain (Eds.), The development of second language proficiency
(pp. 7–25). Cambridge, UK: Cambridge University Press.
Harley, B., & Swain, M. (1984). The interlanguage of immersion students and its implications for
second language teaching. In A. Davies, C. Criper & A. Howatt (Eds.), Interlanguage (pp. 291–311).
Edinburgh: Edinburgh University Press.
Hernández, A. M. (2015). Language status in two-way bilingual immersion: The dynamics between
English and Spanish in peer interaction. Journal of Immersion and Content-Based Language
Education, 3(1), 102–126.
Lindholm-Leary, K. J. (2001). Dual language education. Clevedon, UK: Multilingual Matters.
Loewen, S. (2011). Focus on form. In E. Hinkel (Ed.), Handbook of research in second language teaching
and learning (Vol. 2, pp. 576–592). New York: Routledge.
Lyster, R. (1994). The effect of functional-analytic teaching on aspects of French immersion students’
sociolinguistic competence. Applied Linguistics, 15, 263–287.
Lyster, R. (2004a). Differential effects of prompts and recasts in form-focused instruction. Studies in
Lyster, R. (2004b). Research on form-focused instruction in immersion classrooms: Implications for
theory and practice. Journal of French Language Studies, 14, 321–341.
Lyster, R. (2007). Learning and teaching languages through content: A counterbalanced approach.
Lyster, R. (2016). Vers une approche intégrée en immersion [Towards an integrated approach in im-
mersion]. Montreal: Les Éditions CEC.
Lyster, R. (2018). Content-based language teaching [The Routledge E-Modules on Contemporary
Language Teaching edited by B. VanPatten & G. Keating.] New York: Routledge.
Lyster, R. (2019a). Translanguaging in immersion: Cognitive support or social prestige? The Canadian
Modern Language Review, 75(4), 340–352.
Lyster, R. (2019b). Making research on instructed SLA relevant for teachers through professional
development. Language Teaching Research, 23(4), 494–513.
Lyster, R., Collins, L., & Ballinger, S. (2009). Linking languages through a bilingual read-aloud project.
Language Awareness, 18(3–4), 366–383.
Lyster, R., Quiroga, J., & Ballinger, S. (2013). The effects of biliteracy instruction on morphological
awareness. Journal of Immersion and Content-Based Language Education, 1(2), 169–197.
Lyster, R., & Sato, M. (2013). Skill Acquisition Theory and the role of practice in L2 development. In
P. García Mayo, M. Gutierrez-Mangado, & M. Martínez Adrián (Eds.), Contemporary approaches
to second language acquisition (pp. 71–92). Amsterdam: John Benjamins.
Otheguy, R., García, O., & Reid, W. (2015). Clarifying translanguaging and deconstructing named
languages: A perspective from linguistics. Applied Linguistics Review, 6(3), 281–307.
Potowski, K. (2007). Language and identity in a dual immersion school. Clevedon, UK: Multilingual
Matters.
Spada, N. (1997). Form-focused instruction and second language acquisition: A review of classroom
and laboratory research. Language Teaching, 29, 73–87.
Swain, M. (1988). Manipulating and complementing content teaching to maximize second language
learning. TESL Canada Journal, 6, 68–83.
Swain, M., & Lapkin, S. (1986). Immersion French in secondary schools: ‘The goods’ and ‘the bads’.
Contact, 5(3), 2–9.
Tarone, E., & Swain, M. (1995). A sociolinguistic perspective on second language use in immersion
classrooms. Modern Language Journal, 79, 166–178. doi: 10.1111/j.1540-4781.1995.tb05428.x
Tedick, D. J., & Lyster, R. (2020). Scaffolding language development in immersion and dual language
classrooms. New York: Routledge.
Tedick, D. J., & Wesely, P. (2015). A review of research on content-based foreign/second language
education in US K-12 contexts. Language, Culture and Curriculum, 28(1), 25–40.
Tedick, D. J., & Young, A. I. (2016). Fifth grade two-way immersion students’ responses to form-
focused instruction. Applied Linguistics, 37(6), 784–807.
Thompson, L. E., Boyson, B. A., & Rhodes, N. C. (2006). Administrator’s manual for CAL foreign
language assessments, grades K–8. Washington, DC: Center for Applied Linguistics.
Wood, D., Bruner, J., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child
Psychology and Psychiatry, 17, 89–100.
Wright, R. (1996). A study of the acquisition of verbs of motion by grade 4/5 early French immersion
students. The Canadian Modern Language Review, 53, 257–280.
Xu, X., Padilla, A. M., & Silva, D. M. (2015). Learner performance in Mandarin immersion and high
school world language programs: A comparison. Foreign Language Annals, 48, 26–38. doi: 10.1111/flan.12123
24
SPEAKING AND ENGLISH AS A
LINGUA FRANCA
Enric Llurda
DOI: 10.4324/9781003022497-29
Second language acquisition (SLA) research has made enormous progress since its inception
more than 50 years ago. Yet, despite Sridhar and Sridhar’s (1986) early urgings to re-
searchers to look beyond native speaker models and avoid excessive reliance on native-
speaker environments, mainstream SLA studies have often focused on the native speaker as
the baseline model on which second language performance and achievement are measured.
Some authors have questioned the use of monolingual native models in SLA research by
claiming that the outcome of second language (L2) learning can never be identical to
monolingual first language (L1) competence (Cook, 1999; Grosjean, 1989), but it was the
appearance of the World Englishes paradigm (Kachru, 1985) and later the surge of interest in
English as a Lingua Franca (ELF) that contributed most to the displacement of the target
from the idealized native-like speaker to a real-life bi/multilingual L2 user (Cook, 1999).
Such displacement is necessary to avoid falling into a deficit perspective of SLA in which all
learners are doomed to fail.
The teaching of L2 speaking entails a range of techniques that make it a complex task,
which is often avoided by teachers who feel more comfortable dealing with grammar and
vocabulary, and the rules of writing. Yet, speaking is clearly different from writing and,
therefore, classroom activities should address its specific requirements. According to Burns
(2016), speaking involves a series of “core skills,” namely, pronunciation, speech function,
interaction management, and discourse organization skills. Therefore, when we look at re-
search on speaking and ELF, we should focus our attention on studies dealing with the
pronunciation and pragmatics of ELF.
In the specific area of pronunciation, Derwing and Munro have established the in-
dependence of intelligibility from accent (Derwing & Munro, 2015; Munro & Derwing, 1995),
and the limited role of accent in communicative effectiveness (Derwing & Munro, 2009), with
intelligibility being allocated the primary role in the achievement of success in oral interaction.
Accent is certainly an important aspect of L2 pronunciation, as a marker of an L2 speaker’s
“identification and affiliation with the target language” (Moyer, 2014, p. 11). The extent to
which speakers are willing to abandon their previous identities and embrace new ones affects
the extent to which they adopt the pronunciation of a particular community of speakers. But
what happens when the L2 is not identified with a specific community, and the speaker does
not have a socially motivated pull to adapt their way of speaking to a particular model that
characterizes a group of speakers? What if the L2 is a lingua franca used to bring together
speakers of many different speech communities? This is the case of English in its global or
lingua franca dimension. The community of speakers of English is not restricted to a local or
national context. Rather, it is spread across the world and is made up of people unevenly
distributed among all countries in the world. English is thus the language used to bring to-
gether people of different cultural backgrounds, which means that speaking English in lingua
franca situations involves developing intercultural communication skills and enhanced prag-
matic strategies to deal with cultural and pragmatic diversity.
Whereas the intent of the acquisition of most languages is communication with local
communities of speakers, the more international the language is, the more diverse settings its
speakers will encounter. English is thus a unique language, given that it is spoken in so many
contexts. So, when we deal with models of speaking, and particularly models of pro-
nunciation, we may need first to question who comprises the speaker’s potential target au-
dience. Widdowson (2012) emphasized that communication in English in international
contexts is necessarily ruled by the specific purposes of the participants in the interaction.
Thus, the relevant question is whether that purpose involves a narrowly defined speech
community. If not, we face a situation in which there is no point in imitating the specific
pronunciation of a group of speakers. In this context, broad global intelligibility is the de-
sired ultimate attainment. Nativeness becomes a singularity with no particular significance
other than the social value attached to a community of speakers.
Standard language ideology has had an enormous impact on mainstream conceptions of
language and has deeply affected language teaching by confining and reducing language to
its formalized standard versions, barring from classrooms any form that is not considered
“correct.” The importance of standard language ideology in determining goals of L2
speaking can be seen in the promotion in textbooks of artificial pseudo-spoken forms pre-
cooked in written form that rarely occur in natural conversations, or in the promotion of
pronunciation forms produced by specific groups of speakers (i.e., British Received
Pronunciation in many UK textbooks and General American English in materials produced
in the United States). ELF is certainly not the only research paradigm that challenges the
concept of standard language and the predominant role it has had in linguistic and applied
linguistic analysis (Canagarajah, 2007; García & Wei, 2014; Makoni & Pennycook, 2007),
but it clearly confronts such ideology and calls into question native speakerism (Holliday,
2005), a pervasive notion that has sacralized native speakers while simultaneously ignoring
the many colours the language could take when spoken by individuals who have a different
first language.
Not only has standard language ideology created a potentially discriminatory situation
for non-native speakers of English, but it also contributes to the discrimination against many
native speakers whose accent is not deemed socially acceptable. Conversely, ELF research
emphasizes diversity within language use. Though such diversity has always been present in
NS forms and many sociolinguists have documented it, ELF has contributed to the trend in
L2 teaching and SLA to take a broader perspective, letting go of rigidities and constraints
posed by a narrowly defined concept of language use and a target language model.
The historical evolution of ELF can be summarized as a move from describing a single
variety that could eventually become a new internationally accepted standard and
serve as a model for future learners of the language, to the recent pluricentric
characterization of ELF as mainly determined by the use of pragmatic strategies of
accommodation and adaptation in interactions in multilingual situations.
In more detail, Jenkins (2015) described the evolution of the field as having
gone through three stages: ELF1, ELF2, and ELF3. I now turn to Jenkins’s account of
the three phases of ELF.
ELF1
In this phase, the goal was to characterize ELF interactions and predict the future evolution
of an upcoming international variety of English, and to help learners find the linguistic
aspects key to intelligibility in ELF. More or less implicitly, researchers were hopeful to
eventually codify such a variety and establish it as a norm for the teaching and learning of
English in international contexts (Seidlhofer, 2001).
In 2000, Jenkins put forward the Lingua Franca Core (LFC), which is one of the most
controversial aspects of ELF and L2 speaking. The LFC was (mis)interpreted by several
applied linguists as providing an alternative model to native varieties. However, Jenkins
(2007, 2015) claimed she did not intend to offer a prescriptivist set of rules to be complied
with or imposed as a model, and that the LFC is a hierarchy of pronunciation elements to
help teachers and learners establish priorities in their teaching and learning, rather than a
rigid simplified version of the syllabus. Such priorities would necessarily take into account
the particularities of different groups of learners and their L1s. The LFC gave a tre-
mendous push to general interest in ELF among English language teaching (ELT) re-
searchers and practitioners. Additionally, research on pronunciation teaching has shown
the importance of prioritizing elements of pronunciation that are most likely to contribute
to an increase in the intelligibility of the speaker (Derwing et al., 1998; Munro
et al., 2015).
Very little research has continued Jenkins’s (2000) work on pronunciation and the LFC. One
relevant study is Deterding (2013), who analysed misunderstandings in conversations
among Asian speakers of English and concluded that pronunciation is the main factor
causing them. More recently, Gardiner and Deterding (2018) concluded that consonant
clusters need to be maintained and “teachers should focus on the full number of con-
sonants in initial clusters” (p. 231), but the exact quality of the second consonant is not as
relevant, as a variant realization of that consonant would not create as much trouble as
omitting it.
Recently, not as much attention has been paid to the LFC, as ELF researchers (Cogo &
Dewey, 2012; Jenkins, 2007; Seidlhofer, 2011) have emphatically argued that ELF is a
complex and diverse entity that originates in natural interactions among speakers of different
L1s; they have mainly aimed at achieving an understanding of the dynamics of such language
encounters and the potential common patterns that appear. Understanding the complexity of
ELF gave way to the two subsequent stages: ELF2 and ELF3.
ELF2
Seidlhofer (2009) outlined a new orientation of ELF research, in which the emphasis was placed
less on forms that identify ELF as an emerging variety of English than on the several
“multi-faceted multilingual repertoires” (p. 242) of members of the global English-using community
of practice. The focus was placed on ELF’s variability, “understood as a defining char-
acteristic of ELF communication” (Jenkins, 2015, p. 55). This perspective was not very
different from that of Canagarajah (2007) and Makoni and Pennycook (2007), who
questioned the existence of varieties per se and raised concerns as to what “English” and
“Englishes” actually refer to. Thus, the emphasis moved from the description of a new
variety to the acknowledgement of the inappropriateness of categorizing varieties and the
need to focus on communicative strategies of global speakers of English in different contexts
and situations. Seidlhofer (2009) explained that “the crucial challenge has been to move from
the surface description of particular features, however interesting they may be in themselves,
to an explanation of the underlying significance of the forms” (p. 241) and Cogo and Dewey
(2012) argued that they are not so much interested in surface-level features of ELF in-
novations as in “the underlying communicative motives that give rise to them” (p. 14). These
authors went on to describe ELF as “a naturally occurring, very widespread, especially
contemporary phenomenon,” which “entails contact between speakers from varying lin-
guacultural backgrounds,” “involves online modifications of English language resources to
suit the particular communicative needs of interlocutors” and “entails (…) processes of
identity signaling, codeswitching, accommodation and language variation” (Cogo & Dewey,
2012, p. 18). In line with these defining traits, the authors further claim that “successful
communication is any exchange that proves to be meaningful for the participants and that
has reached the required purpose or purposes,” as they wish to analyze “the strategies that
participants use to make the conversation work, the moves they make to negotiate meaning
or to align with their co-participants” (Cogo & Dewey, 2012, p. 36).
ELF3
Much recent ELF research makes it clear that ELF communication always takes place in multilingual
environments where two or more languages are involved, and at least one of the speakers is
proficient in at least one language other than English; more often than not those languages
are also present in the interaction. This evolution of ELF research is parallel to the interest
in the constant and fluid interplay among languages in contact situations, and the emphasis
on the hybridity and plurilithic nature of language use in spontaneous communication,
as in García and Wei’s (2014) work on translanguaging, which is defined as
an approach to the use of language, bilingualism and the education of bilinguals
that considers the language practices of bilinguals not as two autonomous language
systems as has been traditionally the case but as one linguistic repertoire with features
that have been societally constructed as belonging to two separate languages
(p. 2).
Thus, ELF research focuses on the description of language(s) in action in conversations
where English is the thread that holds the conversation together. There is no such thing as an
ELF variety or an ELF teaching model. ELF goes hand in hand with a plurilithic vision of
language, and it is rather defined by diversity and accommodation to different rules de-
termined by different settings and speakers.

Given the emphasis on establishing the legitimacy of the use of English by speakers with
different L1s, initial definitions tended to emphasize the notion that ELF is the English used
between L2 speakers. Thus, Firth (1996) defined ELF as “a contact language by persons who
share neither a common native tongue nor a common (national) culture, and for whom
English is the chosen foreign language of communication” (p. 240). And House (1999) also
explicitly excluded L1 speakers of English by defining ELF interactions as “interactions
between members of two or more different linguacultures in English, for none of whom
English is the mother tongue” (p. 74).
Even though these definitions may still resonate among some researchers, those who focus
on ELF will more likely agree with Seidlhofer’s (2011) definition: ELF is “any use of English
among speakers of different first languages for whom English is the communicative medium
of choice, and often the only option” (p. 7). Thus, ELF involves all kinds of international
conversations in which English is not the shared L1 but a common language of commu-
nication. The goal of learning an L2 is no longer to establish communication with L1
speakers, but to establish communication with all speakers who happen to choose English to
communicate. The great diversity of potential audiences invalidates restrictive visions of how
an L2 speaker should speak, and calls for intelligibility as the primary element to determine
what is “good” L2 pronunciation and what is “good” L2 speaking.
Advocating ELF implies dealing with degrees of uncertainty. One of the strengths of
standard language ideology, and possibly one of the reasons why it has so many sup-
porters among educators, is the clarity of its message. By embracing a single standard as
the only valid model in teaching, a language teacher avoids confronting the complexity of
language in all its diverse forms and uses. Yet, being an ELF scholar implies having to
deal with potential conflict, especially when one combines this role with that of language
educator, since the external (and possibly the internal) expectation with regard to the
ultimate goal in language education is to prepare students to produce a version of the L2
that is acceptable to their audiences, and that is normally equivalent to “error-free”
standard forms. This means that the educator feels the need to correct students’ errors,
while the ELF researcher acknowledges and values deviations from the standard norm as
legitimate forms of linguistic creativity produced by rightful users of English in lingua
franca contexts. The dilemma is reminiscent of those experienced by language educators
who work among socially deprived groups who speak varieties of the language that are
socially sanctioned as sub-standard and therefore stereotyped as indicating a low edu-
cational level, such as Ebonics in the United States. The respect for the individual’s
identity and choice of language resources is confronted with the social pressure to fit in a
community with a rigid stratification strongly connected to language usage. Both poles of
this dilemma must be acknowledged, as ignoring either one will inevitably lead to a radical
position missing a key part of the global picture of the international uses of English as a
lingua franca.
Another critical issue in describing English as a lingua franca is the community of
speakers. As Cogo and Dewey (2012) argue, the concept of speech community “has far less
to do with proximity or geographic location, less to do with group cohesion, and far more to
do with an increasingly virtual notion of interactional networks that may operate entirely
independently of physical setting” (p. 9), and so the old assumption that members of the
same speech community share a single geographical or political space has to be reconsidered.
Seidlhofer (2011) supports the notion of community of practice as it is less constraining than
the speech community and suggests that ELF speakers are drawn together by the specific
functionalities of their respective interactions.

A key characteristic of the ELF community is the diversity and fluidity of its members, such
that one may not assume a stable context to develop “a common repertoire of shared
practices” (Cogo & Dewey, 2012, p. 115). Yet, this lack of stability of its members is what
paradoxically binds them together in the shared awareness of “being involved in an especially
diverse linguacultural encounter” (Cogo & Dewey, 2012, p. 115). This includes native
speakers of English, who must be aware of such diversity to help achieve meaning. If they
rely on native speaker norms when in ELF interactions, it is quite likely they will encounter
trouble and ensuing miscommunication.
Diversity involves socio-pragmatic practices that may hinder communication, but at the
same time speakers’ linguacultural background provides them with a pool of resources to
face any communicative situation. Thus, diversity constitutes an asset in dealing with the
unexpected.
Current research on L2 speaking and ELF includes pragmatics and communication
strategies used by ELF speakers in interaction. Matsumoto (2011) is an example of a study
on negotiation strategies for solving potential problems created by differences in pronunciation.
She argues that creating solidarity and trust among ELF speakers during the interaction is
fundamental, and observes that some L2 speakers find it difficult to establish such bonding
mechanisms with L1 speakers of English, an aspect that, according to the author, is a
consequence of unequal power relations.
The study of pragmatics in ELF started with the work of Firth (1996), who identified
several strategies found in international business negotiations, most notably the let-it-pass
and the make-it-normal principles, and was complemented by House’s (1999) and
Meierkord’s (2000) contributions on communicative efficiency. All three emphasized how
ELF conversations were characterized by cooperation and they highlighted the joint construction
of meaning that reduces misunderstandings to a minimum. Mauranen (2006)
concluded that misunderstandings do not occur more frequently in ELF than in non-ELF
conversations, possibly due to the use of proactive strategies in anticipation of communicative
difficulty. Self-repairs, co-construction of expressions, unsolicited clarifications and
repetitions frequently occurred in Mauranen’s data. Thus, pragmatics and communication
strategies appeared to provide a compensatory mechanism to make up for actual formal
differences among different ELF speakers, as conversations are strongly oriented “toward
securing mutual intelligibility (…) quite possibly on the basis of the natural commonsense
assumption that it is not easy to achieve (mutual understanding) without special effort”
(Mauranen, 2006, p. 147).
Björkman (2011) described the use of pragmatic strategies in an ELF academic environment
and found that speakers could use a wide array of strategies regardless of their
degree of proficiency, and that students working in groups deployed such strategies more
frequently than lecturing instructors (in particular, “backchanneling” and “comment on
common ground”), which was attributed to the different speech events (interactive vs.
monologic) typical of group-work sessions and lectures.
According to Cogo and Dewey (2012), a key element in ELF communication is the use
of strategies during moments of “non-understanding,” which indicates a realization by
speakers of an unsuccessful interaction. Such strategies are useful to pre-emptively avoid
non-understanding, as they bring the interlocutor’s attention to a preceding part of the
conversation “in order to clarify or precise that particular segment” (Cogo & Dewey,
2012, p. 128).
Finally, an aspect that has attracted attention in contemporary research in ELF, in line
with the current interest in translanguaging and the multilingual turn in ELF research, is the
interplay among the different languages known by ELF speakers. Brunner and Diemer
(2018) and Vettorel (2019) describe how ELF speakers of different nationalities resort to
code-switching to either their own or their interlocutor’s L1 and “multilingual repertoires” to
build rapport and increase communicative efficiency.
Enric Llurda

Research on ELF speaking has generally relied on observational methods. Researchers have
either collected spontaneous data or have used data in an ELF corpus such as VOICE
(Vienna Oxford International Corpus of English), ELFA (English as a Lingua Franca in
Academic settings), and ACE (Asian Corpus of English). Major claims regarding ELF
speakers’ performance have been made after the analysis of transcripts of spontaneous
speech. However, laboratory-based controlled experimental studies are practically non-existent.
This is a consequence of the scope and object of study, since ELF research explores
the naturally occurring phonological, lexical, grammatical and pragmatic particularities of
communication among speakers of English with different linguistic backgrounds.
Researchers’ main goal is to document and categorize what occurs in natural language
settings. Even major studies, such as Jenkins (2000) or Cogo and Dewey (2012), rely on the
analysis of recorded spontaneous conversations. Jenkins focused on miscommunication
occurrences to determine the phonological features that could be responsible for them,
whereas Cogo and Dewey relied on a combination of Conversation Analysis and ethnography
to understand the lexico-grammatical innovations and pragmatic strategies deployed
by ELF speakers. The limited number of observations may raise questions regarding possible
confounding factors in these data. Cogo and Dewey (2012) claim that their research “does
not aim at being fully representative” and that they “set out to undertake analytic induction,
which involves determining to what extent a given case can be regarded as telling” (p. 35).
However, replicating and testing some of the claims derived from observational analyses
under narrowly controlled conditions should contribute to the generalizability of the findings.
Thir (2020) has contributed one of the few studies that tests one of Jenkins’ LFC claims in a
controlled experiment involving one speaker and over 500 listeners from different countries
and L1 backgrounds. Thir concluded that “the LFC’s recommendation to prioritize the
teaching of /ɜː/ (the vowel in the British pronunciation of the word “nurse”) over other
English vowel qualities is inappropriate when learners employ a phonetic rather than a
phonemic substitution” (Thir, 2020, p. 477). Future ELF research will need to follow the
path set by this study to refine the claims initially made by Jenkins (2000), which should be
complemented by further empirical studies validating or repudiating them.

Numerous proposals have been made on how to incorporate an ELF perspective into
ELT. McKay (2002) problematized the concept of standard language and the use of
native speaker models, as well as the focus on native speakers’ culture, in English language
teaching. Seidlhofer (2011) also provided an account of the extent to which English
should be addressed with a “realistic” approach, thus addressing the needs of the learners,
because sticking to just “real” (i.e., “native”) uses of the language would not contribute to
their learning as effectively as catering to their communicative and learning needs. She
argued that the English acquired and used by English as a foreign language (EFL)
learners will never be equivalent to native English usage, and therefore using a native-speaker
model is not appropriate. It may be more effective to focus on helping students
develop communicative efficiency by first analyzing the strategic means ELF users deploy
to understand and make themselves understood in spontaneous conversation and by
implementing a pedagogy that helps learners to progress from their current position. In
her own words: “even if one were to accept that the objective of learning is to replicate
native-speaker behavior, it makes no pedagogic sense to focus on this objective in
complete disregard of how it might be achieved” (2011, pp. 182–183). The key point is that
“learners’ non-conformities are to be categorized not as errors but as evidence of successful
learning” (Seidlhofer, 2011, p. 186). One example of how traditional native
speaker-oriented pedagogical models may not be the best ones is the denial of any significant
role of the L1 in L2 development. The use of the L1 as a scaffolding mechanism to
increase L2 proficiency has been appraised in fairly recent literature (Hall & Cook, 2012),
which shows that learners can benefit from transferring tools and strategies developed in
their L1.
ELF researchers’ recommendations for teaching practitioners involve a change of attitude
rather than a list of discrete items to be incorporated or erased from common ELT syllabi.
Llurda and Mocanu (2019) outlined a detailed five-stage programme to change teachers’
attitudes towards ELF:
• Stage 1: Exposure to “realistic” situations, with examples of cultural and linguistic
diversity
• Stage 2: Analysis of data showing NNS professional performance
• Stage 3: Analysis of examples of academic uses of ELF
• Stage 4: Prospective scenarios for international English
• Stage 5: Reflection on one’s own teaching identity, context, condition and the ideal (yet
realistic) L2-self.
Teacher education is paramount in the required change of attitude or mentality. Awareness
has been repeatedly referred to as a key element in this change (Llurda et al., 2018). Bayyurt
and Sifakis (2017) and Sifakis and Bayyurt (2018) have suggested the term ELF-aware teachers
to describe instructors who follow a transformational programme implemented among
Turkish in-service teachers. This programme involves three phases: (1) exposure; (2) critical
awareness; and (3) action plan. According to these authors, the combination of awareness of
concepts with practical application in teaching is fundamental to foster a change towards
teaching English with an ELF perspective.
Seidlhofer (2011) proposes a series of principles that should be incorporated in ELF-
oriented classrooms. The basic tenet is that language instruction must acknowledge and
respect learners’ needs. If such needs involve approximating as much as possible a native
variety, the teaching should contribute to that. When the needs of learners are not necessarily
to integrate into a native-speaking community of speakers, but to communicate with
whomever they encounter, “the purpose of teaching becomes the development of a capability
for effective use which involves the process of exploiting whatever linguistic resources are
available, no matter how formally ‘defective’” (Seidlhofer, 2011, p. 197); this view is crucially in
step with current thinking on translanguaging as a communicative strategy and a pedagogic
tool (García & Wei, 2014).
In the area of pronunciation, ELF researchers have always emphasized the need to aim at
promoting intelligibility rather than native imitation, clearly aligning with most L2 pronunciation
researchers. And yet, the question of which pronunciation model to choose in the
classroom is still controversial among teachers who have been educated under a native orientation
and feel they need to accurately and flawlessly present a native model to their
students, which often results in a choice between a British and an American model. This obviously
has implications for the selection of teachers in some countries (at least in private
schools), who are often hired based on their place of birth rather than their training or
expertise, which can result in inequality and discrimination towards fully trained, capable
non-native teachers who can also serve as models of successful multilingual speakers highly
aware of the complexities of using English in multilingual and multicultural contexts
(Llurda, 2015).
Inspired by Jenkins’ (2000) LFC, Walker (2010) argued for the need to: (a) raise learner
awareness; (b) teach Jenkins’ LFC; and (c) improve accommodation skills. Awareness and
accommodation skills would become the pillars supporting ELF pronunciation, thus emphasizing
the need to constantly adapt and facilitate communication in ELF interactions.
For Walker, raising learner awareness involves letting students know about sociolinguistic
facts and the role of English worldwide, and reflecting on accent diversity in all languages
and the amount of stereotyping and prejudice projected onto some accents. As for teaching
the LFC, Walker suggests working on minimal pairs that are complex for speakers of any
given L1, helping develop automatic speech habits through drills in disguise, such as tongue-twisters,
focusing on voicing and aspiration in individual consonants, and on consonant
clusters and vowel length. Walker also gives ideas on how to improve accommodation skills,
such as phonological accommodation, occurring when a speaker repeats a misunderstood
word with a slight change in pronunciation, to make it more comprehensible to the listener.
Such adjustments occur particularly when the speaker is aware of potential communication
problems caused by their way of pronouncing a specific English phoneme, for instance /r/ for
a Japanese speaker. However, it is not only the speaker who is responsible for intelligibility.
The listener has a major role in co-constructing intelligibility, and training people to listen to
accented speech can be an effective way to contribute to enhanced intelligibility, as shown by
Derwing et al. (2002). Walker (2010) proposes doing activities that enhance negotiation of
meaning skills. By way of example, he borrows Lynch’s (1996) “indirect negotiation” activity,
in which learners listen in groups to a recording. They can stop the recording at any
moment when they do not understand the text, and when they do so, they discuss what they
have understood, and then proceed to listen, or elicit additional information from the teacher,
or continue listening to see if they can fill the comprehension gap.
For several years, Communicative Language Teaching has been considered the best
method for teaching English. Yet, Seidlhofer and Widdowson (2019) criticize it for equating
communicative competence in English with knowing “how to communicate as NSs do”
(p. 23), and claim that “the purpose of teaching would become the development of a capability
for effective use which involves the process of exploiting whatever linguistic resources
are available, whether they conform to NS norms or not” (p. 30). Kohn (2019) highlights
“speaker satisfaction as the endonormative yardstick of communicative success” (p. 36), thus
giving L2 speakers the power to determine whether the ultimate goal of communication has
been achieved without going through the filter of exo-normative constraints.
Although ELF is beginning to have an impact on English language education, several fronts
remain strongly attached to native speaker norms. Textbooks are one such area. Despite the
recent surge of textbooks by major publishing houses claiming to address the needs resulting
from the global spread of English, these books’ syllabi are centred on NS norms (Dewey,
2015). Assessment is another area where ELF is struggling to have an impact. Chopin (2015)
shows how an English test assessing the communicative skills of English professors at a
Danish university is based on native speaker norms, and Newbold (2019) discusses the underlying
competences of an “ELF-aware test,” arguing learners should be assessed on their
capacity to understand and use language in interaction, disregarding attachment to standard
norms when they do not contribute to mutual understanding. Newbold prefers the term
“ELF-aware test” to “test of ELF” to indicate that the test measures communicative competence
in its globality and in connection to the specific context in which interactions take
place, rather than linguistic competence in its more restricted, decontextualized sense. This is
an idea all communicative-oriented teachers could embrace, once they get over the long-established
tradition of measuring NNS competence against NS production. Certainly, the
scarce research available on ELF-aware language testing points to the need to integrate
current knowledge in language testing and ELF to start envisioning tests that take the diversity
of English used in global exchanges into account, as “the responsibility for successful
communication is (…) clearly joint, and involves more factors than language proficiency”
(McNamara & Shohamy, 2016, p. 230).
One may conclude that the essential aspects to take into account in teaching English with
an ELF focus are related to exposure, production and assessment, respectively. With regard
to exposure, learners should be provided with a great diversity of speaking models. The
guiding principle for production would be simplicity, letting speakers use the forms that
come more naturally to them. As for assessment, the keyword is “tolerance,” as intelligibility
should be the benchmark for determining what is considered an error.
7 Future Directions
ELF research has contributed to L2 speaking research by providing a profound understanding
of interactions among speakers of English from different L1 backgrounds, and
particularly emphasizing how such speakers manage to develop communicatively efficient
interactions. Yet, as Derwing (2016) suggested, we need to expand research by conducting
longitudinal studies observing the effect of different communicative resources on long-term
intercultural relations. Furthermore, the work of Derwing and her colleagues on the
interplay of intelligibility, comprehensibility and accent demonstrates how accent is partially
independent of the more important variables which determine L2 users’ capability of
being understood by both L1 and L2 speakers. Thus ELF could benefit by incorporating
the framework developed by Derwing and Munro (1997; 2005) in which they distinguish
among those three speech dimensions, and outline several factors that are key in successful
communication, including comprehensibility and intelligibility, fluency, pragmatics,
nature of the interaction (short-term vs. ongoing) and Willingness to Communicate
(Derwing, 2016).
The implementation of an ELF-aware vision in speaking pedagogy and assessment needs
further development, especially when the general principle of going beyond NS norms is
translated into practical teaching and testing guidelines. However, research on pedagogical
experiences is increasing and the findings will likely engage more teachers in ELF-aware
practices. The current moment demands a higher integration of ELF and multilingualism
research traditions, as ELF always takes place in multilingual environments (Jenkins, 2015).
Llanes and Cots (2020) and Llurda and Cots (2020) demonstrate how a translanguaging and
ELF-inspired pedagogy in a Business English course can result in better outcomes for students
than a conventionally monolingual NS-oriented approach.
Research on ELF speaking must continue making inroads into standard SLA research
and mainstream English pedagogy. However, as researchers and practitioners become increasingly
aware of the transformations required by the global spread of English, they are
gradually changing their perspective of a prototypical conversation in English, from one
involving NSs in a country where English is the majority language, to one in which speakers
are multilingual and may live anywhere in the world.
Further Reading
Cogo, A., & Dewey, M. (2012). Analysing English as a lingua franca. A corpus-driven investigation.
London: Bloomsbury.
A corpus-based study on ELF that explores lexical and pragmatic aspects of ELF use, and additionally
provides a theoretical discussion on ELF research and its outcomes.
Jenkins, J., Baker, W., & Dewey, M. (Eds.). (2018). The Routledge handbook of ELF. Abingdon, UK:
Routledge.
A comprehensive volume dealing with different research topics and approaches to ELF research, and its
pedagogical implications. The most authoritative voices in the field have contributed to the book.
Sifakis, N., & Tsantila, N. (Eds.). (2019). English as a lingua franca for EFL contexts. Bristol: Multilingual Matters.
Reflective questions and tips for teachers on how to make the transition from traditional EFL to ELF-
aware teaching.
References
Bayyurt, Y., & Akcan, S. (Eds.). (2015). Current perspectives on pedagogy for English as a Lingua
Franca. Berlin: De Gruyter Mouton.
Bayyurt, Y., & Sifakis, N. (2017). Foundations of an EIL-aware teacher education. In A. Matsuda
(Ed.), Preparing teachers to teach English as an International Language (pp. 3–18) Clevedon:
Björkman, B. (2011). Pragmatic strategies in English as an academic lingua franca: Ways of achieving
communicative effectiveness? Journal of Pragmatics, 43, 950–964.
Brunner, M-L., & Diemer, S. (2018). “You are struggling forwards, and you don’t know, and then
you… you do code-switching…” – Code-switching in ELF Skype conversations. The Journal of
English as a Lingua Franca, 7, 59–88.
Burns, A. (2016). Research and the teaching of speaking in the second language classroom. In E. Hinkel
(Ed.), Handbook of research in second language teaching (pp. 242–256). Abingdon: Routledge.
Canagarajah, S. (2007). Lingua franca English, multilingual communities, and language acquisition.
The Modern Language Journal, 91, 923–939.
Chopin, K. (2015). Reconceptualizing norms for language testing: Assessing English language profi-
ciency from within an ELF framework. In Y. Bayyurt & S. Akcan (Eds.), Current perspectives on
pedagogy for English as a Lingua Franca (pp. 193–204). Berlin: De Gruyter Mouton.
Cogo, A., & Dewey, M. (2012). Analysing English as a Lingua Franca. A corpus-driven investigation.
London: Bloomsbury.
Cook, V. J. (1999). Going beyond the native speaker in language teaching. TESOL Quarterly, 33,
185–209.
Derwing, T. M. (2016). Challenges for intelligibility and comprehensibility in ELF. Plenary talk de-
livered at the 9th International Conference of English as a Lingua Franca, Lleida, June 26, 2016.
Derwing, T. M., & Munro, M. J. (1997). Accent, intelligibility, and comprehensibility. Studies in Second
Derwing, T. M., & Munro, M. J. (2005). Second language accent and pronunciation teaching: A
research-based approach. TESOL Quarterly, 39, 379–397.
Derwing, T. M., & Munro, M. J. (2009). Putting accent in its place: Rethinking obstacles to com-
munication. Language Teaching, 42, 476–490.
Derwing, T. M., & Munro, M. J. (2015). Pronunciation fundamentals: Evidence-based perspectives for L2
teaching and learning. Amsterdam: John Benjamins.
Derwing, T. M., Munro, M. J., & Wiebe, G. (1998). Evidence in favor of a broad framework for
pronunciation instruction. Language Learning, 48, 393–410.
Derwing, T. M., Rossiter, M., & Munro, M. J. (2002). Teaching native speakers to listen to foreign-
accented speech. Journal of Multilingual and Multicultural Development, 23(4), 245–259.
Deterding, D. (2013). Misunderstandings in English as a lingua franca. Berlin: De Gruyter Mouton.
Dewey, M. (2015). Time to wake up some dogs! Shifting the culture of language in ELT. In Y. Bayyurt
& S. Akcan (Eds.), Current perspectives on pedagogy for English as a Lingua Franca (pp. 121–134).
Berlin: De Gruyter Mouton.
Firth, A. (1996). The discursive accomplishment of normality: On ‘lingua franca’ English and con-
versation analysis. Journal of Pragmatics, 26, 237–259.
García, O., & Wei, L. (2014). Translanguaging: Language, bilingualism and education. Basingstoke:
Palgrave.
Gardiner, I. A., & Deterding, D. (2018). Pronunciation and miscommunication in ELF interactions. In
J. Jenkins, W. Baker, & M. Dewey (Eds.), The Routledge handbook of ELF (pp. 224–232). Abingdon,
UK: Routledge.
Grosjean, F. (1989). Neurolinguists, beware! The bilingual is not two monolinguals in one person. Brain
and Language, 36, 3–15.
Hall, G., & Cook, G. (2012). Own-language use in language teaching and learning. Language Teaching,
45(3), 271–308.
Holliday, A. (2005). The struggle to teach English as an international language. Oxford: Oxford
University Press.
House, J. (1999). Misunderstanding in intercultural communication: Interactions in English as a lingua
franca and the myth of mutual intelligibility. In C. Gnutzmann (Ed.), Teaching and learning English
as a global language (pp. 73–93). Tübingen: Stauffenburg.
Press.
Jenkins, J. (2007). English as a lingua franca: Attitude and identity. Oxford: Oxford University Press.
Jenkins, J. (2015). Repositioning English and multilingualism in English as a lingua franca. Englishes in
Practice, 2, 49–85.
Kachru, B. B. (1985). Standards, codification and sociolinguistic realism. In R. Quirk & H. G.
Widdowson (Eds.), English in the world: Teaching and learning the language and literatures
Kohn, K. (2019). Towards the reconciliation of ELF and EFL: Theoretical issues and pedagogical changes.
In N. Sifakis & N. Tsantila (Eds.), ELF for EFL contexts (pp. 32–49). Bristol: Multilingual Matters.
Llanes, A., & Cots, J. M. (2020). Measuring the impact of translanguaging in TESOL: A plurilingual
approach to ESP. International Journal of Multilingualism. doi: 10.1080/14790718.2020.1753749.
Llurda, E. (2015). Non-native teachers and advocacy. In M. Bigelow & J. Ennser-Kananen (Eds.), The
Routledge handbook of educational linguistics (pp. 105–116). New York: Routledge.
Llurda, E., Bayyurt, Y., & Sifakis, N. (2018). Raising teachers’ awareness about English and English as
a lingua franca. In P. Garrett & J. M. Cots (Eds.), The Routledge handbook of language awareness
(pp. 155–169). Abingdon: Routledge.
Llurda, E., & Cots, J. M. (2020). PLURELF: A project implementing plurilingualism and English as a
lingua franca in English language teaching at university. Status Quaestionis, 19, 259–276.
Llurda, E., & Mocanu, V. (2019). Changing teachers’ attitudes towards English as a lingua franca. In
N. Sifakis & N. Tsantila (Eds.), ELF for EFL contexts (pp. 175–191). Bristol: Multilingual Matters.
Lynch, T. (1996). Communication in the language classroom. Oxford: Oxford University Press.
Makoni, S., & Pennycook, A. (Eds.). (2007). Disinventing and reconstituting languages. Bristol:
Matsumoto, Y. (2011). Successful ELF communications and implications for ELT: Sequential analysis
of ELF pronunciation negotiation strategies. The Modern Language Journal, 95(1), 97–114.
Mauranen, A. (2006). Signaling and preventing misunderstanding in English as lingua franca com-
munication. International Journal of the Sociology of Language, 177, 123–150.
McKay, S. L. (2002). Teaching English as an international language. Oxford: Oxford University Press.
McNamara, T., & Shohamy, E. (2016). Language testing and ELF: Making the connection. In M.-L.
Pitzl & R. Osimk-Teasdale (Eds.), English as a lingua franca: Perspectives and prospects
(pp. 227–233). Berlin: De Gruyter.
Meierkord, C. (2000). Interpreting successful lingua franca interaction. An analysis of non-native-/non-
native small talk conversations in English. Linguistik Online, 5. https://bop.unibe.ch/linguistik-
online/article/view/1013/1673
Moyer, A. (2014). The social nature of L2 pronunciation. In J. M. Levis & A. Moyer (Eds.), Social
dynamics in second language accent (pp. 11–29). Boston: De Gruyter.
Munro, M. J., & Derwing, T. M. (1995). Foreign accent, intelligibility and comprehensibility in the
Munro, M. J., Derwing, T. M., & Thomson, R. I. (2015). Setting segmental priorities for English
learners: Evidence from a longitudinal study. IRAL, 53, 39–60.
Newbold, D. (2019). ELF in language tests. In N. Sifakis & N. Tsantila (Eds.), ELF for EFL contexts
(pp. 211–226). Bristol: Multilingual Matters.
Seidlhofer, B. (1999). Double standards: Teacher education in the expanding circle. World Englishes,
18, 233–245.
Seidlhofer, B. (2001). Closing a conceptual gap: The case for a description of English as a lingua franca.
International Journal of Applied Linguistics, 11, 133–158.
Seidlhofer, B. (2009). Common ground and different realities: World Englishes and English as a lingua
franca. World Englishes, 28, 236–245.
Seidlhofer, B. (2011). Understanding English as a lingua franca. Oxford: Oxford University Press.
Seidlhofer, B., & Widdowson, H. G. (2019). ELF for EFL: A change of subject? In N. Sifakis & N.
Tsantila (Eds.), ELF for EFL contexts (pp. 17–31). Bristol: Multilingual Matters.
Sifakis, N., & Bayyurt, Y. (2018). ELF-aware teaching, learning and teacher development. In J.
Jenkins, W. Baker, & M. Dewey (Eds.), The Routledge handbook of ELF (pp. 456–467). Abingdon,
UK: Routledge.
Sridhar, K. K., & Sridhar, S. N. (1986). Bridging the paradigm gap: Second language acquisition
theory and indigenized varieties of English. World Englishes, 5, 3–14.
Thir, V. (2020). International intelligibility revisited. L2 realizations of NURSE and TRAP and
functional load. Journal of Second Language Pronunciation, 6, 458–482.
Vettorel, P. (2019). Communication strategies and co-construction of meaning in ELF: Drawing on
“multilingual resource pools”. Journal of English as a Lingua Franca, 8, 179–210.
Walker, R. (2010). Teaching the pronunciation of English as a lingua franca. Oxford: Oxford University
Press.
Widdowson, H. G. (2012). ELF and the inconvenience of established concepts. Journal of English as a
Lingua Franca, 1, 5–26.
PART V
Emerging Issues
25
WORKPLACE COMMUNICATION
Lynda Yates
Workplace communication is a broad research area encompassing a range of spoken,
written, and mediated communicative activity. Studies from multiple perspectives using
several methodologies have provided rich descriptions of how people communicate to get
things done, exercise leadership, and build relationships at work. My aim here is not to
survey this work in its entirety – this has been well-covered elsewhere (e.g., Vine, 2018, 2020).
Rather, it is to consider the implications of this work for adults who must learn to speak an
additional, later-learned language at work. This is an emerging issue, not because it is new –
people have been using other languages at work since time immemorial – but because the world
of work is changing fast and the explosion in world-wide interconnectivity, large-scale global
flows of workers at all levels and the “superdiversity” (Vertovec, 2007) that now characterizes
our major cities make it more important than ever to ensure adult learners have the skills
needed to speak effectively at work.
Although I will draw chiefly on literature related to English, my focus is on spoken communication
rather than the features of any specific language. The term “language” is therefore
used as shorthand for “the use of language that is appropriate in context.” This entails the
ability to understand the nature of an interaction and the relative rights and obligations of
participants, that is, sociopragmatic competence, and the ability to select and use the language
behaviours that are pragmatically appropriate, that is, pragmalinguistic competence.
As workplace communication is a relatively new area of focus in SLA studies, much of the
research is more accurately discussed in terms of critical issues currently contributing to the
emerging research strengths of the field.
3 Critical Issues and Topics; Current Contributions and Research
Talk and Workplaces
Workplaces have been described as “a moving target” (Kerekes, 2018, p. 414) for learners. Not
only do they embrace an almost infinite variety of settings, tasks and people, but also they
change at a phenomenal speed in ways that previous generations could not have imagined
under the impact of technical innovation and global events. These new and constantly evolving
conditions demand flexibility of thought, activity, and therefore of modes and styles of communication.
Moreover, since “jobs for life” are now the exception, an individual is likely to
experience a range of different roles and workplaces, so that an ability to style-shift and tailor
communication to context is of even greater importance, particularly for migrants who often
start “at the bottom” of the employment ladder, whatever their pre-migration skills.
DOI: 10.4324/9781003022497-31
Although talk at work shares many characteristics with talk elsewhere, it is constrained by,
and assumes particular meanings in relation to, the shared community of practices in the
workplace in which it is embedded (Drew & Heritage, 1992). While this is true for workplaces
in general, it is particularly visible in specialized work contexts where behaviours are strictly
regulated according to roles and power differences, such as turn-taking in courtrooms,
classrooms, boardrooms, and meetings. While the communicative demands of different
workplaces vary considerably, they are typically characterized by an orientation towards a
shared goal. This focus can require not only precise expression and technical language, but also
the ability to make and interpret economical contextual references to activities that may be
obscure to outsiders. Moreover, this shared attention to current or past activity can necessitate
complex explanations and a facility with specialized or localized discourses associated with a
particular industry or workplace (Handforth, 2018; Koester, 2006).
At work, we assume roles and identities which confer different rights and obligations to
speak from those we enact outside, and we often have to communicate effectively with a wide
range of people at different levels, whether we like them or not. This requires considerable
style-shifting and on-going relational work, whether in overtly social conversations around the
water cooler or in the lunch room, in more formal on-task interactions or, for many, now in
virtual contexts. All of these interactions are important for building and maintaining
productive workplace relationships, especially for newcomers, since the consequences
of not “getting it right” can be severe, impacting not only efficiency and well-being, but also
ultimately job security and long-term career progression (Holmes, 2000).
Communicative Demands of the Workplace

Much of the research that has illuminated workplace communication has focused on office or
professional environments. Work from a Language for Specific Purposes (LSP) perspective,
for example, has documented the language needs of an increasingly mobile professional
workforce in areas such as law, engineering, healthcare, and aviation (e.g., Paltridge &
Starfield, 2012). These specialized discourses rely heavily on tacit reference to shared cultural
values relating to the profession, which neophytes, whatever their language background, must
learn if they are to communicate successfully. Corpus approaches to data collection and
analysis have allowed the comparison of small specialized corpora (particularly of business
English, e.g., Handforth, 2010) with larger, more general corpora to demonstrate how lan-
guage used in specialized workplace contexts may differ from language use elsewhere. This has
allowed the identification of linguistic patterns in large data samples, which can be explored
qualitatively to provide illustrations of how they are used in context (Handforth, 2018).
Koester (2006) describes the sophisticated way in which idioms can be used in a workplace as
distancing devices to allow emotionally-charged matters to be discussed in generalities thereby
deflecting the focus of discussion away from personal responsibility, or in a self-deprecating
way to mitigate threats of various kinds (pp. 109–114). These and other studies have enriched
the evidence base for language programmes targeting cosmopolitan, well-educated clientele.
Analyses of naturally occurring workplace data by the Language in the Workplace project
in New Zealand have added significantly to our understanding of communication in several
work settings and have provided insight into key areas of workplace talk. These include the
use of humour (Holmes, 2006), the role of social talk (Holmes, 2000), how leadership and
gender are enacted (Holmes, 2006; Holmes et al., 2011), and how people wield power and get
things done (Holmes & Stubbe, 2015; Vine, 2004). This important work uses an interactional
sociolinguistic methodology to illustrate and foreground the important role of subtle,
pragmatic aspects of interaction in how speakers achieve their goals and come to be per-
ceived by others. It has also highlighted the centrality of informal interpersonal aspects of
communication in the workplace (Newton & Kusmierczyk, 2011). This research has had
considerable practical as well as theoretical impacts through collaborative initiatives with
government authorities and the dissemination of findings through materials and programmes
targeting migrants, as well as those dealing with a multicultural workforce (e.g., Joe &
Riddiford, 2017; Riddiford & Newton, 2010; Work Talk).
Descriptions of the language demands of workforce entry roles have received somewhat
less attention, and indeed many can be considered “language marginal” (McAll Bayley 2003)
in that they offer few opportunities for workers to use their additional language at all. Yet, it
is in these more menial roles that many migrant learners, even those from professional
backgrounds, first find employment, and in which they not only hope to earn a living, but
also to learn the dominant language and start engaging with their new communities. One
aspect of factory floor communication that has received attention, and sometimes surprises
migrants to English-dominant workplaces, is the role of swearing and humour. Daly et al.’s
(2004) study of talk on the factory floor of a packaging plant highlights the importance of
this behaviour as a means of building solidarity and comradeship. The use of humour in the
form of short jokes and apparently confrontational complaints and competitive sequences
reflects a close working relationship within some teams and increases bonding to make life
more enjoyable.
In considering the diversity of workplaces, it is also important to note their increasingly
multilingual nature as they reflect the superdiversity of our communities and the global reach
of many enterprises. Communication in such workplaces demands the ability to adroitly
navigate the “ecosystem” of different languages in the office or on the factory floor (Angouri,
2014). Some companies mandate the use of a particular language, often English, through an
explicit language policy. Research on ELF suggests that lingua franca communication takes
on particular characteristics: users are more collaborative, more tolerant of deviation from
norms, less focussed on linguistic accuracy and more supportive of each other as they ne-
gotiate meaning together (Firth, 1996). However, alongside explicit language policies, im-
plicit language policies can also affect how interactions are managed and reacted to (Hazel,
2015). This can involve the selection of a particular language for a particular function and
can give rise to interactional norms internalized by experienced members of a community but
which newcomers need to learn. Code-switching, for example, might occur at work because it
is efficient (Kleifgen, 2013) or to signal either solidarity or discord (Lauriks et al., 2015).
Since language selection can index membership, the confident use of the language policy can
index a high status, as illustrated by the public sanctioning of the use of a dispreferred
language (Danish) in a business meeting where ELF is routinely used (Hazel, 2015, p. 147).
At the other end of the employment spectrum, Goldstein (1997) has illustrated the role that
different languages can play in wielding power and getting things done on the factory floor.
Moreover, workers find their own creative ways of communicating. Kalmar (2001), for example,
showed how Latino immigrant workers in the United States used their own language as they
worked through English, matching unfamiliar English sounds to familiar ones, producing hybrid
words. While the development and use of these “fragmented” and context-specific “multilingual
repertoires” (Blommaert, 2010) facilitates communication and evidences considerable creative
linguistic and metalinguistic skill, it can also leave workers unsure as to exactly what language
they are speaking and thus lacking in confidence about their actual level of proficiency in the
dominant language (Pujiastuti, 2017). For individuals seeking to climb the employment ladder,
such short-term efficiency gains can have a deleterious impact on longer-term prospects.
Culture and the Workplace

Although culture should be regarded as dynamic and fluid rather than deterministic
(Schnurr, 2013), research from a range of perspectives has illustrated its potential impact on
multiple aspects of spoken communication. A notoriously slippery concept, culture includes
speakers’ “basic assumptions and values, orientations to life, beliefs, policies, procedures and
behavioural conventions” (Spencer-Oatey, 2008, p. 3). Work by Gumperz and colleagues
(Gumperz, 1982) highlighted the importance of early learned cultural expectations in how
speakers interpret the intentions of others. Insights from this early research were able to feed
into materials targeting immigrants to the United Kingdom (Gumperz et al., 1979), pro-
viding the impetus for more recent work that has continued to raise awareness of the lin-
guistic and cultural barriers to effective communication that migrants face in pre-work
interviews and at work (Roberts, 2010, 2011).
Investigating everyday intercultural encounters involving immigrant communities,
Gumperz (1982) illustrated how speakers convey and interpret meaning through the use of
verbal and paralinguistic “contextualization cues” that signal meaning. Since these cues are
culturally shaped, conventionalized within communities, and largely below the level of
consciousness, they can be easily misinterpreted by speakers from different groups, as illu-
strated in the famous example of Pakistani cafeteria workers. Unused to the conventional
British English usage of rising intonation to make an offer, these servers used flat intonation
as they offered diners gravy with their food. This was perceived as off-hand. Some customers
reacted negatively and their behaviour was perceived as racist by the cafeteria workers,
leading to disagreements and misunderstandings between the two groups.
Insights from interactional sociolinguistic analyses have illustrated how what seem to be
very minor differences in usage can have a major impact on how speakers are perceived, and
this work has laid the foundation for investigations of the impact of linguistic and cultural
background using different methodologies from different perspectives. Fine-grained analysis
using rapport management theory has, for example, further illuminated the role of these less
visible aspects of pragmatic behaviour in cross-cultural miscommunication during business
interactions and their link to broader sociocultural and interactional principles. This ap-
proach has therefore provided a useful way of understanding the link between the micro and
the macro, between how interactions are enacted and the wider, culturally variable politeness
and face considerations that drive their use (Fletcher, 2018; Spencer-Oatey & Jiang, 2003).
Another influential approach to understanding how cultural and linguistic background
impact communication has focussed on the performance of speech acts. Building on early
work by the Cross-Cultural Speech Act Realization Project (Blum-Kulka et al., 1989), studies
comparing data from native speakers and learners from several language backgrounds have
thrown a spotlight on how languages differ in the repertoire of pragmalinguistic devices
available to mitigate face threat and build solidarity, and how the conventions learned in our
early language socialization affect the way speakers use a subsequently learned language (see,
for example, Félix-Brasdefer, 2017; Martinez-Flor, 2010). This perspective has provided the
basis of materials focussing on pragmatics and has generated interest in how these pragmatic
aspects of language use can be learned and taught (Tatsuki & Houck, 2010; Ishihara &
Cohen, 2010).
While experimental role-plays used in early interlanguage pragmatic studies allowed the
comparison of learner and expert speaker behaviour in similar situations, they inevitably
tapped into the perceptions of “idealized” general settings and contexts about the use of a
particular variety (often English in the United States). In addition, as student populations
offer a ready source of participants for such studies, the findings likely reflect the intuitions of
young adults rather than older speakers more experienced in the workplace. To specifically
compare intuitions of experienced participants in workplace situations with those of migrant
learners preparing for the workforce in Australia, Yates (2010) elicited role-play data from
Dinka background refugees and mature Australians performing in common workplace si-
tuations. The findings highlighted some specific pragmalinguistic and sociopragmatic areas
where the learners might benefit from explicit attention, leading to the development of
teaching materials to specifically target these (Yates & Springall, 2010). Similarly, in the
interest of increasing workplace relevance, a combination of role-play data collected from
trained physicians and ethnographic data from naturally occurring doctor–patient con-
sultations were used in a series of studies investigating the pragmatic and cultural challenges
facing internationally trained doctors preparing for registration in Australia (Dahm & Yates,
2013; Yates & Dahm, 2016; Yates et al., 2016).
Cultural and linguistic backgrounds influence not only the forms that speakers select and
aspects of delivery such as turn-taking, hesitation phenomena, and gaze, but also expecta-
tions regarding the rights and obligations of speakers, the topics considered appropriate to
discuss, and so on. Since these vary cross-culturally, they are a potential source of mis-
communication at work. An example of this is small talk. While apparently trivial, research
suggests that small talk is actually very important for building good relationships and social
inclusion at work, yet the topics, functions, forms and notions of when and with whom small
talk might be appropriate can vary across cultures, so that newcomers can find themselves
excluded and unsure how to participate (Holmes, 2000). Similarly, other sociocultural ex-
pectations about behaviour at work, including how much and when to smile or laugh, what
counts as friendship, how much personal disclosure or informality is appropriate, and so on
can vary across cultures and pose a challenge for newcomers (Yates & Major, 2015).
To complicate matters, any workplace involves the intersection of different cultures re-
lating not only to the wider language and cultural community, but also to the particular
industry, company, and even department, and all will be relevant to the communicative
expectations of that site (Haugh & Watanabe, 2018). Effective speaking therefore demands
an understanding of the macro- and micro-cultural environment, as well as an understanding
of different genres and the communicative behaviours and roles expected in different
workplace activities. Thus, as discussed above, the swearing and apparently competitive
workplace talk reported in Daly et al. (2004) might in other settings be viewed as con-
frontational and offensive, while in the context of that factory floor it served to build soli-
darity. Learning to talk effectively at work involves much more than the skill to supply
behaviours appropriate to the context. It also demands the development of the knowledge
and skills to take stock of a new situation and respond appropriately. How these might best
be acquired through informal and formal learning will be considered later.
Learning to Speak Effectively at Work

Building on the original insights of Communities of Practice perspectives (Lave & Wenger,
1991; Wenger, 1998), studies have illuminated how newcomers can learn from more
experienced others, moving from peripheral to more central membership as they gain in their
understanding of how things are done in a community. Drawing on work in the medical field,
Lu and Corbett (2012) argue that it is only through direct observation that someone can
understand how things actually work. As any workplace involves the intersection of various
communities of practice (Holmes & Stubbe, 2015), there is clearly a role for informal on-the-
job learning in the development of both proficiency and relevant workplace and industry-
specific expertise (Losa, 2018). Moreover, it can sometimes be easier to see exactly what
needs to be learned, and how this might be different from what you already know, once you
are at work. Ming, in Li (2000), came to understand that her workplace expected commu-
nication to be “directly, truthful and things a little bit sweet” (p. 75) in the “American way.”
Similarly, Charles from Colombia observed that things were done differently in his
Australian workplace. His role in furniture design meant that he had to critique the work of
others, but he came to realize that it did not go down well if he gave negative opinions
directly. He therefore learned that he had to “say without saying” in order to keep relations
sweet at work, even if he felt the designs were “horrible” (Yates & Major, 2015, p. 147).
However, the success or otherwise of language learning at work depends on many factors,
including how conducive the workplace environment is to learning, or even communication.
Drawing on reports from migrants participating in a longitudinal study of their early ex-
periences in Australia, Yates (2018) reviews some of the factors that fostered or impeded
their ability to learn English at work. She concluded that although the strong motivation to
work among participants increased the relevance of learning English, especially since there
were few other opportunities to engage with English outside class, it could also be a siren call,
luring them prematurely from their language classes. Professional roles were particularly
useful for language learning as they made more demands on their skills and forced learners to
take risks with their language. Many found that even low-level jobs gave them the confidence
to use the English that they had, particularly if colleagues were helpful, and that these jobs
could be useful stepping stones to other, more challenging roles as their English improved.
However, many migrants take jobs in businesses run by speakers of their first language
(L1), or in factories, laundries, bakeries, salons or other workplaces requiring little English
outside the routine phrases and technical vocabulary needed to get the job done. In some
workplaces it can be too noisy to communicate, or talking can be explicitly forbidden (Yates,
2018). Thus, many of the menial jobs taken by migrants can offer few language learning
opportunities, as illustrated in Strömmer’s (2015) case study of Kifibin, a graduate from
Uganda who worked as a cleaner in Finland. As cleaning is often undertaken when everyone
else is absent, he worked largely on his own with little or no contact with clients or anyone
else. His supervisors only spoke to him when there were changes to the normal routine, and
his engagement with Finnish was therefore largely limited to receiving instructions and
asking for clarification.
However, even where co-workers are present, they are not always keen or helpful inter-
locutors. Derwing (2016), for example, found that employees in the Canadian workplace she
studied tended to socialize within their own first language groups. Most of the 24 native
speaker respondents reported difficulties understanding and being understood by their mi-
grant colleagues, and two-thirds indicated they were sometimes reluctant to initiate con-
versations with them, even though, ironically, most felt that more practice speaking would
improve their co-workers’ English. This seems to be a common experience (e.g., Sandwell,
2010; Yates, 2011). While these difficulties may relate to migrants’ lack of experience with the
language and formats of workplace conversations, they may also reflect the reluctance of
native and expert speakers to engage. Reasons for not engaging may range from a desire not
to place an undue burden on colleagues who appear to struggle with language, through to
discomfort with ethnic or cultural differences or, more seriously, racism. Whatever the cause,
responsibility for good communication rests with everyone, not solely newcomers. This
suggests an important role for training programmes targeting the skills and attitudes of
managers and co-workers (Derwing, 2016; Derwing et al., 2021).
Even where English is used at work as a lingua franca, co-workers may speak varieties
that are difficult to understand, or the workplace itself might foster the development of its
own variety. Thus, although Jurik from Mexico used English all the time with his interna-
tional co-workers in the kitchen of the golf club where he worked, their focus was on the
efficient delivery of food to customers rather than the beauty or otherwise of their English, so
that he felt the “kitchen English” he used did not help to improve his English (Chappell,
Benson & Yates, unpublished data). Similarly, Pujiastuti (2017) reports how hotel workers
from different language backgrounds developed a very effective multilingual mode of
communication that resulted in a bricolage, which one of them referred to as “basura de
lenguas” (language trash). While such creativity may be effective for productivity in the
workplace, it is much less useful for learning the dominant language used outside. Thus,
while experiential learning in the workplace can be beneficial, formal language learning also
has an important role to play in developing the range of spoken language that migrants and
others need if they are to retain the ability to respond to new employment opportunities.
Below I consider issues related to the design and delivery of programmes to support language
learning for the workplace.
Formal Learning: Language Programmes

Given the huge diversity in workplaces, and what might count as effective communication
within them, it is clear that “workplace language” is not one body of work that can be
reduced to some kind of an inventory (Kerekes, 2018). What, then, should be the priorities
for language programmes preparing migrants for the workforce? On the basis of a series of
studies conducted in companies with high migrant numbers, Derwing (2016) proposes a
focus on “the three P’s of ESL in the workplace: proficiency, pronunciation and prag-
matics” (p. 10).
Since speaking is an interactive skill, an important aspect of proficiency, and one often
under-represented in traditional, writing-based language classrooms familiar to many
migrants, is the ability to use language flexibly in real time. Learning to speak is not simply
the activation of written language in spoken form, since spoken language differs from
written language on a whole range of features. It involves a host of different skills, leading
some to propose the notion of “discourse” or “spoken grammars” (see Bygate, 2018;
Timmis, 2005 and Surtees & Duff, this volume). Talk at work is likely to call for a facility
with not only the quick-fire, short turns characteristic of many focussed spoken exchanges
(e.g., Kleifgen, 2013), but also with longer, syntactically complex explanations or in-
structions. Formal language programmes can supply the opportunity to practise and re-
ceive feedback on longer turns that are often lacking outside the classroom. Moreover,
although the acquisition of specialist language is often seen as important for the work-
place, it can be more straightforward to acquire the necessary technical language than the
more general ability to switch between the formal and informal styles of communication
demanded by many workplaces, as Duff et al. (2002) report. Although the healthcare aides
in their study found the medical language on their training programme demanding, it was
the informal, colloquial terms for similar activities that they needed to use with their
elderly patients, and the sociocultural agility to switch between them, that they found
particularly challenging.
The aspect of linguistic proficiency singled out for particular attention as a priority by
Derwing (2016), pronunciation, has long been recognized as an important, but often ne-
glected concern for employers, co-workers and the learners themselves. Since this aspect of
spoken language is covered in more depth elsewhere (see Derwing & Munro, this volume),
here I will comment only briefly on what might be appropriate pronunciation goals, why
they should be a priority for training, and the role of interlocutors at work.
While it is unnecessary and unfair to expect mature second language (L2) users to sound
native-like, there is general agreement that a comfortable intelligibility is a reasonable goal
for the workplace (see Derwing et al., 2021). Since adults typically find it challenging to
“hear” and reproduce the often subtle phonological and prosodic features of a later learned
language, they may not be aware of the extent to which their pronunciation affects their
intelligibility, and even sympathetic interlocutors can feel shy about offering the assistance
they might with other aspects of language, such as vocabulary or grammar. This makes it a
priority candidate for specific instruction, particularly as a speaker’s problematic pro-
nunciation can too easily be used to make negative judgements about their general language
ability, and even their competence (Derwing et al., 2021).
Intelligibility is a two-way process, and interlocutors’ familiarity with or attitude towards
an accent or ethnic group can play a major role in their ability or willingness to understand
what is said. Thus, accents associated with “high status” groups may be more “easily” un-
derstood (Bresnahan et al., 2002). Moreover, workplaces can be pressured environments
where supervisors and co-workers are not always sympathetic listeners. This suggests that
practical programmes for co-workers and managers on how to improve their understanding of,
and ability to communicate with, migrant colleagues are also a priority, alongside pronunciation
classes for the migrants themselves. Findings from the few studies reporting on
the outcomes of such training are encouraging, even for older learners and long-term mi-
grants who have been in the workforce for many years (see Derwing, 2016; Derwing et al.,
2014; Derwing et al., 2002; Kang & Rubin, 2012).
As the discussion of research into the communicative demands of the workplace in the
previous parts demonstrates, pragmatic proficiency, that is, the ability to convey and un-
derstand intended meaning appropriately in context, is vital to communicative success. This
makes it a clear priority for workforce preparation programmes. To be able to use their
general language proficiency appropriately and to good effect, speakers need the socio-
cultural knowledge and skills to accurately identify the nature of the situation and context in
which they are speaking, the purpose of the conversation, and the relative rights and ob-
ligations of the participants. Crucially, since all communication has a relational dimension
indexing attitude, speakers also require the ability to make pragmalinguistic choices that
accurately reflect their intentions in that context. Since, as discussed earlier, these pragmatic
conventions are often below the level of consciousness and usually learned during early
language socialization in childhood, they have been referred to as “the secret rules” of
language (Yates, 2004). The danger here is that, while syntactic or phonological errors index
a speaker as a learner, infelicities involving pragmatic choices are less clearly “visible” as
learner errors, and thus more likely to be misinterpreted as rudeness. Importantly, general
language proficiency does not guarantee pragmatic proficiency. Indeed, greater general L2
proficiency can actually lead speakers into greater difficulty, since the more a learner can say,
the greater the scope for pragmatic infelicity (Kasper, 1992). This suggests the need for an
explicit focus on pragmatics.
However, despite its centrality in effective communication, with some notable exceptions,
pragmatics is often overlooked in preparation programmes. Learning materials and activities
appropriate to the level of learners are not always available (Diepenbroek & Derwing, 2013).
As Kerekes (2018) notes, applied linguists have an important role to play in making their work
available and accessible to the world of practice. She rejects, however, the notion that research
findings can be simplified into scripts that can simply be learned, advocating instead that they
be used as the stimulus for a focus on strategies rather than rules. This approach has been used
successfully in short courses for professional migrants drawing on insights from collaborations
between researchers and practitioners in the Language in the Workplace project. Through a
combination of explicit pragmatic instruction involving reflection on models and the oppor-
tunity to engage and observe during workplace internships, course participants could develop
their awareness of sociocultural and pragmalinguistic phenomena. Riddiford and Joe (2010),
for example, demonstrated how learners not only developed their awareness of different ways
of making requests, but also put them into practice during their work placements and
reflected on their own behaviour. In this way, they began to develop the understandings and
analytical skills necessary to explore workplace pragmatics for themselves. Such skills are a
crucial component of any workplace programme because no one course can cover everything
learners need to communicate effectively at work.
A combination of explicit instruction, reflection and the opportunity for observation and
engagement in a workplace offers the advantages of both formal and informal learning,
equipping learners with the tools needed to have agency in their own learning beyond the
classroom. Approaches to workforce language training such as paid hours of formal learning
within working hours, programmes specifically designed and delivered in and for particular
workplaces (see, for example, Norquest, 2010) and workplace internships embedded into
language programmes (AMES Australia, 2016; Riddiford & Joe, 2010) offer a welcome al-
ternative to after-hours classes, which learners can find too disconnected from their own
workplace and too exhausting after a full-day’s work.
An important issue raised in workplace language training relates to the models that should be
used for programmes, since many workplaces are multilingual, and adult learners are not, and
may not want to be, native speakers of the language they are learning. As Timpe-Laughlin (2019)
warns, the “target norm” should not be equated to “native speaker norm.” As noted earlier,
lingua franca communication may have its own norms. Another issue raised in the literature
relates to the scope of programmes preparing migrants for the workplace, and the tension be-
tween providing migrants with the skills they need to do their (entry-level) jobs and the provision
of a foundation capable of catering for their longer-term language needs and career progression
(Warriner, 2010). Further, prescriptive approaches to language and culture training constrain
what learners can say, and thus how they act and think at work. The socially constitutive and
indexical nature of talk means that learning to talk the way that others do at work is a double-
edged sword: on the one hand, it can bring a feeling of acceptance, but on the other, it can
encourage the reproduction, and therefore tacit endorsement, of the status quo. There are
dangers, too, in prescriptive approaches that objectify culture by essentializing groups or failing
to acknowledge the complexity and fluidity of interaction and community membership. Thus, a
social constructionist perspective should be incorporated alongside historically or geographically
bound perspectives (Lazzaro-Salazar, 2018). Such recommendations accord with contemporary
approaches to the teaching of pragmatics, which stress learner reflection, understanding and
choice, rather than compliance with a set of rules (Kerekes, 2018; Timpe-Laughlin, 2019).
Future Directions

To conclude, I briefly outline directions for future research in workplace communication and
how research findings can benefit different kinds of learners who must acquire the knowledge
and skills they need for work.
While research into workplace communication has already provided considerable insight
into the nature and formats of how people talk at work, the workplaces of tomorrow will be
vastly different from those of the past, and the rate of change is rapid and ferocious. New
industries, new technologies and developments on the home–work interface will change work
practices and spawn new ways of interacting. It is imperative that research into the nature of
workplace communication not only keep pace with these changes, but also expand to include
attention to hitherto neglected aspects of spoken communication. While pragmatic
perspectives have rightfully been a major focus, the relationship between pragmatics and other
modes of delivery relevant to communicative success, such as pronunciation (but see Derwing
et al., 2021), gesture, deixis, gaze, and the use of artefacts, is as yet under-researched
and suggests important possibilities for future endeavours. We must also build our
understanding of how communication works in other work contexts and languages.
Moreover, the increasingly globalized and multilingual nature of workplaces means that
research into the role, status, and use of different languages and varieties at work will be
increasingly relevant. Such research will make an important contribution theoretically to our
understanding of the role and status of different languages at work; in practical terms it will
provide an evidence base for training interventions supportive of speakers from all
backgrounds – native, expert, and learner – to manage this diversity. This suggests the importance
of increased research attention to the nature of ELF and other linguae francae. Since much
ELF research to date has been conducted in academic contexts and in Europe, there is a need
to expand the scope to the exploration of other languages and workplace contexts.
From an applied perspective, we need more research insights into the design, conduct and
evaluation of workplace language training programmes and into how learners gain the skills
they need for work. While studies of workplace communication can increase researcher
awareness of the source content for such programmes, that is, how people actually
communicate at work, there remains the significant challenge of ensuring access to these findings
for policy-makers, curriculum designers, and practitioners to inform practice. Long-term
collaboration between researchers and practitioners offers one productive way of addressing
this need. Collaborative, integrated research-to-practice initiatives could include exploration
of the content, design, and delivery of programmes for both learners and their co-workers.
For researchers, this connection with practitioner–collaborators can offer good insights into
learner challenges, access to new research sites and a sense of being able to make a real
difference. For practitioners, it can mean a reliable evidence base for materials development,
opportunities for professional growth, and confidence in the relevance of their work to the
current needs of their learners. Such collaborations can help to nurture a stronger connection
between research and practice in support of those who need to learn more about how to
communicate effectively in the workplace.
Further Reading
Riddiford, N., & Joe, A. (2010). Tracking the development of sociopragmatic skills. TESOL Quarterly,
44(1), 195–205.
A report on the conduct of a practical intervention for migrant language learners which combined
classroom instruction based on empirical evidence from the Language in the Workplace Project with
reflection on experiences during a practical internship in New Zealand workplaces.
Timpe-Laughlin, V. (2019). Pragmatics learning in the workplace. In N. Taguchi (Ed.), The Routledge
handbook of second language acquisition and pragmatics (pp. 413–428). New York, NY: Routledge.
This chapter draws on a number of empirical studies of pragmatics in workplace communication to
offer both theoretical and applied perspectives that will be useful for readers with an interest in the
processes of learning as well as in the language used in work contexts.
Vine, B. (Ed.) (2018). The Routledge handbook of language in the workplace. Abingdon, Oxon:
Routledge.
A fascinating introduction to studies in the field as it brings together a wide range of perspectives on
workplace communication using a staggering array of different research methodologies.
References
AMES Australia. (2016). In transition: Employment outcomes of migrants in English language programs at
AMES Australia. Research and Policy Unit AMES Australia December 2016. Accessed 02.09.20 from
https://www.ames.net.au/-/media/files/research/transitions_slpet-short-report_-final_dec-2016.pdf
Angouri, J. (2014). Multilingualism in the workplace: Language practices in multilingual contexts.
Multilingua, 33(1–2), 1–9.
Angouri, J. (2012). Managing disagreement in problem-solving meeting talk. Journal of Pragmatics,
44, 1566–1570.
Blommaert, J., & Dong, J. (2010). Ethnographic fieldwork: A beginner’s guide. Buffalo, NY:
Blum-Kulka, S., House, J., & Kasper, G. (Eds). (1989). Cross-cultural pragmatics: Requests and
apologies. Norwood, NJ: Ablex.
Bresnahan, M. J., Ohashi, R., Nebashi, R., Liu, Y., & Shearman, S.M. (2002). Attitudinal and affective
response toward accented English. Language & Communication, 22(2), 171–185.
Bygate, M. (2018). Creating and using the space for speaking within the foreign language classroom:
What, why and how? In R. Alonso (Ed.), Speaking in a second language. Amsterdam,
Netherlands: John Benjamins.
Dahm, M., & Yates, L. (2013). English for the workplace: Doing patient-centred care in medical
communication. TESL Canada, 30(Special Issue 7), 21–23.
Daly, N., Holmes, J., Newton, J., & Stubbe, M. (2004). Expletives as solidarity signals in FTAs on the
factory floor. Journal of Pragmatics, 36(5), 945–964.
Derwing, T. M. (2016). The 3 P’s of ESL in the workplace: Proficiency, pronunciation and pragmatics.
In H. McGarrell & D. Wood (Eds.), Contact: Refereed proceedings of the TESL Ontario Research
Symposium, 42(2), 10–20.
Derwing, T. M., Munro, M. J., Foote, J. A., Waugh, E., & Fleming, J. (2014). Opening the window on
comprehensible pronunciation after 19 years: A workplace training study. Language Learning, 64, 526–548.
Derwing, T. M., Waugh, E., & Munro, M. J. (2021). Pragmatically speaking: Preparing adult ESL
students for the workplace. Applied Pragmatics, 3(2), 107–135.
Derwing, T. M., Rossiter, M. J., & Munro, M. J. (2002). Teaching native speakers to listen to foreign-
accented speech. Journal of Multilingual and Multicultural Development, 23(4), 245–259.
Diepenbroek, L. G., & Derwing, T. M. (2013). To what extent do popular ESL textbooks incorporate
oral fluency and pragmatic development? TESL Canada, 30, 1–20.
Drew, P., & Heritage, J. (1992). Analyzing talk at work: An introduction. In P. Drew & J. Heritage
(Eds.), Talk at work: Interaction in institutional settings (pp. 3–65). Cambridge, NY: Cambridge
University Press.
Duff, P., Wong, P., & Early, M. (2002). Learning language for work and life: The linguistic socialization
of immigrant Canadians seeking careers in healthcare. The Modern Language Journal, 86, 397–422.
Félix-Brasdefer, J. C. (2017). Interlanguage pragmatics. In Y. Huang (Ed.), The Oxford handbook of
pragmatics (pp. 416–434). Oxford: Oxford University Press.
Firth, A. (1996). The discursive accomplishment of normality: On ‘lingua franca’ English and
conversation analysis. Journal of Pragmatics, 26, 237–259. https://doi.org/10.1016/0378-2166(96)00014-8
Fletcher, J. (2018). Rapport management. In B. Vine (Ed.), The Routledge handbook of language in the
workplace (pp. 77–88). Abingdon, Oxon: Routledge.
Goldstein, T. (1997). Two languages at work: Bilingual life on the production floor. Mouton de Gruyter.
Gumperz, J. J. (1982). Discourse strategies. Cambridge, England: Cambridge University Press.
Gumperz, J. J., Jupp, T. C., & Roberts, C. (1979). Crosstalk. Background materials and notes to
accompany the BBC film. London: National Centre for Industrial Language Training.
Handforth, M. (2010). The language of business meetings. New York, NY: Cambridge University Press.
Handforth, M. (2018). Corpus linguistics. In B. Vine (Ed.), The Routledge handbook of language in the
workplace (pp. 51–64). Abingdon, Oxon: Routledge.
Haugh, M., & Watanabe, Y. (2018). In B. Vine (Ed.), The Routledge handbook of language in the
Hazel, S. (2015). Identities at odds: Embedded and implicit language policing in the internationalized
workplace. Language and Intercultural Communication, 15(1), 141–160.
Holmes, J. (2000). Talking English from 9 to 5: Challenges for ESL learners at work. International
Journal of Applied Linguistics, 10(1), 125–140.
Holmes, J. (2006). Sharing a laugh: Pragmatic aspects of humour and gender in the workplace. Journal
of Pragmatics, 38(1), 26–50.
Holmes, J., Marra, M., & Vine, B. (2011). Leadership, discourse and ethnicity. New York, NY: Oxford
University Press.
Holmes, J., & Stubbe, M. (2015). Power and politeness in the workplace: A sociolinguistic analysis of talk
at work (2nd edn). London, UK: Routledge.
Holmes, J., & Riddiford, N. (2011). From classroom to workplace: Tracking socio-pragmatic
development. ELT Journal, 65(4), 376–386.
Ishihara, N., & Cohen, A. D. (2010). Teaching and learning pragmatics: Where language and culture
meet. Abingdon, Oxon: Pearson Education.
Joe, A., & Riddiford, N. (2017). Applying research to real world challenges and issues: Developing
research-based resources to help migrants enter the workforce. In M. Marra, & P. Warren (Eds.),
Linguists at Work. Wellington: Victoria University Press.
Kalmar, T. (2001). Illegal alphabets and adult biliteracy: Latino migrants crossing the linguistic border.
Mahwah, NJ: Erlbaum.
Kang, O., & Rubin, D. (2012). Inter-group contact exercises as a tool for mitigating undergraduates’
attitudes toward ITAs. Journal of Excellence in College Teaching, 23(3), 159–166.
Kasper, G. (1992). Pragmatic transfer. Interlanguage Studies Bulletin (Utrecht), 8(3), 203–231.
Kerekes, J. (2018). Language preparation for internationally educated professionals. In B. Vine
(Ed.), The Routledge handbook of language in the workplace (pp. 389–400). Abingdon, Oxon: Routledge.
Kleifgen, J. A. (2013). Communicative practices at work: Multimodality and learning in a high-tech firm.
Koester, A. (2006). Investigating workplace discourse. London, UK: Routledge.
Koester, A. (2010). Workplace discourse. London, UK: Continuum.
Li, D. (2000). The pragmatics of making requests in the L2 workplace: A case study of language
socialization. The Canadian Modern Language Review/La Revue canadienne des langues vivantes,
57(1), 58–87.
Lazzaro-Salazar, M. (2018). Social constructionism. In B. Vine (Ed.), The Routledge handbook of language in the
workplace (pp. 425–435). Abingdon, Oxon: Routledge.
Lauriks, S., Siebörger, I., & De Vos, M. (2015). “Ha! Relationships? I only shout at them!” Strategic
management of discordant rapport in an African small business context. Journal of Politeness
Research, 11(1), 7–39.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge, UK:
Losa, S. A. (2018). Vocational education. In B. Vine (Ed.), The Routledge handbook of language in the
Lu, P-Y, & Corbett, J. (2012). English in medical education: An intercultural approach to teaching
language and values. Bristol: Multilingual Matters.
McAll, C. (2003). Language dynamics in the bi-and multilingual workplace. In R. Bayley & S. R.
Schecter (Eds.), Language socialization in bilingual and multilingual societies (pp. 235–250).
Clevedon: Multilingual Matters.
Martinez-Flor, A., & Usó-Juan, E. (Eds.) (2010). Speech act performance: Theoretical, empirical and
methodological issues. Amsterdam: John Benjamins.
Newton, J. (2004). Face-threatening talk on the factory floor: Using authentic workplace interactions in
language teaching. Prospect, 19(1), 47–64.
Newton, J., & Kusmierczyk, E. (2011). Teaching second languages for the workplace. Annual Review of
Norquest (2010). Common ground. English in the workplace. A how-to guide for employers. Edmonton,
Canada: Norquest College.
Paltridge, B., & Starfield, S. (2012). The handbook of English for specific purposes. Malden, MA: Wiley-
Blackwell Publishing.
Pujiastuti, A. (2017). Language socialization in the workplace: Immigrant workers’ language practice
within a multilingual workplace. Unpublished dissertation, Graduate School of The Ohio State
University.
Roberts, C. (2010). Language socialization in the workplace. Annual Review of Applied Linguistics, 30,
211–227.
Roberts, C. (2011). Gatekeeping encounters in employment interviews. In S. Sarangi & C. Candlin
(Eds.), Handbook of communication in organisations and professions (pp. 407–432). Berlin: De
Gruyter.
Riddiford, N., & Joe, A. (2010). Tracking the development of sociopragmatic skills. TESOL Quarterly,
44(1), 195–205.
Riddiford, N., & Newton, J. (2010). Workplace talk in context: An ESOL resource. Wellington, New
Zealand: School of Linguistics and Applied Language Studies, Victoria University Wellington.
Sandwall, K. (2010). ‘I Learn More at School’: A critical perspective on workplace-related second
language learning in and out of school. TESOL Quarterly, 44, 542–574.
Schnurr, S. (2013). Exploring professional communication: Language in action. London: Routledge.
Spencer-Oatey, H. (2008). Introduction. In H. Spencer-Oatey (Ed.), Culturally speaking: Culture,
communication and politeness theory (pp. 1–8). London, UK: Continuum.
Spencer-Oatey, H. (2000). Rapport management: A framework for analysis. In H. Spencer-Oatey (Ed.),
Culturally speaking. Managing rapport through talk across cultures (pp. 11–46). New York, NY:
Continuum.
Spencer-Oatey, H., & Jiang, W. (2003). Explaining cross-cultural pragmatic findings: Moving from
politeness maxims to sociopragmatic interactional principles (SIPs). Journal of Pragmatics,
35(10–11), 1633–1650.
Strömmer, M. (2016). Affordances and constraints: Second language learning in cleaning work.
Multilingua, 35(6), 697–721.
Tatsuki, D. H., & Houck, N. (2010). Pragmatics: Teaching speech acts. Alexandria, VA: TESOL.
Timmis, I. (2005). Towards a framework for teaching spoken grammar. ELT Journal, 59(2), 117–125.
Timpe-Laughlin, V. (2019). Pragmatics learning in the workplace. In N. Taguchi (Ed.), The Routledge
handbook of second language acquisition and pragmatics (pp. 413–428). New York, NY: Routledge.
Vertovec, S. (2007). Superdiversity and its implications. Ethnic and Racial Studies, 30, 1024–1054.
Vine, B. (Ed.). (2018). The Routledge handbook of language in the workplace. Abingdon, Oxon:
Routledge.
Vine, B. (2020). Introducing language in the workplace. Cambridge, UK: Cambridge University Press.
Vine, B. (2004). Getting things done at work. Amsterdam: John Benjamins.
Wenger, E. (1998). Communities of practice: Learning, meaning and identity. Cambridge, UK: Cambridge
University Press.
Warriner, D. S. (2010). Competent performances of situated identities: Adult learners of English.
Teaching and Teacher Education, 26, 22–30.
Work Talk. Accessed 1.08.20 at https://worktalk.immigration.govt.nz.
Yates, L. (2004). The ‘secret rules of language’: Tackling pragmatics in the classroom. Prospect:
Journal of Australian TESOL, 19(1), 3–21.
Yates, L. (2018). Migrants at work: Language learning on-the-job. In B. Vine (Ed.), The Routledge handbook of
language in the workplace (pp. 425–435). Abingdon, Oxon: Routledge.
Yates, L. (2010). Dinkas downunder: Request performance in simulated workplace interaction. In G.
Kasper, H. Nguyen, D. Yoshimi & J. Yoshika (Eds.), Pragmatics and language learning 12
(pp. 113–140). University of Hawaii: National Foreign Language Resource Center.
Yates, L. (2011). Interaction, language learning and social inclusion in early settlement. International
Journal of Bilingual Education and Bilingualism, 14(4), 457–471.
Yates, L., Dahm, M., Roger, P., & Cartmill, J. (2016). Rapport and teamwork in Australia: Insights for
international medical graduates. English for Specific Purposes, 42, 104–116.
Yates, L., & Dahm, M. (2016). Doing patient-centred consultations: Some challenges for IMGs. In S.
White & J. Cartmill (Eds.), Communication in Surgical Practice (pp. 35–67). Sheffield, UK: Equinox.
Yates, L., & Major, G. (2015). “Quick-chatting”, “smart dogs”, and how to “say without saying”:
Small talk and pragmatic learning in the community. System Journal, 48, 141–152.
Yates, L., & Springall, J. (2010). Soften up!: Successful requests in the workplace. In D. Tatsuki & N.
Houck (Eds.), Pragmatics from research to practice: Teaching speech acts (pp. 67–86). Alexandria,
VA: TESOL.
26
THE RELATIONSHIP BETWEEN
L2 SPEECH PERCEPTION AND
PRODUCTION
Ron I. Thomson
On hearing someone speaking with a familiar foreign accent, we can often identify the
speaker’s first language (L1) background. This perceptual ability is rarely accompanied by an
equal capacity to perfectly imitate the same accent. This disconnect could signal that speech
perception and production are two independent skills. Alternatively, the two skills may be
connected, but precision in production may lag behind accuracy in perception. In some
unusual cases, the ability to produce foreign language sounds may precede a speaker’s ability
to perceive them. This talent could indicate that learning to produce second language (L2)
sounds can be facilitated by a strategy of applying explicit knowledge in controlled contexts,
something that is not possible in infant L1 learning.
Scholars with an interest in speech perception and production have traditionally treated
these as separate processes. Moreover, researchers typically focus on one or the other, not
both. This division arose from early evidence that aphasia (neurological language
impairment) often impacts perception (Wernicke’s aphasia) and production (Broca’s aphasia)
separately, depending on the location of brain injury (Lichtheim, 1885). Psycholinguistic
research also largely treats speech perception and production as separate systems. Perception
is seen as comprising a number of complementary, non-linear processes (e.g., McClelland &
Elman, 1986), while production is characterized as largely linear in nature (e.g., Levelt,
1999). Given these different orientations, it is impossible to simply apply perceptual
processes in reverse to arrive at an explanation for speech production, or vice versa. See de Bot
and Bátyi (this volume) for a discussion of L1 and L2 speech models, and Simard (this
volume) for descriptions of the psycholinguistic processes involved in L2 speech production.
One consequence of placing speech perception and production in separate scholarly silos
is that limited attention has been given to potential relationships between subsystems
involved in each larger process. For example, both speech perception and speech production
feature subsystems which process meaning, words, and individual sounds. It is not
unreasonable to ask whether subsystems within the larger perception and production processes
may interact across modalities. To the extent that they do, changes in perception could lead
to changes in production, and vice versa. The focus of this chapter is on this narrow sense of
speech perception and production. Henceforth, speech perception refers to the ability of
listeners to decode phonetic input and recognize individual segments (speech sounds) as
intended by speakers; speech production refers to how a speaker generates segments in spoken
utterances. While speech perception and production also include prosodic patterns, these are
the focus of Mok (this volume) and are not discussed here.
DOI: 10.4324/9781003022497-32
I begin with a brief summary of relevant findings from typical L1 speech development
research, a basic understanding of which is essential if one assumes that L2 speech learning
relies on the same mechanisms underlying L1 acquisition. I then discuss L2 speech literature
to provide evidence supporting a partial alignment of L2 speech perception and production.
The chapter concludes with implications for L2 pronunciation teaching practices and future
directions.
L1 Speech Perception and Production Development

In L1 acquisition, the earliest vestiges of perceptual capabilities appear before birth. Late-
term fetuses can recognize their mother’s voice, and immediately after birth newborns show
a preference for that voice (Lee & Kisilevsky, 2014). During the first 6 months, infants can
discriminate nearly any phonetic contrast in any of the world’s languages, but after that
point, they begin losing this ability. This coincides with the emergence of robust L1-specific
phonological categories, a necessary precursor to later language development (Kuhl, 2004).
By the age of four, children’s ability to perceive foreign language contrasts has diminished to
the point that they perform no better than adults (Werker & Tees, 1984). A similar albeit
time-lagged pattern emerges in L1 speech production of individual sounds. Initially, infants
produce laryngeal sounds, but not intentional speech sounds. At 6 months, the segments
infants produce are not language-specific. By ten months, however, infant babble begins to
closely match their ambient input. Grenon et al. (2007) and Saffran et al. (1999) observe that
quite early in life, babies’ L1 speech perception and production both reflect statistical
properties of the language around them.
The Relationship Between L1 Speech Perception and Production

Given the developmental path of L1 speech, from perception to production, and how speech
matches properties of the ambient language, it may seem obvious that the two skills are
related. Yet how these systems are connected is a matter of dispute. “Auditorists” (e.g.,
Kingston & Diehl, 1994) argue that speech perception develops through exposure to
phonetic information, and that in production, speakers attempt to match what listeners expect to
hear. In contrast, “gesturalists” (e.g., Liberman & Mattingly, 1985) suggest that auditory
perception cannot explain how listeners categorize highly variable input. They point to the
fact that how a particular sound category is pronounced varies from speaker to speaker, and
even within the same speaker’s productions, depending on context. To explain how listeners
categorize such input, gesturalists claim that listeners make reference to an innate and finite
set of speech gestures that can reasonably reproduce the sound that they perceive.
Neurophysiological evidence from brain-imaging studies has been used to support this view
(Fadiga et al., 2002). While interesting, the debate about the nature of the system is beyond
the scope of this chapter. What is important to note is agreement that the two systems appear
to be somehow connected at early stages of L1 acquisition.
While the mechanisms underlying L1 speech perception and production become attuned
to the target language, and subsequently less optimized for learning new categories, there is
evidence that they are available across the lifespan. When Goldinger and Azuma (2004) had
adults read printed words before and after repeated exposure to auditory tokens of those
same words, independent judges indicated that the read words sounded more like imitations
of the auditory prompts after exposure. A related ability is apparent in naturalistic speaking
contexts. The phonetic realizations of words spoken by pairs of interlocutors begin to
converge over the course of a single conversation (Pardo, 2006). This suggests that
interlocutors notice phonetic information in their conversation partner’s speech, and modify their
own productions, in real-time, to more closely approximate their interlocutors’
pronunciation.
The Nature of L2 Speech Learning

Applying child L1 speech theory and research to adult L2 learning is complicated by
fundamental differences in age of learning and language experience (Flege et al., 1997; Flege
et al., 1995). Furthermore, adult L2 learners are not blank slates; they already possess
strongly established L1 speech categories, which often interfere with the development of L2
categories (Flege et al., 2003). Scovel (1988) pointed out that unlike other language skills,
production requires physiological control of motor patterns, which are best learned early in
life. While age, experience and L1 account for much of the difficulty adult L2 learners face,
other individual differences such as motivation, aptitude, and context of learning have also
been shown to play a role (Baker Smemoe & Haslam, 2013).
L2 Speech Perception and Production Development

If L2 speech uses the same mechanisms as L1 learning, we should assume a similar path from
perception to production. Flege’s (1995) Speech Learning Model (SLM), the most influential
conceptualization of L2 speech learning, posits exactly such a natural progression. Evidence
from naturalistic L2 speech perception and production research supports Flege’s (1995)
claims. While the ability to discriminate nonnative phonological contrasts dramatically
decreases by the age of four (Werker & Tees, 1984), there is no evidence that it disappears
completely. Munro and Derwing (2008) and Derwing and Munro (2015) present longitudinal
data for vowels and consonants, confirming that during early periods of intensive L2
exposure, L2 speech perception and production can improve without explicit instruction.
Within a few months, however, average improvement plateaus far short of native-like ability.
These studies also provide compelling evidence, almost always overlooked in other research,
that individual learners follow very different learning trajectories than those represented by
group means. Individual differences include substantial variability in how accurately
speakers of the same L1 produce target L2 sounds; some people acquire L2 sounds almost
immediately, while others learn more slowly and still others do not make any progress. Advances in
explicit pronunciation instruction seem to promote learning beyond what is naturalistically
possible for many learners and can reduce the impact of individual differences (Thomson,
2011; Thomson & Derwing, 2015).
Some hold that L2 production does not require well-developed perception (Hattori &
Iverson, 2010; Sakai, 2016; Sheldon & Strange, 1982). In my view, these rare
counterexamples to the natural trajectory are analogous to the development of production skills by
L1 learners with hearing impairments (e.g., Kosky & Boothroyd, 2003), where L1 production
must be taught. The fact that production can precede perception in some instructed contexts
is not evidence that L2 speech learning mechanisms are fundamentally different from those
used in L1 speech learning, only that explicit instruction can alter normal progression. These
issues will be discussed in greater detail later in the chapter.

To better process current research in L2 speech perception and production, it is critical to
understand how data are collected. One challenge in comparing mental representations
associated with L2 speech perception and production is that measures of each domain are
typically indirect. Perception is measured via listening tasks, while production is assessed
via measures of articulatory control. Comparing results using incommensurate measures is
bound to lead to an imperfect match.
Measuring Perception
Task Type
Behavioural tasks have long been the standard in L2 speech perception research. One
technique is the Forced Choice Identification (FCID) task (e.g., Carlet & de Souza, 2018;
Schmitz et al., 2018). In this task, L2 learners listen to target stimuli and indicate which
sound category they perceive, from a fixed set of possibilities. Responses are usually captured
via computer, using a mouse or button click. The number of possible responses depends on
how many target sounds are involved. For example, Bradlow et al. (1997) investigated
categorization of two sounds, English /l/-/r/, while Thomson (2011) investigated categorization
of ten English vowels. How response options are represented also varies. For example,
Thomson and Derwing (2016) used phonetic symbols, while Baker and Trofimovich (2006)
used keywords written in standard orthography containing the target sounds. To ameliorate
the potential activation of faulty learner representations associated with previous experience
with orthography, Thomson (2011) used images of ten distinct nautical flags, which learners
first learned to associate with target vowels.
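As a rough illustration of how responses from an FCID task might be scored, the following minimal sketch tallies per-category accuracy and a confusion count from (target, response) pairs. The function name, the data layout, and the example responses are hypothetical and are not taken from any of the studies cited above.

```python
from collections import Counter, defaultdict


def score_fcid(trials):
    """Score a forced choice identification task.

    trials: iterable of (target_category, response_category) pairs.
    Returns per-category accuracy and a target-by-response confusion count.
    """
    confusions = defaultdict(Counter)
    for target, response in trials:
        confusions[target][response] += 1
    accuracy = {
        target: counts[target] / sum(counts.values())
        for target, counts in confusions.items()
    }
    return accuracy, confusions


# Hypothetical responses from one listener on an /l/-/r/ identification task.
trials = [("l", "l"), ("l", "r"), ("r", "r"), ("r", "r"), ("l", "l"), ("r", "l")]
accuracy, confusions = score_fcid(trials)
```

The confusion counts show not only how often a category is identified correctly, but also which competing category a listener selects when an error occurs.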
Another technique often used to measure L2 speech perception is the discrimination task
(e.g., Baese-Berk, 2019). These tasks do not assess whether a listener can accurately perceive
a member of a particular category, but instead test whether they are able to tell two
categories apart. In an AXB design, the A and B comparators represent two contrastive
categories and the listener must indicate whether the target X prompt is more similar to the A or
B prompt. The related AX task asks whether the A and X are the same or different. Instead
of AXB and AX tasks, some researchers use oddity discrimination tasks. In these tasks,
listeners hear a sequence of three productions and then indicate which, if any, is different
from the others (e.g., Flege & MacKay, 2004). One advantage of using discrimination tasks
over FCID tasks is that they do not require listeners to learn symbols against which to
associate L2 sounds, and they avoid potentially negative effects of orthography.
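The logic of an AXB trial can be sketched in code. The example below is a simplified illustration, not a reconstruction of any cited experiment: token labels are hypothetical, the A comparator is always drawn from the first category (real designs typically counterbalance the order), and X is drawn from the same category as either A or B, using a different token from the comparator where one is available.

```python
import random


def make_axb_trials(tokens_a, tokens_b, n_trials, seed=0):
    """Build AXB trials from two lists of token labels, one per category."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        a = rng.choice(tokens_a)
        b = rng.choice(tokens_b)
        # X belongs to the same category as either the A or the B comparator.
        answer = rng.choice(["A", "B"])
        pool, comparator = (tokens_a, a) if answer == "A" else (tokens_b, b)
        # Prefer a physically different token so the match is categorical.
        candidates = [t for t in pool if t != comparator] or pool
        x = rng.choice(candidates)
        trials.append({"A": a, "X": x, "B": b, "answer": answer})
    return trials


def proportion_correct(trials, responses):
    """responses: a list of 'A' or 'B', one per trial, in trial order."""
    correct = sum(t["answer"] == r for t, r in zip(trials, responses))
    return correct / len(trials)


# Hypothetical token labels for two contrastive vowel categories.
trials = make_axb_trials(["beat_1", "beat_2"], ["bit_1", "bit_2"], n_trials=10)
```

An AX (same or different) task could be sketched the same way by dropping the B comparator and recording whether A and X share a category.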
While FCID and discrimination tasks are both measures of speech perception, Wayland’s
(2007) comparison of FCID and AXB data revealed that FCID scores do not predict
discrimination patterns well. On its face, an FCID task may seem to be more in line with what listeners
do in the real world, which is to recognize rather than discriminate sounds they encounter.
While not widely utilized, another approach to measuring the perception of speech sounds
is phoneme monitoring (Hanulíková et al., 2012). This requires listeners to listen to words
and indicate whether a specific phoneme is present. Alternatively, listeners can be asked to
indicate whether words, presented either in isolation or in a sentential context, are
pronounced with the target phoneme.
Although behavioural tasks remain the standard for adult L2 speech perception research,
neurophysiological methods are also possible (Schmitz et al., 2018). For example, event-
related potentials (ERP) can assess changes in brain activity when a listener encounters a
deviant stimulus. In this technique, electrodes placed on the surface of a listener’s scalp
provide a direct measure of auditory discrimination ability. A listener is presented with
repeated instances of a familiar vowel category interrupted at some point by a token
representing a different vowel category. An ability to discriminate the deviant token is
reflected by a sudden shift in neurological activity.
Stimulus Characteristics
In addition to task type, stimulus characteristics may influence results obtained from L2
speech experiments. For example, FCID and discrimination tasks have presented isolated
vowels (Schmitz et al., 2018), vowels in open CV syllables (Thomson, 2011), sounds in
nonsense words (Carlet & de Souza, 2018), and sounds in real words (Baker & Trofimovich,
2006). Each context has a different impact on how the target sounds are perceived.
Researchers who use real words have found evidence that lexical frequency and phonetic
context influence the perception of sound categories (Thomson & Derwing, 2016; Thomson
& Isaacs, 2009). Presentation of sounds in isolation, in syllables, or in nonsense words is a
purer test of phonetic knowledge, though such contexts are more remote from real-world
experiences.
Surrounding phonetic context can also distort perception. When Wayland (2007)
obtained different results from the same listeners using FCID versus discrimination tasks, she
suspected that the presence of the A and B comparators before and after the target X
stimulus affected the perception of the X stimulus, which cannot occur in a typical FCID task,
where target stimuli are presented in isolation from other surrounding stimuli. Wayland
(2007) later modified the FCID task such that the tokens to be identified were presented
within the same AXB frames used in the discrimination task. This led to far more
comparable results across the two task types.
Another stimulus characteristic that may influence results of speech perception
experiments is the decision to use synthetic speech generated by a computer rather than natural
speech recordings (e.g., Borden et al., 1983; Schmitz et al., 2018). Synthetic speech is often
used in experiments where the researcher wants to precisely measure listener responses to
vowels and consonants along a continuum representing ambiguous to less ambiguous in-
stances of contrasting sounds. While this may afford greater control over variability, it is
unclear to what degree synthetic speech tokens reflect the natural variability that learners
experience in the real world.
The number of talkers (voices) used in speech perception tasks is also important. Results
based on responses to a single talker (e.g., Hanulíková et al., 2012) are unlikely to adequately
capture listeners’ perceptual ability in the real world, where perception differs depending on
who is talking. In contrast, the use of multiple talkers (e.g., Baker & Trofimovich, 2006;
Carlet & de Souza, 2018; Thomson, 2011) allows researchers to obtain mean perception
scores that average out listener responses to speech produced by individual talkers.
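The averaging described above can be sketched in a few lines. This is only an illustration, assuming each trial is stored as a (talker, correct) pair; the talker labels and responses are invented for the example and are not taken from any of the cited studies.

```python
from collections import defaultdict
from statistics import mean

def mean_score_across_talkers(trials):
    """Return (per-talker accuracy, unweighted mean across talkers).

    `trials` is an iterable of (talker_id, correct) pairs, where `correct`
    is True when the listener identified the token correctly.
    """
    by_talker = defaultdict(list)
    for talker, correct in trials:
        by_talker[talker].append(1.0 if correct else 0.0)
    per_talker = {talker: mean(scores) for talker, scores in by_talker.items()}
    return per_talker, mean(list(per_talker.values()))

# Hypothetical responses from one listener to three talkers.
trials = [
    ("talker_A", True), ("talker_A", True), ("talker_A", False), ("talker_A", True),
    ("talker_B", False), ("talker_B", True), ("talker_B", False), ("talker_B", False),
    ("talker_C", True), ("talker_C", True), ("talker_C", True), ("talker_C", True),
]
per_talker, overall = mean_score_across_talkers(trials)
```

Keeping the per-talker scores alongside the overall mean makes it visible when one voice is driving a listener's result, which a single-talker design cannot reveal.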
Finally, speech perception research varies in how many replays are allowed before a
perceptual response is expected from the listener. For example, Thomson (2011) allowed no
replays, while Baker and Trofimovich (2006) allowed multiple replays. In the latter case, the
researchers maintained that allowing multiple listens meant that the final response was not
based on a guess, but upon phonological processing. However, because replays do not reflect the
ephemeral nature of speech perception in the real world, they may not lead to valid results.
Measuring Production
Unlike perception tasks, which capture listeners’ auditory responses to target stimuli in a
single step, production tasks require two steps. First, there is the speaking task itself, during
which productions are elicited and recorded. The second step requires instrumental or
human evaluation of those recordings to measure accuracy.
Speaking Tasks
A wide variety of speaking tasks are used to elicit L2 speech production (see Nagle et al., this
volume). As most research is focused on evaluating the production of specific sound categories,
controlled production tasks are the norm. While these may not reflect L2 learners’ spontaneous
speech production, they offer a measure of a learner’s knowledge in optimal conditions. Read
speech is frequently used in L2 pronunciation assessment (Thomson & Derwing, 2015), but it is
not as common in laboratory research. Reference to orthography can have both negative and
positive effects on production. For example, a word might be mispronounced if the ortho-
graphic representation is opaque. Alternatively, knowledge of spelling can allow L2 learners to
apply explicit knowledge to the production of difficult sounds (Thomson & Isaacs, 2009).
In laboratory research, elicited repetition of speech using auditory prompts is common,
and comes in two basic forms, either immediate or delayed. Immediate repetition (e.g.,
Hanulíková et al., 2012; Kabakoff et al., 2020) might be better characterized as a measure of
short-term phonological working memory (WM). Immediate imitations of prompts may
reflect properties of the stimulus, rather than a speaker’s typical ability to produce the same
sounds (Shockley et al., 2004). While WM is an important mechanism in speech learning, and
influences long-term phonological representations, it may not be indicative of the present
state of an L2 learner’s phonological system. To overcome the limitations of immediate
repetition, many L2 researchers use delayed and often interrupted repetition tasks. Auditory
prompts are embedded in a carrier phrase (e.g., “The next word is ____”) and learners re-
spond by producing the target word in a different carrier phrase (e.g., “Now I say____”) (see
Munro & Derwing, 2008; Thomson, 2011). Flege et al. (2003) added an additional layer of
complexity by presenting a target word in the middle of a three-word sequence, which was
then embedded in a carrier phrase. Embedding target words in carrier phrases interrupts the
listener’s ability to store a prompt’s phonetic properties in WM, and is believed to activate
their long-term phonological representation.
Whether repetitions of auditory prompts are immediate or delayed, they do not ne-
cessarily indicate production ability in isolation, since they are mediated by potentially in-
accurate perception of the prompts. In both cases, accurate production requires accurate
perception, but inaccurate production does not necessarily entail inaccurate perception, since
articulatory control is also required in production.
Hybrid approaches to speech elicitation, where speakers produce speech after seeing
written targets as well as hearing auditory models, have also been used (e.g., Bradlow et al.,
1997; Flege et al., 1999; Hanulíková et al., 2012). The combination of the two has been
shown to lead to more accurate productions than either on its own (Thomson & Isaacs,
2009), suggesting that learners are able to apply explicit knowledge during reading, while also
being alerted to any mismatch between the written form and the target pronunciation via the
auditory model. As such, this technique may not reflect the procedural knowledge that is
typically activated in communicative language use.
Though less frequently used, picture-naming is a more ecologically valid L2 speech elici-
tation task, because it can isolate production from perception, and resulting productions are
likely to be closer to spontaneous speech. Picture-naming is not without its own limitations,
however. For example, researchers can use pictures of minimal pairs such as “lock” versus
“rock” to elicit /l/ and /r/ productions, but cannot do so as transparently across all
phonetic contexts, for example, “laughed” versus “raft.” The latter are more difficult to
represent using images, and “raft” may be an unknown word for many L2 speakers. When the
target includes a larger number of contrasting sound categories (e.g., ten vowels), pictures
representing every vowel in the same phonetic context are not available (Schmitz et al., 2018;
Thomson & Derwing, 2016). Furthermore, being constrained to words with images,
researchers may need to include words that are not well-known to learners (e.g., Baker &
Trofimovich, 2006; Schmitz et al., 2018). Such research has attempted to overcome this
obstacle by including a familiarization phase to ensure learners actually know the target words
and how they are pronounced.
Stimulus Characteristics
Stimulus characteristics also determine which task type should be used to elicit L2 production.
In L2 English, for example, isolated vowels or nonsense words cannot be elicited using reading
tasks, unless the subjects are familiar with a phonetic alphabet. Picture-naming is not possible
unless participants are first taught to associate pictures with nonsense words. This suggests that
studies examining productions at the phonetic level almost always use elicited repetition tasks.
Evaluation of Production
Once recordings of L2 speech are obtained, they must be evaluated. This is done by human
judges or through acoustic analysis of the speech recordings.
When the target of analysis is individual phonemes, listeners are presented with randomized
recordings of L2 speech tokens in an FCID task (e.g., Thomson, 2011; Thomson & Derwing,
2016) and indicate which category they perceived. If the goal is to measure change in L2 speech
production over time, paired-comparison tasks are sometimes used. Judges listen to pairs of pre-/
post-test recordings, randomized within pairs, and report which is a better production of the
target (Bradlow et al., 1997). To account for variation across listeners’ identification of L2
speech, it is common to average scores across multiple judges. In other studies, expertly trained
phoneticians transcribe recordings, which are then validated by additional experts (McAndrews
& Thomson, 2017). To measure within-category differences, scalar ratings of “goodness-of-fit”
to a native speaker model are sometimes employed (Flege et al., 2003; Hanulíková et al., 2012).
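As a rough sketch of how FCID judgements from several judges might be pooled, the example below scores each token as the proportion of judges who chose the intended category and then averages across tokens. The data layout, the category labels and the responses are all assumptions made for illustration; the cited studies may have scored their data differently.

```python
from statistics import mean

def token_score(intended, judge_responses):
    """Proportion of judges whose FCID response matches the intended category."""
    return mean([1.0 if response == intended else 0.0 for response in judge_responses])

def speaker_score(tokens):
    """Mean token score for one speaker; `tokens` is a list of
    (intended_category, [responses from each judge]) pairs."""
    return mean([token_score(intended, responses) for intended, responses in tokens])

# Hypothetical judgements: three tokens, each heard by three judges.
tokens = [
    ("i", ["i", "i", "ɪ"]),   # two of three judges heard the intended vowel
    ("ɪ", ["ɪ", "i", "i"]),   # one of three judges heard the intended vowel
    ("æ", ["æ", "æ", "æ"]),   # all judges agreed with the intended vowel
]
score = speaker_score(tokens)
```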
Acoustic Analysis
An alternative to human judgements for assessing L2 speech production is the measurement of
acoustic features known to be correlated with the perception of specific sound categories. For
example, frequency and durational components of L2 vowels and consonants can be com-
pared to the same information extracted from recordings of native speakers producing the
same sounds, or to recordings of sounds produced in the learners’ L1, to determine whether
L2 productions reflect the target language or L1 transfer (Thomson et al., 2009).
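One simple way to picture such a comparison is to ask which reference category a learner's vowel falls closest to in a two-formant space. The sketch below uses plain Euclidean distance to category means, which is a deliberately simplified stand-in and not the statistical method used in the cited work. The reference values are illustrative round numbers, not measurements, and the function name is hypothetical.

```python
import math

def nearest_category(token, references):
    """Return the reference category whose (F1, F2) mean is closest to `token`.

    `token` is an (F1, F2) pair in Hz; `references` maps a category label
    to the mean (F1, F2) of reference productions of that category.
    """
    return min(references, key=lambda category: math.dist(token, references[category]))

# Illustrative reference means in Hz (assumed values, not measured data).
native_references = {
    "i": (300.0, 2300.0),
    "ɪ": (430.0, 2000.0),
    "ɛ": (580.0, 1800.0),
}

# Hypothetical learner productions of intended /ɪ/.
close_to_target = nearest_category((450.0, 1950.0), native_references)
close_to_i = nearest_category((320.0, 2250.0), native_references)
```

In practice raw Hz distances weight F2 more heavily than F1, so researchers often normalize formants or model category variance before comparing; this sketch ignores both steps.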
Debate Over Using Human Judges Versus Acoustic Measures?

Evaluation of L2 speech production by human listeners is the gold standard when the goal is
to measure intelligibility to listeners. If the goal is to compare perception and production,
however, it can be rather imprecise, because listeners bring their own perceptual distortions
to the task. Just as L2 learners fail to perceive and accurately classify target sounds due to the
influence of their L1 categories, native listeners may be unable to detect differences in L2-
accented productions (Flege et al., 1997).

While many studies have examined L2 speech perception and production independently, the
focus of this part is on research that examines both.
L2 Speech Perception and Production at a Fixed Point in Time

Given that speech production lags speech perception in L1 acquisition, it is unrealistic to
expect that the two skills should develop simultaneously in L2 acquisition. In most cases,
researchers who compare L2 speech perception and production at a fixed time have found
evidence of a relationship in which perception scores are higher than production scores for
the same L2 sounds. The two skills should only be expected to reach maximal alignment
when learning has reached an end-state or has plateaued in both skills.
Flege et al. (1997) used Mandarin L2 English learners’ AXB discrimination scores to
predict immediate repetition scores by the same learners for twelve American English vowels.
L2 productions were judged by NSs using an FCID task. Flege et al. found small but sig-
nificant correlations between individual perception and production scores. Using the same
methodology, Flege et al. (1999) conducted a study with Italian L2 English learners, and
again found significant correlations between perception and production. These studies
highlight the potential consequence of methodological decisions. As noted earlier, AXB
discrimination tasks and FCID tasks do not yield the same results. Yet, discrimination scores
of L2 learners were compared to identification scores by native speaker judges. Further,
production was elicited using an immediate imitation task, which is not a valid measure of
long-term phonological representations. The fact that significant correlations were found, despite
these limitations, provides convincing evidence of a relationship between perception and
production. Had more analogous measures been compared, a stronger
relationship may well have been uncovered.
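The kind of correlation reported in these studies can be illustrated with a Pearson coefficient computed over paired learner scores. The sketch below is a generic illustration: the scores are invented, and nothing here reproduces the analyses or results of Flege et al.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical proportion-correct scores for five learners.
perception = [0.55, 0.60, 0.70, 0.80, 0.90]
production = [0.40, 0.55, 0.50, 0.70, 0.75]
r = pearson_r(perception, production)
```

A coefficient of this kind only describes how consistently learners are ranked across the two tasks. It says nothing about whether the tasks themselves are analogous, which is the methodological concern raised above.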
Thomson (2008) compared Mandarin L2 learners’ perception and production of ten
Canadian English vowels. He used an FCID task to measure perception and a delayed re-
petition task to elicit productions. Productions were then evaluated acoustically and mapped
to the closest target category using a probabilistic model. Instead of correlating perception
and production scores, Thomson asked whether the group’s mean perception and production
scores for each of the ten vowels were significantly different. For eight of ten vowels, no
significant difference was found. For one of the remaining vowels, production significantly
lagged perception, while for the other, perception lagged production. The latter exception could be
explained by differences in how cues are used in perception and production. Learners might
have been inappropriately relying on duration as a primary cue in the FCID task, while
shortening or lengthening a vowel would have little impact on its identification in
production.
Schmitz et al. (2018) examined the relationship between perception and production of
Spanish L2 Catalan vowels. Perception data were obtained using an FCID task, while a
picture-naming task was used to elicit L2 Catalan productions. Productions were later
judged by NSs for degree of accent. Using factor analysis, Schmitz et al. found a weak
relationship between L2 speech perception and degree of accent. The researchers’ decision to
judge degree of accent in the L2 productions is unusual, because it is not analogous to the
FCID task used by learners. Had they used judgements of intelligibility, instead, the re-
lationship may have been stronger.
Some rare counter-examples provide evidence that L2 production can precede L2 per-
ception. Sheldon and Strange (1982) examined the pronunciation of the /l/-/r/ contrast by
advanced Japanese L2 English learners living in the United States and found that some were
able to accurately produce the contrast, despite being unable to discriminate the difference.
Notably, not only were they unable to recognize the difference in productions of native
speakers, but also they failed to discriminate it in recordings of their own accurate pro-
ductions. Bradlow et al. (1997) and Borden et al. (1983) reported similar patterns for /l/-/r/
perception and production by Japanese and Korean L2 English learners, respectively. While
such results are interesting, they are not surprising. The production data in these studies were
elicited using a reading task, suggesting that participants relied on explicit knowledge about
production of words spelled with /r/ and /l/. As such, these results cannot reasonably be taken
to disprove that perception precedes production in normal circumstances.
L2 Speech Perception Training and Its Impact on L2 Production

A more convincing approach to measuring the relationship between L2 speech perception
and production is to determine how training in perception influences production, and vice
versa. Since improvement in L2 speech production does not happen quickly without inter-
vention (e.g., Munro & Derwing, 2008), improvement coincident with perceptual training
can be taken as evidence that changes in perception trigger improvement in production. A
meta-analysis by Sakai and Moorman (2018) concluded that perceptual training leads to
small changes in production. The studies they evaluated comprised a wide range of training
techniques, learning contexts and assessment methodologies, which almost certainly con-
tributed to their small effect size. Individual studies vary in the degree to which perceptual
training is found to affect production.
Thomson (2011) utilized High Variability Pronunciation Training (HVPT) to teach
Mandarin L2 English learners to better perceive ten English vowels. In HVPT, learners use
an FCID task to identify training stimuli produced by multiple talkers of the target
language, in multiple phonetic contexts, and receive token-by-token feedback on the accuracy
of their choices. In Thomson (2011), target vowels were trained in /p/ + vowel or /b/ +
vowel sequences; subsequently, delayed repetitions were evaluated by NS judges. After
eight 15- to 20-minute training sessions spread over 3 weeks, learners significantly im-
proved their productions. Changes in vowel production generalized to productions in a
new phonetic context, but not to the same extent, and failed to generalize to a third
phonetic context. While a lack of generalization to new phonetic contexts in HVPT is
unfortunate from a training perspective (see Thomson, 2018), this provides compelling
evidence that larger changes in production in trained contexts are directly attributable to
changes in perception. Using the same technique, Thomson and Derwing (2016) trained
learners from a diverse range of L1 backgrounds to perceive ten English vowels in a variety
of consonant + vowel syllables, followed by a few sessions focused on vowels in real words.
They found significant improvement in the learners’ pronunciation of the real words.
Bradlow et al. (1997) used HVPT to train Japanese L2 English learners to perceive the
English /l/-/r/ contrast and also reported significant changes in production of target words.
HVPT studies consistently demonstrate that improvement in perception as a result of
training is greater than accompanying changes in production, consistent with a lag between
the development of perception and its impact on production.
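The basic structure of an HVPT block, as described earlier in this section (multiple talkers, multiple phonetic contexts, an FCID response and feedback on every token), might be sketched as follows. This is a schematic under stated assumptions, not the software used in any cited study: the function names, the stimulus labels and the simulated learner are all hypothetical, and audio playback is replaced by a callable that returns a response.

```python
import random

def run_fcid_training_block(stimuli, respond, seed=None):
    """Present each stimulus once in random order and log feedback per token.

    `stimuli` is a list of (talker, context, target_category) tuples.
    `respond` stands in for the learner: it receives the stimulus and
    returns the category label the learner selected.
    """
    if not stimuli:
        raise ValueError("stimuli must not be empty")
    order = list(stimuli)
    random.Random(seed).shuffle(order)
    log = []
    for talker, context, target in order:
        choice = respond(talker, context, target)
        # Token-by-token feedback: each response is marked right or wrong.
        log.append({"talker": talker, "context": context, "target": target,
                    "choice": choice, "correct": choice == target})
    accuracy = sum(entry["correct"] for entry in log) / len(log)
    return log, accuracy

# Hypothetical design: 2 talkers x 2 consonant contexts x 2 vowel categories.
talkers = ["talker_1", "talker_2"]
contexts = ["b_", "p_"]
vowels = ["i", "ɪ"]
stimuli = [(t, c, v) for t in talkers for c in contexts for v in vowels]

def simulated_learner(talker, context, target):
    """A stand-in learner who hears every token as /i/."""
    return "i"

log, accuracy = run_fcid_training_block(stimuli, simulated_learner, seed=1)
```

Logging the talker and context with each response makes it possible to check afterwards whether accuracy differs across voices or contexts, which is the variability the technique relies on.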
Lee and Lyster (2017) demonstrated that the impact of perceptual training on production
is mediated by feedback type. Specifically, learners should not only be told when they mis-
classify a training prompt, but they must also re-hear the correct category, or hear an ex-
ample of the incorrect category chosen in error. Only with one of these two types of auditory
feedback did perceptual training in Lee and Lyster’s study transfer to production. Curiously,
providing replays of both correct and incorrect exemplars together interferes with transfer.
The perceptual training studies discussed thus far indicate that, given optimal learning
conditions, L2 production is sensitive to and benefits from changes in L2 speech perception.
While rare counter-examples have failed to find a transfer of perceptual training to pro-
duction, methodological issues are likely explanations. For example, Carlet and de Souza
(2018) trained Brazilians to perceive L2 English vowels presented in real words. As Thomson
and Derwing (2016) argue, when training stimuli are not minimal pairs, or nonsense words,
learners can apply explicit knowledge to decide what sounds are supposed to be in words,
despite not being able to accurately perceive them. In another study, Sakai (2016) found no
impact of perceptual training on the production of the English /i/-/ɪ/ contrast by Spanish L1
speakers. In her study, perceptual training utilized synthetically idealized exemplars of target
categories, which may have allowed learners to discriminate the contrast using acoustic cues
which would not prove useful in production.
A few studies have examined the impact of production training on perception. Liakin
et al. (2013) used automatic speech recognition (ASR) technology to provide feedback to
English learners on their productions of an L2 French vowel contrast. Training resulted in
improvement in production scores, as judged by NSs of French, but with no concomitant
improvement in perception. Herd et al. (2013) trained a group of L2 Spanish learners to
produce the Spanish intervocalic /d, ɾ, r/ contrasts, by having them compare waveforms and
spectrograms of their own productions to waveforms and spectrograms of NS productions of
the same sounds. Notably, learners did not hear recordings of the NS productions that they
were trying to match. Training resulted in improvements in both production and perception.
In both studies, production training was not entirely independent of perceptual input, since
the learners could hear their own productions.
Sakai (2016) isolated production from the influence of learners’ own perceptual input by
mapping spectral information from L1-Spanish productions of English /i/-/ɪ/ onto a two-
dimensional vowel space, analogous to the IPA vowel chart. Learners produced the target
vowels, while attempting to have them mapped to the appropriate part of the space. One
training group monitored their own productions, while another group wore noise-cancelling
headphones, which made it impossible to hear their own productions. The group that
monitored their own productions significantly improved in their production of the L2 con-
trast, while the group who could not hear themselves did not. Both production groups
significantly improved in their ability to perceive the contrast. This study provides compel-
ling evidence for a bi-directional connection between production and perception
mechanisms.

In classroom practice, L2 pronunciation is predominantly discussed in terms of production,
with little attention to the role of perception. This reflects a pervasive belief by language
teachers that instruction should focus on articulation (Trofimovich & Foote, 2017). Yet, to
better understand how to tackle particular L2 pronunciation errors, teachers must know the
underlying cause of the errors. While there is clearly a need for articulatory training, deci-
sions about where to focus such training should be based on which L2 speech sounds may
not improve on their own. For example, improvement in the perception of L2 vowels seems
to trigger improvement in production in most, but not all cases (Thomson, 2008; 2011). In
contrast, many English learners have difficulty producing a Spanish trill, despite being able
to perceive it (Herd et al., 2013). In cases where perceptual learning is ineffective, working on
production may be a good strategy, and one which may benefit perception (Sakai, 2016).
For perceptual training to work effectively, training prompts should incorporate varia-
bility in terms of the number of talkers who provide training stimuli, as well as the number of
contexts in which target sounds are presented (Thomson, 2018). Training should also provide
auditory corrective feedback on errors, by either replaying prompts learners get wrong, or
playing an example of the incorrect choice (Lee & Lyster, 2017).
7 Future Directions
Given many unanswered questions regarding the nature of L2 speech perception and pro-
duction, the available avenues for future work are numerous. Highlighted here are some of
the most important concerns. First, researchers should identify the most valid research
methodologies. Without precise and comparable measures, it is difficult to draw accurate
conclusions about how the two skills are connected. Future studies using neurophysiological
measures, such as event-related potentials (ERPs) or functional MRI, can offer important
insights, in addition to confirming the validity of commonly used behavioural measures.
Second, the field would benefit from longitudinal research. In most L2 studies, measures of
perception and production are taken from a single time-point, or immediately before and
after a short intervention. Nagle (2018, 2020) provides a compelling example of what
longitudinal research can reveal. He found evidence of a time-lag between when L1 English
speakers learned to accurately perceive an L2 Spanish stop contrast and when that learning
emerged in production. He also found that the relationship varied by stop category. Third,
very little is known about how social factors mediate the development and relationship
between L2 speech perception and production. The fact that the speech of adult interlocutors
phonetically converges over the course of a conversation (Pardo, 2006; Lewandowski &
Jilka, 2019) suggests that basic imitative mechanisms remain intact over the lifespan. What is
unclear is how this ability might be used to improve long-term phonological representations.
The shadowing technique (Foote & McDonough, 2017), in which listeners imitate a recorded
stretch of speech as quickly as possible after hearing each word, may be tapping into this
ability. Fourth, researchers should examine which types of training are most effective. While
evidence suggests that improvement in perception induces improvement in production, and
vice versa, there is also evidence that targeting both skills simultaneously is deleterious to
learning (Baese-Berk, 2019; Herd et al., 2013). Finally, most research is limited to examining
the impact of training on individual segments in isolated words. The extent to which L2
speech training leads to improvements in production beyond the level of segments should be
researched.
Further Reading
Nagle, C. L. (2020). Revisiting perception–production relationships: Exploring a new approach to
investigate perception as a time‐varying predictor. Language Learning. doi: 10.1111/lang.12431.
An application of mixed-effects modeling to longitudinally examine the relationship between change in
L2 speech perception and its time-lagged impact on production.
Sakai, M., & Moorman, C. (2018). Can perception training improve the production of second language
phonemes? A meta-analytic review of 25 years of perception training research. Applied
A meta-analysis of eighteen studies in which researchers used perceptual training to effect change in L2
speech production.
Trofimovich, P., & Foote, J. A. (2017). Second language pronunciation learning: An overview of
theoretical perspectives. In The Routledge handbook of contemporary English pronunciation
(pp. 93–108). Philadelphia: Routledge.
A detailed overview of major theories of second language speech learning, including linguistic, psy-
chological and sociocultural perspectives.
References
Baese-Berk, M. M. (2019). Interactions between speech perception and production during learning of
novel phonemic categories. Attention, Perception, & Psychophysics, 81(4), 981–1005.
Baker Smemoe, W., & Haslam, N. (2013). The effect of language learning aptitude, strategy use and
learning context on L2 pronunciation learning. Applied Linguistics, 34(4), 435–456.
Baker, W., & Trofimovich, P. (2006). Perceptual paths to accurate production of L2 vowels: The role of
individual differences. International Review of Applied Linguistics in Language Teaching, 44(3),
231–250.
Borden, G., Gerber, A., & Milsark, G. (1983). Production and perception of the /r/-/l/ contrast in
Korean adults learning English. Language Learning, 33, 499–526.
Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. (1997). Training Japanese lis-
teners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production.
Journal of the Acoustical Society of America, 101, 2299–2310.
Carlet, A., & de Souza, H. K. D. (2018). Improving L2 pronunciation inside and outside the classroom:
Perception, production and autonomous learning of L2 vowels. Ilha do Desterro, 71(3), 99–123.
Fadiga, L., Craighero, L., Buccino, G., & Rizzolatti, G. (2002). Speech listening specifically modulates
the excitability of tongue muscles: A TMS study. European Journal of Neuroscience, 15, 399–402.
Flege, J. E. (1995). Second language speech learning: Theory, findings, problems. In W. Strange (Ed.),
Speech perception and linguistic experience: Issues in cross-language research (pp. 233–277).
Timonium, MD: York Press.
Flege, J. E., & MacKay, I. R. (2004). Perceiving vowels in a second language. Studies in Second
Flege, J. E., Bohn, O.-S., & Jang, S. (1997). Effects of experience on non-native speakers’ production
and perception of English vowels. Journal of Phonetics, 25, 437–470.
Flege, J. E., MacKay, I. R. A., & Meador, D. (1999). Native Italian speakers’ perception and pro-
duction of English vowels. Journal of the Acoustical Society of America, 106, 2973–2987.
Flege, J. E., Munro, M. J., & MacKay, I. R. (1995). Factors affecting strength of perceived foreign
accent in a second language. The Journal of the Acoustical Society of America, 97(5), 3125–3134.
Flege, J. E., Schirru, C., & MacKay, I. R. (2003). Interaction between the native and second language
phonetic subsystems. Speech Communication, 40(4), 467–491.
Foote, J. A., & McDonough, K. (2017). Using shadowing with mobile technology to improve L2
pronunciation. Journal of Second Language Pronunciation, 3(1), 34–56.
Gass, S. (1984). Development of speech perception and speech production abilities in adult second
language learners. Applied Psycholinguistics, 5, 51–74.
Goldinger, S. D., & Azuma, T. (2004). Episodic memory reflected in printed word naming.
Psychonomic Bulletin & Review, 11, 716–722.
Grenon, I., Benner, A., & Esling, J. H. (2007). Language-specific phonetic production patterns in the
first year of life. Proceedings of the 16th International Congress of Phonetic Sciences, 3, 1561–1564.
Hanulíková, A., Dediu, D., Fang, Z., Bašnaková, J., & Huettig, F. (2012). Individual differences in the
acquisition of a complex L2 phonology: A training study. Language Learning, 62, 79–109.
Hattori, K., & Iverson, P. (2010). Examination of the relationship between L2 perception and pro-
duction: An investigation of English /r/-/l/ perception and production by adult Japanese speakers.
Paper presented at the Interspeech Workshop on Second Language Studies: Acquisition, Learning,
Education and Technology, Waseda University.
Herd, W., Jongman, A., & Sereno, J. (2013). Perceptual and production training of intervocalic /d, ɾ,
r/ in American English learners of Spanish. The Journal of the Acoustical Society of America, 133(6),
4247–4255.
Kabakoff, H., Go, G., & Levi, S. V. (2020). Training a non-native vowel contrast with a distributional
learning paradigm results in improved perception and production. Journal of Phonetics, 78. doi:
10.1016/j.wocn.2019.100940.
Kingston, J., & Diehl, R. L. (1994). Phonetic knowledge. Language, 70(3), 419–454.
Kosky, C., & Boothroyd, A. (2003). Perception and production of sibilants by children with hearing
loss: A training study. The Volta Review, 103(2), 71–98.
Kuhl, P. K. (2004). Early language acquisition: Cracking the speech code. Nature Reviews Neuroscience,
5(11), 831–843.
Lee, A. H., & Lyster, R. (2017). Can corrective feedback on second language speech perception errors
affect production accuracy? Applied Psycholinguistics, 38(2), 371–393.
Lee, G. Y., & Kisilevsky, B. S. (2014). Fetuses respond to father’s voice but prefer mother’s voice after
birth. Developmental Psychobiology, 56(1), 1–11.
Levelt, W. J. (1999). Models of word production. Trends in Cognitive Sciences, 3(6), 223–232.
Lewandowski, N., & Jilka, M. (2019). Phonetic convergence, language talent, personality and attention.
Frontiers in Communication, 4. doi: 10.3389/fcomm.2019.00018.
Liakin, D., Cardoso, W., & Liakina, N. (2013). Mobile speech recognition software: A tool for
teaching second language pronunciation. OLBI Journal, 5.
Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition,
21, 1–36.
Lichtheim, L. (1885). On aphasia. Brain, 7, 433–484.
McAndrews, M. M., & Thomson, R. I. (2017). Establishing an empirical basis for priorities in pro-
nunciation teaching. Journal of Second Language Pronunciation, 3(2), 267–287.
McClelland, J., & Elman, J. (1986). The TRACE model of speech perception. Cognitive Psychology,
18, 1–86.
Munro, M. J., & Derwing, T. M. (2008). Segmental acquisition in adult ESL learners: A longitudinal
study of vowel production. Language Learning, 58, 479–502.
Nagle, C. L. (2018). Examining the temporal structure of the perception–production link in second
language acquisition: A longitudinal study. Language Learning, 68, 234–270.
Nagle, C. L. (2020). Revisiting perception–production relationships: Exploring a new approach to
investigate perception as a time‐varying predictor. Language Learning. doi: 10.1111/lang.12431.
Pardo, J. S. (2006). On phonetic convergence during conversational interaction. Journal of the
Acoustical Society of America, 119, 2382–2393. doi: 10.1121/1.2178720
Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone
sequences by human infants and adults. Cognition, 70(1), 27–52.
Saito, K., & Plonsky, L. (2019). Effects of second language pronunciation teaching revisited: A pro-
posed measurement framework and meta‐analysis. Language Learning, 69(3), 652–708.
Sakai, M. (2016). (Dis)Connecting perception and production: Training native speakers of Spanish on the
English /i/-/ɪ/ distinction (Unpublished doctoral dissertation). Georgetown University, Washington, DC.
Sakai, M., & Moorman, C. (2018). Can perception training improve the production of second language
phonemes? A meta-analytic review of 25 years of perception training research. Applied
Schmitz, J., Díaz, B., Fernandez Rubio, K., & Sebastian-Galles, N. (2018). Exploring the relationship
between speech perception and production across phonological processes, language familiarity, and
sensory modalities. Language, Cognition and Neuroscience, 33(5), 527–546.
Scovel, T. (1988). A time to speak. A psycholinguistic inquiry into the critical period for human speech.
Rowley, MA: Newbury House.
Sheldon, A., & Strange, W. (1982). The acquisition of /r/ and /l/ by Japanese learners of English:
Evidence that speech production can precede speech perception. Applied Psycholinguistics, 3,
243–261.
Shockley, K., Sabadini, L., & Fowler, C. A. (2004). Imitation in shadowing words. Perception &
Psychophysics, 66(3), 422–429.
Thomson, R. I. (2008). L2 English vowel learning by Mandarin speakers: Does perception precede
production? Canadian Acoustics, 36(3), 134–135.
perception improves pronunciation. CALICO Journal, 28(3), 744–765.
Thomson, R. I. (2018). High Variability [Pronunciation] Training (HVPT): A proven technique about
which every language teacher and learner ought to know. Journal of Second Language
Thomson, R. I., & Derwing, T. M. (2015). The effectiveness of L2 pronunciation instruction: A nar-
rative review. Applied Linguistics, 36(3), 326–344.
Thomson, R. I., & Derwing, T. M. (2016). Is phonemic training using nonsense or real words more
effective? In J. Levis, H. Le, I. Lucic, E. Simpson, & S. Vo (Eds.), Proceedings of the 7th
Pronunciation in Second Language Learning and Teaching conference (pp. 88–97). Ames, IA: Iowa
State University.
Thomson, R. I., & Isaacs, T. (2009). Within-category variation in L2 English vowel learning. Canadian
Acoustics, 37(3), 138–139.
Thomson, R. I., Nearey, T. M., & Derwing, T. M. (2009). A modified statistical pattern recognition
approach to measuring the crosslinguistic similarity of Mandarin and English vowels. The Journal of
the Acoustical Society of America, 126(3), 1447–1460.
Trofimovich, P., & Foote, J. A. (2017). Second language pronunciation learning: An overview of
theoretical perspectives. In The Routledge handbook of contemporary English pronunciation
(pp. 93–108). Philadelphia: Routledge.
Wayland, R. P. (2007). The relationship between identification and discrimination in cross-language
perception: The case of Korean and Thai. In O.-S. Bohn & M. J. Munro (Eds.), Second-language
speech learning: The role of language experience in speech perception and production: A festschrift in
honour of James E. Flege (pp. 201–218). Amsterdam: John Benjamins.
Werker, J. F., & Tees, R. C. (1984). Cross-language speech perception: Evidence for perceptual reorganization
during the first year of life. Infant Behavior and Development, 7(1), 49–63.
27
THE RELATIONSHIP BETWEEN
GESTURES AND SPEAKING IN L2
LEARNING
Marianne Gullberg
Speaking is a multimodal act involving many articulators – not only the mouth, but also the
hands, arms, head, eyebrows, etc. In other words, when we speak, we also gesture. Gestures,
defined as non-practical actions and visible bodily movements related to ongoing talk and
recognized as communicatively relevant by onlookers, are an integral part of the speech
production process. They are not an added communicative frill, but a fundamental aspect of
speaking. There is substantial evidence for the view that gestures are systematically and
closely linked to language in speech production (and in comprehension), the modalities
forming an integrated mode of expression that is subject to cross-linguistic, cognitive, social,
and cultural variation (Bavelas, 1994; Clark, 1996; Holler & Levinson, 2019; Kendon, 2004;
McNeill, 2017; Özyürek, 2017). The term multimodal is used throughout to refer to the use
of speech and language-related bodily visual behaviour, such as manual gestures and head
movements. “Multimodal” here also covers the term bimodal language use (speech + gesture)
found in the literature.
The tight link between speaking and gesturing is seen in many ways. Gestures are pre-
dominantly a speaker phenomenon. More importantly, speech and gestures express se-
mantically related, discursively and prosodically highlighted meaning at the same time with
millisecond precision (Kendon, 1980; Levy & McNeill, 1992; Loehr, 2007; McNeill, 1992).
This fine-grained coordination also means that gestures reflect cross-linguistic variation in
which semantic elements are expressed in speech, and how they are morphosyntactically
organized (Kita, 2009 for an overview). The coordination is neurologically based: speech and
gestures engage similar brain regions and motor control systems in speech production and
comprehension (Gentilucci & Volta, 2008; Oi, Saito, Li, & Zhao, 2013; Özyürek, 2014).
Linguistically, gestures provide necessary referential content to deictic expressions (e.g., the
key is there; Fricke, 2014), and gestures can function as independent speech acts (e.g.,
pointing to a door as an imperative Get out). Finally, speech and gestures develop in parallel
in child language (Colletta et al., 2015; Gullberg et al., 2008; Iverson & Goldin-Meadow,
2005), and break down in parallel in stuttering (Mayberry & Jaques, 2000), disfluency
(Graziano & Gullberg, 2018; Seyfeddinipur, 2006), and aphasia (Rose, 2006). The reasons
why the speech–gesture link exists are under debate (see Church et al., 2017), but the link
itself is not. These empirical facts strongly suggest that gestures are an integral part of
speaking. They therefore naturally become relevant to the study of speaking in a first (L1) or
second language (L2). The term second language acquisition (SLA, or L2 acquisition) will be
used throughout for both second and foreign language contexts, and for both instructed learning and
naturalistic acquisition. Moreover, the term “L2 learner” will refer to participants sometimes
called “L2 learners/users” by linguists, sometimes “bilinguals” by psychologists.
DOI: 10.4324/9781003022497-33
I will focus on gestures as defined earlier, leaving aside non-verbal behaviours such as
posture shifts and proxemics. The remaining class of movements, gestures, can be structu-
rally characterized in terms of articulators (hands, head, eyebrows, etc.), place of articula-
tion, and movement patterns with internal phase structure – a sort of “phonetics of gesture”
(Kendon, 1980). Gesture analyses often focus on the core phase of the movement, the stroke,
which is the most meaningful part of the movement. Other phases include the preparation
phase, the retraction phase when hands return to a resting position, and holds, when gestures
are momentarily kept immobile in space (Kendon, 1980). Gestures are also often classified
functionally or semiotically (see Kendon, 2004, for an overview of classification systems).
For example, representational gestures convey meaning by iconically representing properties
of concrete or abstract objects or actions (iconic and metaphoric gestures) or by spatial
contiguity to an intended entity (deictic and indexical gestures). Rhythmic gestures (beats)
mark scansion; pragmatic gestures express non-referential content such as stance or com-
ments on what is said; and interactive gestures refer to some aspect of conversation itself.
Gestures also show different degrees of conventionalization. At one end are fully lexicalized,
conventional gestures (emblems, quotable gestures, e.g., thumbs-up), which are language-
and culture-specific form-meaning pairs that function like words or idiomatic expressions. At
the other end, non-conventional gestures (gesticulation, co-speech, or speech-associated gestures)
lack fixed form or meaning but accompany speech on the fly to convey speech-related
meaning. Since co-speech gestures are the most closely aligned with speech semantically,
prosodically, and temporally, they will be the focus of this review.
Gestures are deeply multi-functional. They serve both addressee-directed (communicative)
and speaker-directed (cognitive) functions in speaking. Speakers produce and tailor gestures
for their addressees to convey, highlight, and disambiguate meaning and speech content, to
establish common ground, and to regulate turn-taking (Bavelas et al., 2008; Holler & Beattie,
2003; Streeck, 2009). But speakers also produce gestures for themselves to organize thoughts
and facilitate their own speech production (Kita et al., 2017 for an overview). Both aspects
are vital.
I will briefly review how insights on the multimodal nature of speaking affect research on
speaking in SLA.
The study of gesture has a long history (see Kendon, 2004, for an overview). However, the
advent of easily accessible film and video recordings in the 1970s enabled pioneering mul-
timodal work in interaction studies and anthropology (Efron, 1941/1972; Kendon, 1972,
2004), child language studies (Bates et al., 1977; Volterra et al., 2005), psychology (Goldin-
Meadow, 2003; McNeill, 1992, 2005), and psychiatry (Davis, 1985; Freedman, 1972). These
studies laid the foundation for the explosion of work seen in the past two decades. Gesture
studies is now a vibrant research field in its own right.
In SLA studies, interest in gestures has been slower to develop. Early on, gestures were occasionally
discussed as culture-specific practices to acquire for cultural fluency in a target language
(e.g., Green, 1968; Pennycook, 1985; Von Raffler-Engel, 1980; Wylie, 1985), or as a pedagogical
tool for improving L2 comprehension in language classrooms (e.g., Kellerman, 1992). A few
studies also discussed whether bilinguals switch language in both speech and gesture (Efron, 1941/
1972; Lacroix & Rioux, 1978; Von Raffler-Engel, 1976). In the 1990s, interest grew in applying
gesture studies to theoretical issues in SLA. The theoretical, methodological, and empirical ad-
vances in L1 gesture studies provided new analytical tools and empirical facts about multimodal
language use to motivate such a shift (e.g., Duncan, 1996; Kendon, 1986, 1990; McNeill, 1992;
Müller, 1998). This early L2 work examined how gestures function in L2 interaction as commu-
nication strategies for lexical, grammatical, and pragmatic challenges (Gullberg, 1998); how native
speakers adjust speech and gestures to learners in multimodal foreigner talk (Adams, 1998); how
learners use private speech and gesture to internalize knowledge (McCafferty, 1998); and how
learners’ gestures may show traces of cross-linguistic influence (Stam, 1998). Since the early 2000s,
research on SLA and gestures has diversified considerably (Gullberg, 2006b, 2008; Gullberg & de
Bot, 2010; Gullberg & McCafferty, 2008; McCafferty & Stam, 2008; Stam, 2012; Stam &
Buescher, 2018 for overviews and collections of papers). It is not yet a separate subfield of study in
SLA, but the potential is obvious.

Core issues addressed in studies of multimodal L2 speaking partly echo key topics in “tra-
ditional” SLA, and partly topics in L1 gesture studies. L2 speech–gesture production is ty-
pically compared to (monolingual) native speaker production (and sometimes to other
learners or bilinguals) looking at developmental trajectories and “outcomes” with a focus on
effects of age, formal proficiency, cross-linguistic influence, individual differences, the
learning context, and on interaction and collaborative practices. A rather narrow range of
topics has hitherto been examined multimodally compared to SLA as a whole.
One core topic in multimodal SLA is gesture frequency in L2 production as an index of
proficiency (formal accuracy) and/or fluency. Observations to the effect that gesture rates are
often higher in L2 than in L1 production have led to a range of studies examining when L2
speakers gesture more than L1 speakers, and what functions gestures have in L2 production.
These studies connect to claims in the L1 gesture literature concerning why speakers gesture
at all, and whether gestures are mainly speaker- or addressee-directed (cf. Kendon, 1994). The
debate around L2 gesture rates therefore revolves around whether learners gesture to help
themselves or their addressees. Strikingly, this line of multimodal work is entirely divorced
from SLA research on L2 fluency (e.g., De Jong et al., 2015), linguistic complexity, accuracy,
and fluency (CAF; Housen et al., 2012), and effects of task complexity (Robinson, 2007).
A second critical issue is cross-linguistic influence. This work builds on the observations of
cross-linguistic differences in how L1-speakers gesture as a reflection of language-specific
selection of meaning for expression and its morphosyntactic organization (Kita, 2009 for an
overview). The L1 work is often cast in terms of Slobin’s notion of “thinking for speaking”
(e.g., Slobin, 1996), or the idea that the available linguistic categories of a language influence
what information speakers choose to verbalize and how they organize it morphosyntactically
and multimodally. As L2 speakers move from one language to another, they may produce
gesture patterns more typical of the L1 even as they speak the L2 in a kind of “manual
foreign accent.” This line of work assumes that such multimodal clashes may reveal learners’
underlying conceptual representations (thinking), and crucially, lingering influence from L1
representations.
A third, related question is what gestures reveal about linguistic phenomena common to
all learners at a given developmental stage regardless of the languages in contact, so-called
general learner phenomena. An example of such general learner behaviour is over-explicit
reference found in low proficiency learners across many language pairs (e.g., Williams, 1988).
A key issue is to understand which aspects of such general behaviour are driven by cognitive
or developmental mechanisms, and which are more firmly rooted in communicative con-
cerns. Different aspects of gesture production may shed light on both issues (e.g., gesture
frequency may speak to cognitive aspects, and gesture articulation in space relative to an
interlocutor may speak to communicative issues).
A fourth topic is concerned with what role learners’ speech–gesture ensembles play in
interaction (e.g., turn-taking), in collaborative practices for the establishment of meaning
and structure (e.g., jointly finding words with gestures), for understanding, problem re-
solution, and ultimately for L2 acquisition. Approaches here are largely interactionist,
conversational analytical (CA), or sociocultural, and apply qualitative micro-analyses.
Learners’ and their interlocutors’ gesture production is spontaneous, and the focus is on
sequences of unfolding behaviour.
A related issue is whether learners’ L2 acquisition of lexicon and grammar can be im-
proved by gesture production during explicit teaching. This line of work is experimental and
exclusively focused on instruction and explicit (non-spontaneous) gesture production. It
draws on L1 gesture research showing that gesture production affects memory more gen-
erally (Cook & Fenn, 2017 for an overview), possibly because gestures strengthen re-
presentations by evoking motor and visual imagery (Morett, 2018), or because gestures
engage sensorimotor brain networks that grow larger the more sensory modalities are linked
to a new element (Macedonia et al., 2019). The last two topics have important pedagogical
implications.

This part exemplifies current research on the five topics outlined earlier.
First, many studies examine the relationship between L2 fluency, formal proficiency, and
gesture frequency or gesture rates. The assumption is often that the lower the proficiency
and/or fluency, the higher the gesture rate. Indeed, many studies show that L2 speakers
produce more gestures overall than L1 speakers, and typically even than themselves when
speaking the L1 (Gullberg, 2012a; Nicoladis, 2007). Some studies suggest that this is because
L2 users use representational gestures depicting lexical content to compensate for and resolve
lexical problems and to promote lexical access through cross-modal priming from motor
representations to lexical representations (Nicoladis, 2007; Rauscher et al., 1996).
Other studies instead suggest that the most frequent L2 gestures have pragmatic functions,
indicating ongoing trouble and holding the floor, rather than lexical, representational
functions (Gregersen et al., 2009; Gullberg, 1998). Clearly, the connection between gesture
frequency, fluency, and/or proficiency is not straightforward. In both L1 and L2 speech,
gestures are overwhelmingly produced with fluent rather than with disfluent speech
(Graziano & Gullberg, 2018). Moreover, gesture rates are further modulated by task de-
mands (Aziz & Nicoladis, 2019; Lin, 2020), individual communicative style (Gullberg, 1998;
Nagpal et al., 2011), language anxiety (Gregersen, 2005), and possibly by the nature of the
languages in contact (So, 2010). No single study controls for all these factors, and issues of
linguistic complexity, cognitive capacities, location of disfluency, etc., are rarely considered.
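To make the notion of gesture rate concrete, the following minimal Python sketch computes two normalizations that are common in the wider gesture literature: gestures per 100 words and gestures per minute. The chapter does not prescribe either measure, so both definitions, the `Sample` record, and the example counts are illustrative assumptions rather than data from any cited study.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One speech sample from one speaker (hypothetical record layout)."""
    speaker: str
    language: str      # "L1" or "L2"
    n_gestures: int    # number of annotated gestures (e.g., strokes)
    n_words: int       # number of transcribed words
    duration_s: float  # speaking time in seconds


def rate_per_100_words(sample: Sample) -> float:
    """Gestures per 100 words (assumed definition of gesture rate)."""
    if sample.n_words <= 0:
        raise ValueError("n_words must be positive")
    return 100.0 * sample.n_gestures / sample.n_words


def rate_per_minute(sample: Sample) -> float:
    """Gestures per minute of speaking time (alternative assumed definition)."""
    if sample.duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return 60.0 * sample.n_gestures / sample.duration_s


if __name__ == "__main__":
    # Invented counts for one speaker performing the same task in L1 and L2.
    samples = [
        Sample("P01", "L1", n_gestures=24, n_words=300, duration_s=120.0),
        Sample("P01", "L2", n_gestures=30, n_words=200, duration_s=150.0),
    ]
    for s in samples:
        print(s.speaker, s.language,
              round(rate_per_100_words(s), 2),
              round(rate_per_minute(s), 2))
```

Because speech rate often differs between a speaker's L1 and L2, the two denominators can rank the same samples differently, so it helps to state which one a reported gesture rate uses.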
Second, multimodal studies of cross-linguistic influence (CLI) investigate whether and when
L2 learners change their speech–gesture patterns towards a target language. As in traditional
SLA studies, L2 learners’ speech and gestures are compared to native speakers’ language-specific
speech–gesture patterns. Many studies have examined and found gestural influence from the L1
on the L2 in the expression of voluntary and caused motion (Gullberg, 2009; Stam, 2006),
prepositional expressions of time (Gu et al., 2017), and verbal aspect (Denisova et al., 2018),
sometimes even when L2 speech looks target-like. Interestingly, new studies also find effects of
L1 co-speech gestures on the SLA of sign language (Ortega & Morgan, 2015). Gestural CLI can
be reflected in the timing of gestures relative to spoken elements (Stam, 2006), gesture meaning
(Choi & Lantolf, 2008), gesture form (Casey et al., 2012; Gullberg, 2009), gesture frequency (So,
2010), or in the way information is distributed across speech and gesture and in how co-
expressive the modalities are (Brown & Gullberg, 2008). Overall, studies find that even as speech
shifts towards L2 patterns, gestures often reveal lingering influences from the L1, sometimes
persistently (Özçalışkan, 2016). But there is also evidence of gestural shifts and learning
(Gullberg, 2009; Lewis, 2012; Stam, 2015). On the whole, speech seems to shift more easily
towards the L2 pattern than gestures. Why this is the case remains unclear.
Another line of CLI work examines the reverse influence, from the L2 on the L1, or
bidirectional influences, even at modest levels of L2 proficiency. In keeping with psycholinguistic
evidence that all known languages affect each other in an individual mind (Cook,
2003; Van Hell & Dijkstra, 2002), studies have found traces of L2 speech–gesture patterns in
the L1. For example, speakers with knowledge of an L2 speak and gesture significantly
differently in their L1 from monolingual peers. Crucially, their own L1 and L2 production is
sometimes indistinguishable, suggesting multimodal convergence
(Brown, 2015; Brown & Gullberg, 2008). Similar bidirectional influences have been found
from a signed L2 onto L1 co-speech gestures (Casey et al., 2012).
Third, multimodal studies looking at general learner phenomena are rare (beyond issues
of gesture rates and proficiency). One line of study has explored how early L2 learners or-
ganize information about entities to create coherent discourse (reference tracking) (Perdue,
2000), especially when pronominal systems and word order patterns are not yet mastered.
These studies show that learners tend to avoid pronouns, and instead use full lexical noun
phrases (NPs) to refer both to new and old entities (Hendriks, 2003; Williams, 1988). They
create over-explicit discourse. L2 gesture analyses reveal that early L2 speakers with different
L1s and L2s are also multimodally over-explicit, accompanying every mention of a referent,
new or old, with a gesture (Gullberg, 2006a; So et al., 2013; Yoshioka, 2008). Interestingly,
this pattern appears whether addressees can see the gestures or not (Gullberg, 2006a), sug-
gesting that gesture production is not only a disambiguation strategy but may also serve a
self-directed purpose, perhaps to reduce memory strain (“cognitive load”; Cook & Fenn,
2017), by externalizing the referents that must be kept in mind onto gesture. An outstanding
issue is whether these patterns hold also when pro-drop languages are involved (cf. So et al.,
2013; Yoshioka, 2008). The precise temporal alignment between gestures and spoken ele-
ments must be clarified to test this since in the case of zero anaphora, gestures must align
with other elements than NPs or pronouns.
Fourth, a growing number of studies explore the role of gestures in L2 learners’ spoken
interactions inside and outside of classrooms, examining how learners deploy gestures as
communication strategies (Gullberg, 1998), in repair sequences (Olsher, 2008), in joint co-
production with native speakers (Mori & Hayashi, 2006), or to internalize new knowledge in
private speech (Lee, 2008; McCafferty & Rosborough, 2014). Other studies examine how
learners and teachers use gestures in L2 classroom speech to support the learning of voca-
bulary, grammar, pronunciation, and even writing (Eskildsen & Wagner, 2015; Kim & Cho,
2017; Lazaraton, 2004; Matsumoto & Dobs, 2017; Smotrova, 2017). Gestural teacher talk is
also studied, showing, for example, that during vocabulary training language teachers
modulate their gestures depending on students’ L2 proficiency, with more, longer and bigger
representational gestures the lower students’ proficiency (Tellier & Stam, 2012). Other studies
examine the effects of gestural corrective feedback and re-casts (Nakatsukasa & Loewen,
2017 for an overview), with mixed effects on L2 learning, perhaps depending on the linguistic
domain. Overall, these studies reveal that gestures serve as a crucial communicative and
semiotic resource to learners and their interlocutors alike.
Finally, a flourishing subfield probes whether learners’ gesture production can measurably
improve their L2 acquisition (cf. Cook & Fenn, 2017). By now, many studies show that both
child and adult L2 learners who repeat modelled speech and gestures during vocabulary training
retain more words than learners who do not gesture, especially when gesture meanings match
speech content (Andrä et al., 2020; Kelly et al., 2009; Morett, 2014; Tellier, 2008). (Many studies
also examine the effects of simply observing gestures, but since that is perception and not
speaking, it is outside the scope of this review; see Macedonia, 2019, for a discussion.)
Similarly, gesture production during the training of new speech sounds improves L2
pronunciation (Baills et al., 2019; Li et al., 2020). In all cases, it is assumed that learning is
boosted by the double engagement of motor and auditory memory (Morett, 2018).
Interestingly, the effects of gesture training are mixed in the domain of L2 phonology where
gesture appears to benefit L2 production skills more (Li et al., 2020) than reception skills
(Hirata et al., 2014). Although all studies are pre-test/post-test designs, the tasks involved
vary greatly, meaning that the overall effects remain unclear. Further, other linguistic do-
mains need probing, the longevity of the effect needs clarification, and effects of non-
modelled, spontaneous gestures also need investigating.

Studies of how L2 users speak and gesture typically rely on audio and video recordings of (a)
spontaneous conversation or classroom interaction, or (b) elicitation or experimental tasks
such as narrative retellings, video description tasks, referential communication tasks,
explanation tasks (of vocabulary or of practical problems), and recall tasks, sometimes in
pre-test/post-test designs. Participants are generally not told that gestures are the object of
study until the task or recording is completed to maintain spontaneous gesturing. Moreover,
to promote gesture production, settings are mostly interactive (cf. Bavelas et al., 2008) with
confederate or naive interlocutors, often with an information gap between participants and
their interlocutors. Stimulus materials vary but there is frequently an emphasis on spatial
information or action assumed to promote gesture production. Stimuli are typically removed
during speech production to tax participants’ memory and encourage gesture production.
Other tasks are often added, such as language background questionnaires, proficiency
measurements, rating tasks, stimulated recall protocols, interview data, and instruction
materials. And as in other SLA research, participants often perform tasks both in their L1
and in the L2. This creates baseline L1 data and useful individual gestural profiles given that
gestures, like other L2 data, are subject to considerable individual variation.
Speech–gesture data are often transcribed and annotated in software for video annotation
(e.g., ELAN, Wittenburg et al., 2006; CHAT, MacWhinney, 2000). Transcription, annota-
tion schemes, and units of analysis are guided by individual research questions. There is no
universal coding scheme for gestures, but most studies are inspired by Kendon’s or McNeill’s
approach (Kendon, 2004; McNeill, 1992), with possible adaptations to CA, for example.
Gesture annotations may focus on strokes or on bigger units between major resting positions;
on different gestural functions such as representational (iconic) gestures, or beats, etc.
These may then be located relative to different units in speech such as turns, utterances,
clauses, or speech elements exactly aligned with the gesture. Gesture annotations may also
target handedness, gestural form (handshape, etc.), location in gesture space, gesture dura-
tion, gestural meaning, timing relative to speech, order in a sequence, etc., depending on the
research question. In multimodal SLA studies, the most frequent analyses are gesture rate
(Nicoladis, 2007), gesture form and meaning (Choi & Lantolf, 2008), timing (Stam, 2006),
function (Gullberg, 1998), and degree of semantic overlap (co-expressivity) with speech
(Brown & Gullberg, 2008). Speech annotations of course target similar issues as in other SLA
research. It is vital to have interrater reliability measures of gesture annotations since gesture
coding is not standardized and often under-described.
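One common way to quantify interrater reliability for categorical gesture codes is Cohen's kappa, which corrects raw agreement between two coders for the agreement expected by chance. The sketch below is an illustrative assumption rather than a procedure named in this chapter; the function name, the label set, and the toy annotations are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders who labelled the same gestures.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e is the agreement expected by chance, computed from each
    coder's own label frequencies.
    """
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must label the same number of gestures")
    if not coder_a:
        raise ValueError("at least one labelled gesture is required")

    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[lab] * freq_b[lab] for lab in labels) / (n * n)

    if expected == 1.0:
        # Both coders used one and the same label throughout; kappa is
        # formally undefined here, and 1.0 is returned by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy functional codes for four gestures (invented for illustration).
    coder_a = ["iconic", "iconic", "beat", "beat"]
    coder_b = ["iconic", "beat", "beat", "beat"]
    print(round(cohens_kappa(coder_a, coder_b), 3))
```

Reporting a chance-corrected coefficient alongside the coding manual makes it easier for readers to judge how reproducible a given annotation scheme is. Continuous measures such as stroke onset times would need a different statistic.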
Qualitative CA-based or sociocultural perspectives currently seem to dominate the study
of multimodal SLA, but quantitative, (semi-)experimental approaches are also frequent, with
intervention studies in particular on the rise.

The study of speaking and gesturing in L2 is such a young area of study that re-
commendations for interaction or classroom practice are premature. However, the existing
L2 findings and the wider L1 literature on speech–gesture production clearly suggest that it
is beneficial for L2 learners to gesture during L2 speaking in various ways. Gestures help L2
learners to express themselves and maintain spoken L2 output. They help to solve lexical,
grammatical and pragmatic challenges, and establish meaning since speech–gesture en-
sembles are processed as meaningful units by addressees. They boost interlocutors’ im-
pressions of learners in that gesturing seems to promote positive affect towards the speaker.
They draw attention to things in need of elucidation, etc. Instructors and interlocutors may
therefore want to encourage L2 learners to gesture, or, at least, not dissuade them from doing
so. It would also benefit instructors to consciously attend more to both their own and L2
learners’ gestural practices. Whether some instructors’ gestures help more or even hinder
learning is not yet possible to say, but is a matter for more research. Discussions of gestures
and cross-linguistic/cross-cultural differences in gestural repertoires would also be helpful to
raise awareness of the multimodal nature of language.
Recommendations for research practice are easier. First and obviously, many more studies
of L2 speaking should also examine L2 gestures. Ignoring L2 gestures is ignoring half of L2
learners’ language output. Second, studies that do deal with gestures generally need to provide
rich detail in the descriptions of data treatment and analyses to facilitate replicability. This is
vital in a domain as new as this, where no standardized transcription or coding schemes exist,
and where data must always be approached in terms of statistical trends or preferences, rather
than in absolute terms (see Gullberg, 2010 for methodological desiderata).
7 Future Directions
The study of L2 speaking and gesturing has come far in the past 20 years. However, much remains
to be done. Crucially, the study of multimodal SLA needs to widen the empirical base. We need to
go beyond lexical concerns and probe other linguistic domains of speaking, in new languages, and
other settings. That is, we need to know what native speakers of languages X and Y do gesturally
when using evidentials, complex demonstrative systems, or when they distinguish entities with
regard to definiteness and specificity, for example. The extension of multimodal SLA to new
domains and to new languages calls for more L1 baseline data than is currently available. A way to
achieve this is to systematically study participants’ production in both their L1 and L2.
We should also move beyond the compensatory view that treats L2 gesture production
mainly as a communicative support system (cf. the discussion of gesture frequency). Embodied
language use is everywhere, in L1 and L2 speakers alike (Holler & Levinson, 2019), and fluent
speech is more tightly paired with gesture production than disfluent speech, also in L2 pro-
duction (Graziano & Gullberg, 2018). There is thus good reason to consider gestures differently.
In general, we still know very little about whether, when, and how L2 learners come to
speak and gesture in language-specific ways. We do not know why gestures seem to be more
resistant to change than speech, whether gestures change through imitation (e.g., during
study abroad and immersion) or with shifts in the linguistic system, or some combination of
the two. As in all SLA, we need more longitudinal work both inside and outside of class-
rooms to address these issues. We know nothing at all about whether L2 learners come to
produce culture-specific emblems when speaking, for example using the circle gesture ac-
curately (joining the thumb and index finger to mean zero/bad, excellent, money, and bodily
orifice, respectively) depending on linguistic community and culture (Morris et al., 1979).
Since emblems function like idiomatic expressions and discourse markers, they are well
worth investigating relative to their spoken equivalents.
Most importantly, however, current work on gestures and L2 speaking is not very well
embedded in current SLA theory or empirical endeavours in SLA studies. Therefore, the
most pressing future direction for this line of work is to focus on what gesture analysis can
add to key SLA domains (cf. the table of contents of this volume). Obvious candidate areas
include work on the role of gestures for attention, noticing, implicit/explicit learning and
knowledge; task demands and cognitive load; individual differences; the entire field of instructed
SLA; and L2 processing in production. To this list one might add that we know nothing
about how uninstructed learners or learners with different degrees of literacy speak and
gesture. The entire domain of assessment of spoken L2 skills is also in need of a multimodal
approach. Although a few studies suggest that learners’ gestures affect assessments of their
spoken skills positively (Gullberg, 1998; Jenkins & Parra, 2003), this is a deeply under-
researched area with pedagogical implications.
In sum, gestures offer a rich and multidimensional view of L2 speaking. Multimodal
analyses of L2 speech and gesture can provide a fuller picture of both communicative and
cognitive aspects of L2 speaking. As such, gestures have considerable potential for ex-
panding the scope and depth of SLA research. The study of L2 gestures still needs to
integrate SLA concerns more fully, and work towards establishing gestures as a natural
element of any study of L2 speaking. It behoves us all to work to shift our theories and
models of SLA away from monomodal perspectives towards multimodal ones. It is time
for a gestural turn.
Further Reading
Two core texts on gesture research by the pioneers:
Kendon, A. (2004). Gesture. Visible action as utterance. Cambridge: Cambridge University Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
A collection of papers on various domains of SLA where gesture analysis has fruitfully been applied:
McCafferty, S. G., & Stam, G. (Eds.). (2008). Gesture. Second language acquisition and classroom re-
search. New York: Routledge.
A brief overview of methodological issues in research on gestures in SLA:
Gullberg, M. (2012b). Gesture analysis in second language acquisition. In C. Chapelle (Ed.),
Encyclopedia of Applied Linguistics. Oxford: Wiley-Blackwell.
References
Adams, T. W. (1998). Gesture in foreigner talk (PhD diss.), University of Pennsylvania, Philadelphia.
Andrä, C., Mathias, B., Schwager, A., Macedonia, M., & von Kriegstein, K. (2020). Learning foreign
language vocabulary with gestures and pictures enhances vocabulary memory for several months
post-learning in eight-year-old school children. Educational Psychology Review, 32, 815–850.
doi: 10.1007/s10648-020-09527-z
Aziz, J. R., & Nicoladis, E. (2019). “My French is rusty”: Proficiency and bilingual gesture use in a
majority English community. Bilingualism: Language and Cognition, 22(4), 826–835.
Baills, F., Suárez-González, N., González-Fuente, S., & Prieto, P. (2019). Observing and producing
pitch gestures facilitates the learning of Mandarin Chinese tones and words. Studies in Second
Bates, E., Benigni, L., Bretherton, I., Camaioni, L., & Volterra, V. (1977). From gesture to the first
word: On cognitive and social prerequisites. In M. Lewis & L. A. Rosenblum (Eds.), Interaction,
conversation, and the development of language (pp. 247–307). New York: Wiley.
Bavelas, J. B. (1994). Gestures as part of speech: Methodological implications. Research on Language
and Social Interaction, 27(3), 201–221.
Bavelas, J. B., Chovil, N., Lawrie, D. A., & Wade, A. (1992). Interactive gestures. Discourse Processes,
15(4), 469–489.
Bavelas, J. B., Gerwing, J., Sutton, C., & Prevost, D. (2008). Gesturing on the telephone: Independent
effects of dialogue and visibility. Journal of Memory and Language, 58(2), 495–520.
Brown, A. (2015). Universal development and L1–L2 convergence in bilingual construal of manner in
speech and gesture in Mandarin, Japanese, and English. The Modern Language Journal,
99(S1), 66–82.
Brown, A., & Gullberg, M. (2008). Bidirectional crosslinguistic influence in L1-L2 encoding of Manner
in speech and gesture: A study of Japanese speakers of English. Studies in Second Language
Acquisition, 30(2), 225–251.
Casey, S., Emmorey, K., & Larrabee, H. (2012). The effects of learning American Sign Language on co-
speech gesture. Bilingualism: Language and Cognition, 15(4), 677–686.
Choi, S., & Lantolf, J. P. (2008). Representation and embodiment of meaning in L2 communication.
Motion events in the speech and gesture of advanced L2 Korean and L2 English speakers. Studies in
Church, R. B., Alibali, M. W., & Kelly, S. D. (Eds.). (2017). Why gesture?: How the hands function in
speaking, thinking and communicating. Amsterdam: John Benjamins.
Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.
Colletta, J.-M., Guidetti, M., Capirci, O., Cristilli, C., Demir, O. E., Kunene-Nicolas, R. N., & Levine,
S. (2015). Effects of age and language on co-speech gesture production: An investigation of French,
American, and Italian children’s narratives. Journal of Child Language, 42(1), 122–145.
Cook, S., & Fenn, K. M. (2017). The function of gesture in learning and memory. In R. Breckinridge
Church, M. W. Alibali, & S. D. Kelly (Eds.), Why gesture?: How the hands function in speaking,
thinking and communicating (pp. 129–153). Amsterdam: John Benjamins.
Cook, V. (2003). Introduction: The changing L1 in the L2 user’s mind. In V. Cook (Ed.), Effects of the
second language on the first (pp. 1–18). Clevedon: Multilingual Matters.
Davis, M. (1985). Nonverbal behavior research and psychotherapy. In G. Stricker & R. H. Keisner
(Eds.), From research to clinical practice (pp. 89–112). New York: Plenum Press.
De Jong, N. H., Groenhout, R., Schoonen, R. O. B., & Hulstijn, J. H. (2015). Second language fluency:
behavior. Applied Psycholinguistics, 36(2), 223–243.
Denisova, V. A., Cienki, A., & Iriskhanova, O. K. (2018). Boundary expression in verbs and gesture:
Differences between L1 and L2 speakers. In Computational Linguistics and Intellectual Technologies
(pp. 163–171). Moscow, May 30–June 2, 2018.
Duncan, S. D. (1996). Grammatical form and ‘thinking-for-speaking’ in Mandarin Chinese and English:
An analysis based on speech-accompanying gesture (PhD diss.), University of Chicago, Chicago.
Efron, D. (1941/1972). Gestures, race and culture. The Hague: Mouton.
Eskildsen, S. W., & Wagner, J. (2015). Embodied L2 construction learning. Language Learning, 65(2),
268–297.
Freedman, N. (1972). The analysis of movement behavior during the clinical interview. In A. W.
Siegman & B. Pope (Eds.), Studies in dyadic communication (pp. 153–175). New York: Pergamon.
Fricke, E. (2014). Deixis, gesture, and embodiment from a linguistic point of view. In C. Müller, A.
Cienki, E. Fricke, S. H. Ladewig, D. McNeill, & S. Tessendorf (Eds.), Body – Language –
Communication (pp. 1803–1823). Berlin, New York: Mouton de Gruyter.
Gentilucci, M., & Volta, R. D. (2008). Spoken language and arm gestures are controlled by the same
motor control system. The Quarterly Journal of Experimental Psychology, 61(6), 944–957.
Goldin-Meadow, S. (2003). Hearing gesture: How our hands help us think. Cambridge, MA: The
Belknap Press.
Graziano, M., & Gullberg, M. (2018). When speech stops, gesture stops: Evidence from developmental
and crosslinguistic comparisons. Frontiers in Psychology, 9(879). doi: 10.3389/fpsyg.2018.00879
Green, J. R. (1968). A gesture inventory for teaching Spanish. New York: Clinton Books.
Gregersen, T. S. (2005). Nonverbal cues: Clues to the detection of foreign language anxiety. Foreign
Language Annals, 38(3), 388–400.
Gregersen, T. S., Olivares-Cuhat, G., & Storm, J. (2009). An examination of L1 and L2 gesture use:
What role does proficiency play? The Modern Language Journal, 93(2), 195–208.
Gu, Y., Mol, L., Hoetjes, M., & Swerts, M. (2017). Conceptual and lexical effects on gestures: The case
of vertical spatial metaphors for time in Chinese. Language, Cognition and Neuroscience, 32(8),
1048–1063.
Gullberg, M. (1998). Gesture as a communication strategy in second language discourse. A study of
learners of French and Swedish. Lund: Lund University Press.
Gullberg, M. (2006a). Handling discourse: Gestures, reference tracking, and communication strategies
in early L2. Language Learning, 56(1), 155–196.
Gullberg, M. (2006b). Some reasons for studying gesture and second language acquisition (Hommage à
Adam Kendon). International Review of Applied Linguistics, 44(2), 103–124.
Gullberg, M. (2008). Gestures and second language acquisition. In P. Robinson & N. C. Ellis (Eds.),
Handbook of cognitive linguistics and second language acquisition (pp. 276–305). London:
Routledge.
Gullberg, M. (2009). Reconstructing verb meaning in a second language: How English speakers of L2
Dutch talk and gesture about placement. Annual Review of Cognitive Linguistics, 7, 222–245.
Gullberg, M. (2010). Methodological reflections on gesture analysis in SLA and bilingualism research.
Second Language Research, 26(1), 75–102.
Gullberg, M. (2012a). Bilingualism and gesture. In T. K. Bhatia & W. C. Ritchie (Eds.), The handbook
of bilingualism and multilingualism (2nd edn, pp. 417–437). Malden, MA: Wiley-Blackwell.
Gullberg, M. (2012b). Gesture analysis in second language acquisition. In C. Chapelle (Ed.),
Encyclopedia of Applied Linguistics. Oxford: Wiley-Blackwell.
Gullberg, M., & de Bot, K. (Eds.). (2010). Gestures in language development. Amsterdam: Benjamins.
Gullberg, M., de Bot, K., & Volterra, V. (2008). Gestures and some key issues in the study of language
development. Gesture, 8(2), 149–179.
Gullberg, M., & McCafferty, S. G. (2008). Introduction to Gesture and SLA: Toward an integrated
approach. Studies in Second Language Acquisition, 30(2), 133–146.
Hendriks, H. (2003). Using nouns for reference maintenance: A seeming contradiction in L2 discourse.
In A. G. Ramat (Ed.), Typology and second language acquisition (pp. 291–326). Berlin: Mouton.
Hirata, Y., Kelly, S. D., Huang, J., & Manansala, M. (2014). Effects of hand gestures on auditory
learning of second-language vowel length contrasts. Journal of Speech, Language, and Hearing
Research, 57, 2090–2101.
Holler, J., & Beattie, G. (2003). How iconic gestures and speech interact in the representation of
meaning: Are both aspects really integral to the process? Semiotica, 146(1), 81–116.
Holler, J., & Levinson, S. C. (2019). Multimodal language processing in human communication. Trends
in Cognitive Sciences, 23(8), 639–652.
Housen, A., Kuiken, F., & Vedder, I. (Eds.). (2012). Dimensions of L2 performance and proficiency:
Complexity, accuracy, and fluency in SLA. Amsterdam: Benjamins.
Iverson, J. M., & Goldin-Meadow, S. (2005). Gesture paves the way for language development.
Psychological Science, 16, 367–371.
Jenkins, S., & Parra, I. (2003). Multiple layers of meaning in an oral proficiency test: The com-
plementary roles of nonverbal, paralinguistic, and verbal behaviors in assessment decisions. Modern
Kellerman, S. (1992). ‘I see what you mean’: The role of kinesic behaviour in listening and implications
for foreign and second language learning. Applied Linguistics, 13(3), 239–257.
Kelly, S. D., McDevitt, T., & Esch, M. (2009). Brief training with co-speech gesture lends a hand to
word learning in a foreign language. Language and Cognitive Processes, 24(2), 313–334.
Kendon, A. (1972). Some relationships between body motion and speech: An analysis of an example. In
A. W. Siegman & B. Pope (Eds.), Studies in dyadic communication (pp. 177–210). New York:
Pergamon.
Kendon, A. (1980). Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key
(Ed.), The relationship of verbal and nonverbal communication (pp. 207–227). The Hague: Mouton.
Kendon, A. (1986). Some reasons for studying gesture. Semiotica, 62(1/2), 3–28.
Kendon, A. (1990). Conducting interaction. Cambridge: Cambridge University Press.
Kendon, A. (1994). Do gestures communicate?: A review. Research on Language and Social Interaction,
27(3), 175–200.
Kendon, A. (2004). Gesture. Visible action as utterance. Cambridge: Cambridge University Press.
Kim, S., & Cho, S. (2017). How a tutor uses gesture for scaffolding: A case study on L2 tutee’s writing.
Discourse Processes, 54(2), 105–123.
Kita, S. (2009). Cross-cultural variation of speech-accompanying gesture: A review. Language and
Cognitive Processes, 24(2), 145–167.
Kita, S., Alibali, M. W., & Chu, M. (2017). How do gestures influence thinking and speaking? The
gesture-for-conceptualization hypothesis. Psychological Review, 124(3), 245–266.
Lacroix, J. M., & Rioux, Y. (1978). La communication non-verbale chez les bilingues. Canadian Journal
of Behavioral Science, 10(2), 130–140.
Lazaraton, A. (2004). Gesture and speech in the vocabulary explanations of one ESL teacher: A mi-
croanalytic inquiry. Language Learning, 54(1), 79–117.
Lee, J. (2008). Gesture and private speech in second language acquisition. Studies in Second Language
Acquisition, 30(2), 169–190.
Levy, E. T., & McNeill, D. (1992). Speech, gesture, and discourse. Discourse Processes, 15(3), 277–301.
Lewis, T. N. (2012). The effect of context on the L2 Thinking for Speaking development of path
gestures. L2 Journal, 4(2), 247–268.
Li, P., Baills, F., & Prieto, P. (2020). Observing and producing durational hand gestures facilitates the
pronunciation of novel vowel-length contrasts. Studies in Second Language Acquisition, 42(5),
1015–1039. doi: 10.1017/S0272263120000054
Lin, Y.-L. (2020). A helping hand for thinking and speaking: Effects of gesturing and task planning on
second language narrative discourse. System, 91, 102243.
Loehr, D. P. (2007). Aspects of rhythm in gesture and speech. Gesture, 7(2), 179–214.
Macedonia, M. (2019). Embodied learning: Why at school the mind needs the body. Frontiers in
Psychology, 10(2098). doi: 10.3389/fpsyg.2019.02098
Macedonia, M., Repetto, C., Ischebeck, A., & Mueller, K. (2019). Depth of encoding through observed
gestures in foreign language word learning. Frontiers in Psychology, 10(33). doi: 10.3389/fpsyg.2019.00033
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk (3rd edn). Mahwah, NJ:
Lawrence Erlbaum Associates.
Matsumoto, Y., & Dobs, A. M. (2017). Pedagogical gestures as interactional resources for teaching and
learning tense and aspect in the ESL grammar classroom. Language Learning, 67(1), 7–42.
Mayberry, R. I., & Jaques, J. (2000). Gesture production during stuttered speech: Insights into the
nature of gesture-speech integration. In D. McNeill (Ed.), Language and gesture (pp. 199–214).
McCafferty, S. G. (1998). Nonverbal expression and L2 private speech. Applied Linguistics,
19(1), 73–96.
McCafferty, S. G., & Rosborough, A. (2014). Gesture as a private form of communication during
lessons in an ESL-designated elementary classroom: A sociocultural perspective. TESOL Journal,
5(2), 225–246.
McCafferty, S. G., & Stam, G. (Eds.). (2008). Gesture. Second language acquisition and classroom re-
search. New York: Routledge.
McNeill, D. (1992). Hand and mind. What gestures reveal about thought. Chicago: University of Chicago
Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
McNeill, D. (2017). Gesture-speech unity: What it is, where it came from. In R. Breckinridge Church,
M. W. Alibali, & S. D. Kelly (Eds.), Why gesture?: How the hands function in speaking, thinking and
communicating (pp. 77–101). Amsterdam: John Benjamins.
Morett, L. M. (2014). When hands speak louder than words: The role of gesture in the communication,
encoding, and recall of words in a novel second language. The Modern Language Journal, 98(3),
834–853.
Morett, L. M. (2018). In hand and in mind: Effects of gesture production and viewing on second
language word learning. Applied Psycholinguistics, 39, 355–381.
Mori, J., & Hayashi, M. (2006). The achievement of intersubjectivity through embodied completions: A
study of interactions between first and second language speakers. Applied Linguistics, 27(2),
195–219.
Morris, D., Collett, P., Marsh, P., & O’Shaughnessy, M. (1979). Gestures, their origins and distribution.
London: Cape.
Müller, C. (1998). Redebegleitende Gesten. Berlin: Berlin Verlag Arno Spitz GmbH.
Nagpal, J., Nicoladis, E., & Marentette, P. (2011). Predicting individual differences in L2 speakers’
gestures. International Journal of Bilingualism, 15(2), 205–214.
Nakatsukasa, K., & Loewen, S. (2017). Non-verbal feedback. In H. Nassaji & E. Kartchava (Eds.),
Corrective feedback in second language teaching and learning: Research, theory, applications, im-
plications (pp. 158–173). New York: Routledge.
Nicoladis, E. (2007). The effect of bilingualism on the use of manual gestures. Applied Psycholinguistics,
28(3), 441–454.
Oi, M., Saito, H., Li, Z., & Zhao, W. (2013). Co-speech gesture production in an animation–narration
task by bilinguals: A near-infrared spectroscopy study. Brain and Language, 125(1), 77–81.
Olsher, D. (2008). Gesturally-enhanced repeats in the repair turn: Communication strategy or cognitive
language-learning tool? In S. G. McCafferty & G. Stam (Eds.), Gesture. Second language acquisition
and classroom research (pp. 109–130). New York: Routledge.
Ortega, G., & Morgan, G. (2015). Phonological development in hearing learners of a sign language:
The influence of phonological parameters, sign complexity, and iconicity. Language Learning, 65(3),
660–688.
Özçalışkan, Ş. (2016). Do gestures follow speech in bilinguals’ description of motion? Bilingualism:
Özyürek, A. (2014). Hearing and seeing meaning in speech and gesture: Insights from brain and
behaviour. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1651),
20130296.
Özyürek, A. (2017). Function and processing of gesture in the context of language. In R. B. Church, M.
W. Alibali, & S. D. Kelly (Eds.), Why gesture? How the hands function in speaking, thinking, and
communicating (pp. 39–58). Amsterdam: John Benjamins Publishing Company.
Pennycook, A. (1985). Actions speak louder than words: Paralanguage, communication and education.
TESOL Quarterly, 19(2), 259–282.
Perdue, C. (2000). Organising principles of learner varieties. Studies in Second Language Acquisition,
22(3), 299–305.
Rauscher, F. H., Krauss, R. M., & Chen, Y. (1996). Gesture, speech and lexical access: The role of
lexical movements in speech production. Psychological Science, 7(4), 226–231.
Robinson, P. (2007). Task complexity, theory of mind, and intentional reasoning: Effects on L2 speech
production, interaction, uptake and perceptions of task difficulty. International Review of Applied
Linguistics in Language Teaching, 45(3), 193–213.
Rose, M. L. (2006). The utility of arm and hand gesture in the treatment of aphasia. Advances in
Speech-Language Pathology, 8(2), 92–109.
Seyfeddinipur, M. (2006). Disfluency: Interrupting speech and gesture (PhD diss.), Radboud University,
Nijmegen.
Slobin, D. I. (1996). From “thought and language” to “thinking for speaking”. In J. J. Gumperz & S. C.
Levinson (Eds.), Rethinking linguistic relativity (pp. 70–96). Cambridge: Cambridge University
Press.
Smotrova, T. (2017). Making pronunciation visible: Gesture in teaching pronunciation. TESOL
Quarterly, 51(1), 59–89.
So, W. C. (2010). Cross-cultural transfer in gesture frequency in Chinese-English bilinguals. Language
and Cognitive Processes, 25(10), 1335–1353.
So, W. C., Kita, S., & Goldin-Meadow, S. (2013). When do speakers use gestures to specify who does
what to whom? The role of language proficiency and type of gestures in narratives. Journal of
Psycholinguistic Research, 42(6), 581–594.
Stam, G. (1998). Changes in patterns of thinking about motion with L2 acquisition. In S. Santi, I.
Guaïtella, C. Cavé, & G. Konopczynski (Eds.), Oralité et Gestualité (ORAGE ‘98) (pp. 615–619).
Paris: l’Harmattan.
Stam, G. (2006). Thinking for Speaking about motion: L1 and L2 speech and gesture. International
Review of Applied Linguistics, 44(2), 143–169.
Stam, G. (2012). Second language acquisition and gesture. In C. Chapelle (Ed.), Encyclopedia of
Applied Linguistics. Oxford: Wiley-Blackwell.
Stam, G. (2015). Changes in thinking for speaking: A longitudinal case study. The Modern Language
Journal, 99(S1), 83–99.
Stam, G., & Buescher, K. (2018). Gesture research. In A. Phakiti, P. De Costa, L. Plonsky, & S.
Starfield (Eds.), Palgrave Handbook of applied linguistics research methodology (pp. 793–809).
London: Palgrave Macmillan.
Stam, G., & McCafferty, S. G. (2008). Gesture studies and second language acquisition: A review. In S.
G. McCafferty & G. Stam (Eds.), Gesture. Second language acquisition and classroom research
(pp. 3–24). New York: Routledge.
Streeck, J. (2009). Forward-gesturing. Discourse Processes, 46(2), 161–179.
Tellier, M. (2008). The effect of gestures on second language memorisation by young children. Gesture,
8(2), 219–235.
Tellier, M., & Stam, G. (2012). Stratégies verbales et gestuelles dans l’explication lexicale d’un verbe
d’action. In V. Rivière (Ed.), Spécificités et diversité des interactions didactiques (pp. 357–374). Paris:
Riveneuve éditions.
Van Hell, J. G., & Dijkstra, T. (2002). Foreign language knowledge can influence native language
performance in exclusively native contexts. Psychonomic Bulletin & Review, 9(4), 780–789.
Volterra, V., Caselli, M. C., Capirci, O., & Pizzuto, E. (2005). Gesture and the emergence and devel-
opment of language. In M. Tomasello & D. I. Slobin (Eds.), Beyond nature-nurture: Essays in honor
of Elizabeth Bates (pp. 3–40). Mahwah, NJ: Erlbaum.
Von Raffler-Engel, W. (1976). Linguistic and kinesic correlates in code switching. In W. C. McCormack
& S. A. Wurm (Eds.), Language and man: Anthropological issues (pp. 229–238). The Hague:
Mouton.
Von Raffler-Engel, W. (1980). Kinesics and paralinguistics: A neglected factor in second language
research and teaching. Canadian Modern Language Review, 36(2), 225–237.
Williams, J. (1988). Zero anaphora in second language acquisition. Studies in Second Language
Wittenburg, P., Brugman, H., Russel, A., Klassman, A., & Sloetjes, H. (2006). ELAN: A professional
framework for multimodality research. In Proceedings of the fifth international conference on
Language Resources and Evaluation (LREC 2006) (pp. 1556–1559). Genoa.
Wylie, L. (1985). Language learning and communication. The French Review, 57(6), 777–785.
Yoshioka, K. (2008). Gesture and information structure in first and second language. Gesture, 8(2),
236–255.
28
SPEECH-LANGUAGE PATHOLOGISTS AND L2 SPEAKERS
Marie Nader
Speech-Language Pathologists (SLPs) are healthcare professionals primarily concerned with
the prevention, assessment, diagnosis, and treatment of a wide range of communication and
swallowing disorders in a variety of settings (e.g., hospitals, schools, and private clinics). By
communication disorder, I refer to “unexpectedly long-lasting, persistent, or recurrent diffi-
culties that interfere with normal, successful, ordinary communication” (Oller et al., 2010,
p. 5) whether developmental or acquired. They include speech sound disorders (SSDs) (i.e.,
impairment of the articulation of speech sounds, fluency and/or voice), and language dis-
orders [i.e., “impaired comprehension and/or use of language which may involve (1) the form
of language (phonology, morphology, syntax), (2) the content of language (semantics), and/
or (3) the function of language in communication (pragmatics)”] (American Speech-
Language-Hearing Association [ASHA], 1993, n.p.). With increasing diversity in the popu-
lation, professional colleges stress ethical delivery of service to all individuals regardless of
their cultural and linguistic backgrounds (ASHA, 2017; Speech-Language and Audiology
Canada [SAC], 2016). However, SLPs find it difficult to ensure equal quality of service to all
culturally and linguistically diverse individuals. Nevertheless, clinicians are increasingly in-
volved with and challenged by communication disorders of second language (L2) speakers,
that is, “bilingual speakers who have already made significant progress toward acquisition of
[L1] when they begin the acquisition of a second language” (Paradis et al., 2011, p. 6).
In addition to working with native speakers (NSs) and non-native speakers (NNSs) with
disorders, SLPs provide services to typically developing (TD) individuals, people who do not
present any underlying disorders to explain their observed behaviour(s). For instance, SLPs
may provide pronunciation instruction (PI) to L2 speakers, often labelled foreign accent
modification/management/reduction (FAM). Foreign accent is defined as “non-pathological
speech that differs in some noticeable respects from native speaker pronunciation norms”
(Munro & Derwing, 1995, p. 289). This encompasses segmental features (consonants and
vowels) and suprasegmental features (e.g., word stress, sentence stress, intonation, and
rhythm). Pronunciation is operationalized in L2 research through three partially distinct
dimensions (see Munro & Derwing, 1995), namely accentedness, that is, difference from a
local variety, intelligibility, that is, how understandable L2 speech is, and comprehensibility,
that is, the effort a listener expends understanding an utterance. For NNSs, accents result
DOI: 10.4324/9781003022497-34
from the influence of an already-existing L1 system on the acquisition of L2 mental
representations (Flege, 2016). Although foreign accents are expected in L2 speakers, some
speakers experience stereotyping, discrimination, and loss of employment opportunities
(Lippi-Green, 2012). In an attempt to control such outcomes, to enhance communicative
skills, or for personal or professional reasons, some L2 speakers seek to modify their accent.
ASHA (2017) includes FAM as an elective service while classifying foreign accents as non-
pathological linguistic differences. However, unlike other areas of specialization such as
swallowing or voice disorders, which have dedicated mandatory courses in SLP graduate
programmes, to date there are no mandatory courses in L2 pronunciation or
other L2 areas. Thus, SLPs’ qualifications to provide FAM are frequently challenged by
leading L2 experts (Derwing & Munro, 2015; Müller, Ball, & Guendouzi, 2000;
Thomson, 2014).
Here, I will examine the current state of speech-language pathology service delivery to L2
speakers. To avoid confusion, I will refer to SLPs working with individuals with disorders as
clinical SLPs, and those who work with TD speakers seeking PI as SLP-FAM providers.
First, I will highlight key factors impeding the provision of services to these populations. I
outline a historical perspective regarding the origins of challenges in two areas, (1) accent
modification and (2) assessment of disorders in L2 contexts. Next, I discuss core issues related
to practice, followed by a focus on recent contributions and research. Finally, I suggest
recommendations and future directions for the field.
Focus on Accent Modification

In North America, FAM, formerly accent correction (used in this part for historical re-
levance), dates back to the 19th century. It was initially provided by pioneering practi-
tioners, mainly teachers, physicians, and elocutionists known as “speech correctionists.”
Divided into two groups, speech teachers (with a teaching background) and speech spe-
cialists (with a medical background in the organic causes of speech defects), speech cor-
rectionists offered services to NSs and NNSs with speech disorders. They “dealt with the
various causes of speech defects such as stuttering, lisping, mumbling, foreign accent,
throaty voice, and many others” (Rapeer, 1916, p. 519, emphasis added). Thus, practi-
tioners also provided accent correction to TD L2 speakers misidentified as individuals
with speech disorders. With the increasing number of immigrants arriving in the United
States and the “Americanization movement” in the 1910s, “standard” American pro-
nunciation became a hallmark of citizenship, while foreign and regional accents were
regarded as defective (see Cavanaugh, 1996). Since clear “unaccented” speech was seen as
a necessary tool for intellectual and social advancement, accent correction was thought a
necessary “treatment” for non-native accents.
In the early 1920s, “speech correction swept over the country like a huge wave” (Hedrick,
1922). From a socioeconomic viewpoint, the speech correction industry was flourishing. For
instance, Rapeer (1916) stressed the relation of speech training to industrial and social ef-
ficiency. He noted, “a number of large business firms in this country have been so affected by
the faulty speech of their employees that they have organized and experimentally carried on
systematic training in speech” (p. 520). Businesses enlisted correctionists to assist workers in
developing a “better speech [for a] better business” (Sprague, 1925). The speech business
model made its way into the educational system. Speech correction classes were initiated in
universities for students. Moreover, when compulsory school attendance laws were issued in
1917, many L2 speakers were included in speech classes by correctionists who had up to 250
students in their caseload.
A functional approach to speech production was central to the pre-profession. Accent
correction programmes were usually taken from treatment techniques designed for mono-
lingual native English children with speech disorders. Through articulatory placement,
modelling, and imitating NS speech and articulation patterns, practitioners such as Van
Riper (1954) proposed accent “treatment” aiming to develop nativelikeness, a dominant
principle in pronunciation teaching for decades. However, developing a native-like accent is
an unrealistic goal for most individuals because it is conditioned by age, amount and quality
of L2 exposure, amount of L1 use, and motivation. With the Civil Rights movement (1960s),
the need to distinguish disorders from differences was highlighted. In 1975, ASHA published
its first position statement acknowledging foreign accents and social dialects as differences,
not disorders. Yet, to date, SLPs remain influenced by a medical view of foreign accent
whereas language teachers embrace a pedagogical view of L2 pronunciation instruction
(Derwing et al., 2014; Thomson & Foote, 2019).
Focus on Differential Diagnosis

Misdiagnosing L2 speakers with disorders has been common since the pioneering years, partly
due to a lack of understanding of L2 acquisition processes and the dominance of a functional
approach. As the functional-oriented approach to speech correction gave way to a more
rigorous psychometric-structuralist approach in the 1930s, a new “medical-model” practice
emerged. Standardized assessment tools were developed enabling practitioners to classify
individuals into diagnostic groups through the use of test norms. Consequently, various
diagnostic categories for speech disorders emerged. Publications of lists of native speech and
language developmental norms allowed comparisons of individuals’ performances to a TD
monolingual cohort group (Duchan, 2006). The medical model approach gave early SLPs the
perception of objectivity although misdiagnosing speakers was ongoing.
In the 1960s, professionals were urged to reduce misplacements of “culturally dis-
advantaged” individuals (including L2 speakers) within special education classes and clin-
icians’ caseloads. A differential diagnostic approach was adopted, that is, a process which
aims to validate the presence or absence of an underlying disorder in regard to given
behaviour(s) observable in an individual. Further, since the enactment of Title VI of the Civil
Rights Act in 1964, assessing L2 speakers in the majority language as well as in their L1 was
strongly advocated for. Until the mid-1970s, SLPs remained influenced by the
psychometric-structuralist approach (Damico, 1993). As the reliability and validity of psychometric
structural tools for diagnosing functional disorders came into question, a comprehensive interactive
approach was advocated, refocusing attention on functional aspects of communication.
Nonetheless, to date, norm-referenced standardized assessment approaches are woven into
SLPs’ practice with L2 speakers.
Terminology Considerations
In speech-language pathology, L2 speakers are described using a framework of cultural and
linguistic diversity. This framework enables professional colleges and policy makers to em-
phasize legal and ethical responsibilities to provide appropriate services to all culturally and
linguistically diverse (CLD) individuals (ASHA, 2014; Individuals with Disabilities
Education Act [IDEA], 2004). However, since populations within the framework vary as a
function of disability, ethnicity, gender identity, culture, language, and dialect, challenges are
expected when language is the focus of investigation. Maydosz and Maydosz (2020)
examined case law and law review journals regarding CLD individuals with disorders,
including L2 speakers. They noted that the term CLD is not uniformly defined across the
literature, resulting in faulty practices as evidenced by legal complaints of inequitable ser-
vices, biased evaluations, and misplacements in special education. Despite increasing
research examining CLD individuals, operational definitions remain scarce. When provided,
a definition encompasses heterogeneous groups such as bilinguals, non-standard dialect
users, and monolinguals in a minority language (D’Souza et al., 2012).
Obviously, differences across and within these groups are expected, such as sociocultural
background, amount of L2 exposure, and language sociopolitical status (minority vs. ma-
jority). Furthermore, terms within the framework are hardly interchangeable and may in-
clude a variety of subgroups. For instance, non-standard dialect users refers to speakers of a
regional/cultural dialect of a given language, who may or may not be bilinguals. Bilinguals,
on the other hand, is a neutral term referring to individuals who speak at least two languages
with varying degrees of proficiency (Meisel, 2019). Cook (2002) challenges the use of the term
bilingual as it has “contradictory definitions and associations in both popular and academic
usage” (p. 4). Among bilinguals are L2 speakers and heritage-language (HL) individuals, that
is, bilinguals of an immigrant minority language, raised in migrant families, exposed to their
HL and the majority language of the community since birth or in childhood, who may speak
or merely understand their HL (Benmamoun et al., 2013). Each subgroup is characterized by
linguistic particularities. For instance, TD L2 speakers can show signs of non-target-like
acquisition in different linguistic components of the L2 (e.g., phonetics, phonology, mor-
phology, semantics, syntax) (Benmamoun et al., 2013). TD HL speakers share linguistic
properties with both NSs and L2 speakers, and exhibit signs of non-target-like development
in their HL as well as in L2 which often becomes their more dominant language (Valdés,
2005). Such characteristics have implications for both assessment and intervention, hence, an
in-depth knowledge of L1 and L2 processes for each subgroup is paramount to providing
equitable and ethical speech-language pathology services.
Issues Related to Clinical SLPs
L2-Related Education and Training

Currently, there are no mandatory L2-related certifications legally required of clinical SLPs,
beyond a clinical competence certification. Although ASHA emphasizes the need to develop
cultural and linguistic competencies when working with all populations, it also states that
“how one attains and maintains competency in a given area is up to that individual and her
director and/or facility” (ASHA, n.d.). As a result, variability among SLPs’ L2 knowledge
and insufficient L2 education are reported (Guiberson & Atkins, 2012). At the university
level, L2-related topics may be embedded in courses for one to a few hours (if any) ranging
from a broad definition of bilingualism to a more specific explanation of why pathological
features should be observable in both speakers’ languages to diagnose a disorder.
Consequently, clinical SLPs have expressed a lack of confidence and competence when
working with L2 speakers (ASHA, 2016a; Caesar & Kohler, 2007). Interestingly, Guiberson
and Atkins (2012) reported that, of 154 certified monolingual and bilingual SLPs surveyed,
about 70% felt comfortable working with culturally diverse individuals but felt less compe-
tent with linguistically diverse populations. Wallace (1997) had similar findings in a survey of
SLPs working with adults with neurogenic communication disorders (such as aphasia and
traumatic brain injuries); 60% of clinicians “did not feel competent to provide clinical ser-
vices to diverse populations, particularly when a language or dialect difference was involved”
(p. 116). More recently, in a survey of 83 English- and French-speaking SLPs in Québec, 96%
strongly agreed that mandatory L2-related education and training in graduate programmes
are needed (Nader & Chapdelaine, 2022).
Ongoing Misidentification and Disproportionality

Misidentification, that is, under- and over-identification of disorders, is ongoing. In L2
assessment, under-identification refers to misidentifying pathological features as characteristics
of typical L2 acquisition. Over-identification occurs when characteristics of typical L2
acquisition are mistakenly diagnosed as disorders (Bedore & Peña, 2008). Misidentifying
individuals can lead to disproportionality within health and educational systems, “the extent
to which membership in a given group affects the probability of being placed in a specific
disability category” (Oswald et al., 1999, p. 198). Both misidentification and dis-
proportionality negatively impact L2 speakers’ academic, social and professional develop-
ment and success. For example, appropriate intervention can be denied or delayed for up to
3 years for L2 speakers whose speech or language disorder is not identified (Wagner et al.,
2005). Conversely, L2 speakers mistakenly diagnosed with a disorder can be inappropriately
placed into special education. Misplacement, which mostly occurs with school-age children,
“can be stigmatizing and can also deny individuals the high quality and life enhancing
education to which they are entitled” (Artiles et al., 2002, p. 4). As Sullivan (2011) argued,
“ongoing disproportionality [and misidentification] strongly indicate systemic problems of
inequity, prejudice, and marginalization” (p. 318). These challenges are associated with
several factors, mainly a lack of L2 knowledge and a lack of evidence-based assessment
approaches for L2 speakers with disorders, both adults (Anderson et al., 2017) and children
(Arias & Friberg, 2017).
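One common way to put a number on disproportionality as defined above is a relative risk ratio: the proportion of one group placed in a disability category divided by the same proportion for a comparison group. The following Python sketch is an illustration only. The function names and all counts are hypothetical and do not come from the studies cited in this chapter.

```python
def risk(identified, enrolled):
    """Proportion of a group placed in a given disability category."""
    if enrolled <= 0:
        raise ValueError("enrolled must be a positive count")
    if not 0 <= identified <= enrolled:
        raise ValueError("identified must be between 0 and enrolled")
    return identified / enrolled


def relative_risk_ratio(group_identified, group_enrolled,
                        comparison_identified, comparison_enrolled):
    """Risk for the group of interest divided by risk for the comparison group."""
    comparison_risk = risk(comparison_identified, comparison_enrolled)
    if comparison_risk == 0:
        raise ValueError("comparison group risk is zero; ratio is undefined")
    return risk(group_identified, group_enrolled) / comparison_risk


# Hypothetical counts, invented for illustration only:
# 30 of 200 L2 speakers and 60 of 800 other students placed in the category.
l2_ratio = relative_risk_ratio(30, 200, 60, 800)
print(f"Relative risk ratio for L2 speakers: {l2_ratio:.2f}")
```

With these invented counts the ratio is 2.00, meaning that L2 speakers would be placed in the category at twice the rate of the comparison group. A ratio near 1 would suggest proportionate placement. Such a ratio describes a group-level pattern and cannot by itself show whether any individual diagnosis was accurate.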
L2 Clinical Assessment Issues

An objective of clinical assessment is validating the presence or absence of a speech or
language disorder. With L2 speakers, differential diagnosis is complicated. In practice,
clinicians investigate L2 speakers’ case history (i.e., medical, linguistic, and educational)
through informal tools such as interviews and questionnaires, then evaluate L2 speakers’
linguistic performances, ideally in L1 and L2. However, frequently, only the L2 is assessed
when it is the speaker’s dominant language, but also, most commonly, due to a lack of
sufficient time and resources. To evaluate speech and language skills, norm-referenced
standardized assessment tools remain the “cornerstone of diagnosis in the field” (Peña et al.,
2006, p. 247). Yet, researchers argue against using such tools in L2 practice due to linguistic
biases and disproportionate representations of L2 speakers within normative samples (Laing
& Kamhi, 2003). One way to aid differential diagnosis is by assessing speakers’ L1 and L2
since observing difficulties in both is viewed as evidence of a disorder. Norm-referenced tools
are available in some languages other than English, mostly Spanish, although their use re-
mains scarce given that few SLPs are proficient in the speakers’ L1, and collaborating with
interpreters, rarely trained in language or pronunciation assessments, can be challenging.
Importantly, few tools exist in most minority languages, which leads some SLPs, aided by
family members or interpreters, to translate existing tools into the speakers’ L1s. SLPs
should be aware that translated tools are rarely informative because they do not represent the
structures and characteristics of the L1. Finally, the use of L1 norm-referenced tools with L2
children can lead to their misidentification for disorders. Indeed, L2 children are exposed to
L1 input that is quantitatively and qualitatively different from the input to which NSs of the
L1 are exposed in their country of origin. Some L2 children undergo incomplete
acquisition or even attrition of their L1.
Incomplete Acquisition and Attrition in L2 Clinical Assessment

According to Rothman (2009),
quantitative and qualitative differences in HL input and the introduction, influence of the
societal majority language, and differences in literacy and formal education can result in
what on the surface seems to be arrested development [i.e., incomplete acquisition] of the
HL or attrition in bilingual knowledge (p. 156).
Based on the Fundamental Difference Hypothesis (Bley-Vroman, 1990) within the perspective
of Universal Grammar, Sorace (1993) argued that L2 incomplete representations
in speakers’ interlanguage are due to a critical period for acquisition. On the other hand,
incomplete acquisition of L1 can also occur, as observed in internationally adopted and HL
children, due to insufficient L1 input (Montrul, 2008; but see Perez-Cortes et al., 2019). As for
L1 attrition, it refers to a non-pathological decrease in L1 use due to the loss of, or limited
access to, previously acquired linguistic features or structures (see Chapter 31 this volume). L2 speakers
may experience L1 incomplete acquisition (children) and/or attrition (children and adults) as
a by-product of L2 contact. A gradual shift in language dominance from L1 to L2 can occur,
which, for children, is often observed before the L1 is fully developed, hindering L1 native-
like proficiency in many linguistic and phonological aspects (Montrul, 2008). Given that
SLPs are encouraged to assess speakers in their L1 and L2, factoring incomplete acquisition
and attrition into their clinical judgement is fundamental. Indeed, if L2 speakers have
undergone L1 incomplete acquisition and/or attrition, they would be expected to underperform
on L1 tests compared to monolingual age-peers. In such cases, clinicians unfamiliar with these
processes may over-identify speech and language disorders.
Issues Related to SLP-FAM Providers
L2-pronunciation Instruction: Education or Medical Background?

How SLPs view pronunciation determines their focus of practice with TD L2 speakers.
Unlike language teachers, who view pronunciation as part of a language-focused educational
programme, SLPs approach FAM through a medical lens. For instance, many still use terms
such as “treatment,” “intervention,” “procedure,” as well as “disorder,” “impairment,” and
“patient” when describing FAM (Schmidt & Sullivan, 2003), contrary to colleges’ position
statements on the issue. Moreover, as health professionals and L1 communication experts,
SLPs rely on their medical background and graduate-level courses such as phonetics and L1
disorders to inform their FAM practice.
There are several problems with this “pseudo-medical” approach (Derwing et al., 2014).
First, a medical background is not relevant for FAM as a TD L2 accent is not a speech
disorder. Second, L1 pronunciation graduate-level courses are insufficient to qualify SLPs to
provide L2-pronunciation services to TD L2 speakers. Third, SLPs rely on L1 research to
inform their clinical judgement. However, when it comes to TD L2 speakers, relying on L1
clinical research can lead to unethical practices, even by the most well-intentioned
practitioner.
Inaccuracies stemming from a lack of L2-pronunciation education and evidence-based practice
can be found in SLP-led research. Consider, for instance, the following statement: “[a] perceived
strong or thick foreign or regional accent as compared to a mild accent is difficult or
impossible for a native speaker to understand” (Freysteinson et al., 2017, p. 300). Aside from
collapsing NNSs, NSs with a non-standard (regional) accent, and NSs with a more “standard”
accent, the statement reflects a lack of knowledge of L2 research that has clearly
established that having a strong foreign accent does not necessarily impede intelligibility
(Munro & Derwing, 1995).
Experts in the field of L2 pronunciation frequently argue that SLPs, as well as language
teachers wishing to work as FAM/PI providers, must acquire specialized education and
training in L2-pronunciation (Derwing et al., 2014; Müller et al., 2000; Thomson & Foote,
2019). Yet, to date in North America, specific L2-education and training requirements and
enforceable policies by professional bodies are lacking, thus contributing to ongoing pro-
blems in the L2 pronunciation field.
FAM Assessment Issues

ASHA’s Code of Ethics, Principle I, Rule M, states that “[SLPs] shall use independent and
evidence-based clinical judgment, keeping paramount the best interests of those being served”
(2016b, n.p.). That is, SLPs must rely on available research to inform subsequent training.
In FAM, SLPs often rely on L1 clinical research and practice. For instance, they include a
complete case history to better understand the underlying nature of the speaker’s speech
errors. In FAM, however, the adequacy of a case history is questionable since there is no
need for differential diagnosis when assessing TD individuals. Information as to the in-
dividual’s motivation for seeking FAM, age of L2 acquisition, amount and contexts of L1
and L2 use, can be more pertinent. Furthermore, similar to clinical evaluations, FAM as-
sessments are often done in individual quiet testing settings. Yet, SLPs would gain from
analyzing L2 speakers’ spontaneous speech recorded in different settings with different
interlocutors.
Another problem resulting from a reliance on L1 research is that terminology is often
defined differently in L1 and L2 research. For example, in L1 clinical research on SSDs,
intelligibility is defined as “the degree to which the acoustic signal (the utterance produced by
the speaker) is understood by a listener.” Conversely, in L2 research, intelligibility refers to
the actual understanding not only at the acoustic level but at the lexical, semantic and
pragmatic levels (Levis, 2018, p. 11). These differences help explain why most SLPs focus
their assessments of L2 intelligibility on the anatomical and physiological aspects of speech.
For instance, most SLPs include oral-motor assessments evaluating the functionality of
speech-related muscles. However, an oral-motor assessment is irrelevant to L2 intelligibility
as neither foreign accents nor communication breakdowns with TD speakers stem from weak
speech muscles or improper airflow. In addition, most SLPs focus their L2 speech assessments
primarily on elicitation of words (picture naming), repetition tasks of isolated phonemes,
syllables, words and sentences, as well as read-alouds, mostly with older children and adults,
as a means to guide pronunciation and obtain clear data for analysis. Yet, these artificial
tasks do not measure connected speech given that L2 speakers “can consciously attend to
their pronunciation to reduce connected speech with a goal of clearer enunciation and
greater intelligibility. This is especially true when the speaker reads a text aloud, rather than
composing and speaking a text simultaneously” (Wagner & Toth, 2017, p. 74).
Finally, most SLPs obtain a detailed repertoire of L2 speakers’ phonetic and phonological
errors to inform subsequent individual training programmes (stemming from a structuralist
approach to SSDs). However, not all phonemes are equally important for intelligibility.
Following the principle of functional load, that is, “a measure of the work which two pho-
nemes (or a distinctive feature) do in keeping utterances apart” (King, 1967, p. 831),
segmental contrasts have been ranked according to their importance in pronunciation […]
based on factors such as frequency of minimal pairs, the neutralization of phonemic
distinctions in regional varieties, segmental position within a word, and the probability
of occurrence of individual members of a minimal pair (Munro & Derwing, 2006, p. 522).
Hence, to determine the needs of TD L2 speakers in PI, an approach should be adopted
focusing on segmentals with a higher functional load since they can have a greater impact on
speakers’ comprehensibility and intelligibility (Kang, Thomson & Moran, 2020). Moreover,
determining context of L2 use is important since “what constitutes mutual intelligibility
varies from context to context” (Kang et al., 2020, p. 454). Finally, assessing suprasegmental
features such as prosody is also necessary, although most SLPs tend to focus almost ex-
clusively on segmentals.
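The first of the factors listed above, frequency of minimal pairs, can be illustrated with a short Python sketch. This is a deliberately simplified illustration, not the ranking procedure used in the cited studies: the toy lexicon, its transcriptions, and the function names are assumptions made for the example, and a real functional load ranking would also weigh the other factors named in the quotation.

```python
from itertools import combinations


def is_minimal_pair(word_a, word_b, phoneme_x, phoneme_y):
    """True if two transcriptions differ in exactly one position, by x versus y."""
    if len(word_a) != len(word_b):
        return False
    differences = [(a, b) for a, b in zip(word_a, word_b) if a != b]
    return len(differences) == 1 and set(differences[0]) == {phoneme_x, phoneme_y}


def count_minimal_pairs(lexicon, phoneme_x, phoneme_y):
    """Count word pairs in the lexicon kept apart only by the x/y contrast."""
    return sum(
        is_minimal_pair(a, b, phoneme_x, phoneme_y)
        for a, b in combinations(lexicon.values(), 2)
    )


# Hypothetical toy lexicon: each word is a tuple of phoneme symbols.
TOY_LEXICON = {
    "light": ("l", "aɪ", "t"),
    "night": ("n", "aɪ", "t"),
    "low": ("l", "oʊ"),
    "no": ("n", "oʊ"),
    "lap": ("l", "æ", "p"),
    "nap": ("n", "æ", "p"),
    "think": ("θ", "ɪ", "ŋ", "k"),
    "fink": ("f", "ɪ", "ŋ", "k"),
    "sink": ("s", "ɪ", "ŋ", "k"),
}

l_n_pairs = count_minimal_pairs(TOY_LEXICON, "l", "n")
th_f_pairs = count_minimal_pairs(TOY_LEXICON, "θ", "f")
print(f"/l/-/n/ minimal pairs: {l_n_pairs}")
print(f"/θ/-/f/ minimal pairs: {th_f_pairs}")
```

In this toy lexicon the /l/-/n/ contrast separates three word pairs and the /θ/-/f/ contrast only one, so on this single criterion /l/-/n/ would be the higher priority. The counts reflect the invented word list only and say nothing about the actual distribution of these contrasts in English.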
FAM Training Issues

ASHA’s Code of Ethics (2016b), Principle I, Rule K states that “[SLP] shall evaluate the
effectiveness of services provided […] and shall provide services […] only when benefit can
reasonably be expected” (n. p.).
Differently put, SLPs should not provide services if benefits cannot be expected, and ex-
pectations should be evidence-based. Unfortunately, there is limited information on empiri-
cally validated FAM intervention programmes. Given that SLPs “do not typically recognize
contributions of or projects that come from outside [their] own discipline” (Sikorski, 2005,
p. 127), most SLPs rely on programmes in line with approaches initially designed for L1 SSDs,
following either a segmental approach, utilizing mostly imitation drills of phonemes, or a
contrastive phonological approach which emphasizes enhancement of sound contrasts. For
instance, the phonomotor treatment approach, that is, a segmental approach originally devel-
oped for adults with aphasia, has been proposed for “treating” L2 speakers’ accent focusing on
articulation training (Oelke et al., 2015). It emphasizes repetition and use of reactive feedback
with visual cues of tongue/articulation placements for sounds. Another technique is ultrasound
visual biofeedback where traditional articulation therapy techniques are combined with the use
of spectrograms (i.e., visual representations of sound frequencies over time, generated by
software) to inform speakers’ articulation placements and phonological
patterns (Brady et al., 2016). The effectiveness of such techniques with TD L2 speakers remains
unvalidated by empirical research. It is, however, noticeable that they do not account for
perception in L2 pronunciation, and target predominantly segmental features with little or no
consideration of segmental contrasts and functional loads.
Other puzzling techniques used in FAM can reflect a lack of L2 pronunciation knowledge,
such as “pause slightly before phrases, open [your] mouths when speaking, and keep a mirror
near [your] telephone to remind [you] to slow down and move [your] lips” (Freysteinson
et al., 2017, p. 301). Aside from the irrelevance of such recommendations, suggesting that L2
speakers slow their speech rates is not necessarily helpful and can sometimes hinder com-
prehensibility (Munro & Derwing, 2001). Moreover, many SLPs still promote the clearly
unattainable objective for most adults of eliminating foreign accents (Abrahamsson &
Hyltenstam, 2009).
Focus on L2 Clinical Assessment

To reduce linguistic and cultural biases in L2 assessments, researchers advocate for the use of
both informal and formal measures (Ortiz & Ochoa, 2005). Informal measures include self,
teacher, and parent rating scales and interviews, as well as qualitative analysis of storytelling
and spontaneous speech. Formal instruments are available for specific L2 populations,
predominantly Spanish-English speakers, making it difficult to generalize to other L2 po-
pulations. Hence, recently, cross-linguistic formal tasks (e.g., the LITMUS Sentence
Repetition tasks) have been developed providing comparable results across languages
(Marinis & Armon-Lotem, 2014). Other alternative approaches are also advocated in L2
contexts such as the use of dynamic assessment and processing-dependent measures.
Dynamic assessment (DA) is a test–teach–retest strategy for evaluating learning cap-
abilities within a short period (Gillam & Peña, 2004). In a systematic review, Orellana et al.
(2019) examined the diagnostic accuracy of DA for speech-language disorders within L2
populations. The authors found evidence supporting DA for diagnosing disorders in L2
children. Nonetheless, although very informative for L2 assessments, DA remains underused
by clinical SLPs due to a lack of appropriate training, insufficient time, and the pressure to
provide quantitative results from formal assessments in line with schools’ requirements and
medical insurance policies.
Processing-dependent measures are psycholinguistic processing tasks thought to require
minimal use of prior linguistic or cultural knowledge, thus limiting biases found in linguistic-
based instruments. One such task, the Non-Word Repetition Task (NWRT), is regarded by
SLPs as a marker for language disorder in both L1 and L2 speakers. Individuals are asked to
repeat verbatim nonsense words of up to four syllables, built using phonotactic properties
specific to a given target language, usually the majority language. However, the use of
NWRTs with L2 speakers has been extensively criticized (Washington & Craig, 1994); in-
deed, to be able to repeat non-words in L2, one must rely on their knowledge of the L2
phonotactic system. Moreover, results in NWRTs are correlated to the individual’s L2 lexical
knowledge (Service et al., 2007). Hence, NWRTs remain biased against L2 speakers (Yan &
Oller, 2007). Recently, the use of language-independent NWRTs built with cross-linguistic
phonemes has been proposed to avoid such biases. Another avenue is the use of Non-Word
Sequence Recognition Tasks which have the advantage of not requiring the verbal pro-
duction of non-words, thus, eliminating any bias related to articulatory processes, focusing
on auditory perception (O’Brien et al., 2007). Finally, others have argued that the non-words
presented should be built using the phonotactic properties of a language unknown to the L2
speakers (Nader et al., 2017). In all cases, however, processing-dependent measures remain of
limited diagnostic reliability if used alone and should be used in conjunction with other
informal and cross-linguistic formal tools.
Focus on SLP-Led FAM Research

SLPs have provided PI services to TD L2 speakers since the profession began. However, to
date, SLP-led studies of FAM remain limited with the exception of a couple of studies (Blake
et al., 2020; Brady et al., 2016). Several points are worth highlighting.
First, a shift in the focus of attention from accentedness to intelligibility is evident in the
effort to change the current label, FAM, to a more evidence-based term, “intelligibility
enhancement” (Blake et al., 2020). FAM misleads vulnerable L2 speakers into
thinking that their accent is the core problem in communication breakdowns and negative
outcomes. Shifting to “intelligibility enhancement” will focus L2 speakers’ attention (and
practitioners’ likewise) on improving intelligibility rather than modifying one’s accent. This is
an important change that should be more broadly adopted by SLPs, and more importantly,
strongly endorsed by professional colleges and associations.
Second, SLPs interested in FAM seem more critical towards the quality of available re-
search within their field. For instance, Gu and Shah (2019) reviewed 26 published studies
from 1990 to 2018 examining the effectiveness of FAM training programmes implemented
with L2 speaker healthcare professionals. The authors reported that “all included studies
were of low research quality and often had small sample sizes and few objective outcome
measures, indicating a lack of generalizability and reproducibility” (p. 391). However, even
these authors embraced a medical approach using terms such as “patient,” “clinicians,” and
“interventions.” Few references were made to leading experts in L2 pronunciation in dis-
cussing the reviewed studies.
Finally, neuroscientists have shown increased interest in understanding neural mechan-
isms underlying TD L2 speakers’ accent using neuroimaging techniques such as functional
magnetic resonance imaging (fMRI). In one such study in Québec, Ghazi-Saidi et al. (2015)
examined 12 Spanish-speaking adults who learnt 35 new Spanish–French cognates by means
of a computerized training programme for 4 weeks, followed by picture naming measures of
learned cognates during fMRI scanning in French L2 and Spanish L1. All participants
self-evaluated as having low proficiency in French. The study indicated that attempting to produce
an L2 native-like pronunciation is cognitively effortful. The authors also outlined the role in L2
speech processing of a small brain region, the insula, that had previously been associated
with emotional processing and processing uncertainty. Although neuroscience studies are an
interesting avenue to increase our understanding of differences between TD L2 speakers’
performances in L2 pronunciation, their usefulness remains limited.

I have thus far discussed issues and contributions of clinical SLPs and SLP-FAM providers
separately. It is evident that with L2 speakers with disorders and TD L2 speakers, SLPs tend
to wear their medical L1 communication experts’ hats. As an SLP myself, I recognize that the
skills and abilities we acquire in L1 communication assessment and intervention indeed become
inherent to our approaches in all settings. Yet, it is only when coupled with appropriate L2
knowledge that these skills are effective with L2 speakers.
Given the necessity of adequate and sufficient L2 knowledge to provide appropriate and
ethical services to L2 speakers with disorders and given the ongoing growth in communities’
linguistic and cultural diversities, we would expect our professional colleges to see an urgent
need to make L2-related education mandatory for all SLPs. Despite an effort from profes-
sional colleges and associations to make available relevant research (e.g., ASHA’s Practice
Portal), many critical issues with detrimental outcomes such as misidentification of L2
speakers are still ongoing. It is only through an increased understanding of L2 theories and
evidence-based practices that SLPs can keep paramount the best interests of the L2 speakers
being served.
SLP-FAM providers working with TD L2 speakers generally show a lack of under-
standing of the nature of foreign accents and evidence-based practices. Indeed, much of L1
medical-related knowledge and acquired skills with L1 speakers with disorders can be irre-
levant to PI and lead to unethical practices. Given the specialized area of practice that is PI,
official policy makers and professional colleges must come together to regulate the practice,
first, by making mandatory a specialized certification in L2 pronunciation instruction, and
second, by suggesting a recommended guideline of courses/topics and training objectives
with the collaboration of L2 experts.
6 Future Directions
We have come a long way from “speech correctionists” to professional communication ex-
pert SLPs. Throughout our journey, collaborating with professionals from related fields has
been (and continues to be) enriching. Such collaborations enable us to provide evidence-
based services to L1 speakers and to offer support and counsel to family members and
caregivers. However, much is still needed to ensure L2 speakers’ legal rights to ethical and
evidence-based services. Currently, the interaction between our field and the fields of applied
linguistics and L2 research remains extremely limited. With more L2 individuals in need of
SLP services, it is time to bridge the gap. One potential venue for such interactions is the
annual Pronunciation in Second Language Learning and Teaching (PSLLT) conference, which could
open discussions with L2 experts. Moving forward, a field dedicated to L2 clinical linguistics
may pave the way for fruitful collaborations between our fields. It is through interactions
that L2 theories can shape SLP practices with L2 speakers, and that L2 clinical research
and practice can inform linguistic theories.
Further Reading
Grant, L. (2014). Pronunciation myths: Applying second language research to classroom teaching. Ann
Arbor: U. of Michigan Press.
Experts in L2 pronunciation research and teaching discuss seven myths about L2 pronunciation in-
struction while covering central concepts, terms and issues.
Montrul, S. (2008). Incomplete acquisition in bilingualism: Re-examining the age factor. Amsterdam:
Benjamins.
A review of literature on non-native L2 and L1 attainment in adult L2 speakers and heritage language
(HL) children. Various degrees of incomplete acquisition are described in L2 contexts as are processes
such as attrition and incomplete acquisition.
References
Abrahamsson, N., & Hyltenstam, K. (2009). Age of onset and nativelikeness in a second language:
Listener perception versus linguistic scrutiny. Language Learning, 59, 249–306.
American Speech-Language-Hearing Association (ASHA). (1993). Definitions of communication dis-
orders and variations. Retrieved from https://www.asha.org/policy/rp1993-00208/
American Speech-Language-Hearing Association (ASHA). (2014). Cultural competence. Retrieved from
www.asha.org/Practice-Portal/Professional-Issues/Cultural-Competence/
American Speech-Language-Hearing Association (ASHA). (2016a). 2016 Schools survey: SLP caseload
characteristics. Rockville, MD. Retrieved from http://www.asha.org
American Speech-Language-Hearing Association (ASHA). (2016b). Code of Ethics [Ethics]. Retrieved
from www.asha.org/policy/
American Speech-Language-Hearing Association (ASHA). (2017). Issues in Ethics statement: Cultural
and linguistic competence. Retrieved from https://www.asha.org/Practice/ethics
American Speech-Language-Hearing Association (ASHA). (n.d.). In Frequently Asked Questions.
Retrieved from https://www.asha.org/slp/clinical/dysphagia/dysphagia_faqs/
Anderson, J., Saleemi, S., & Bialystok, E. (2017). Neuropsychological assessments of cognitive aging in
monolingual and bilingual older adults. Journal of Neurolinguistics, 43, 17–27.
Arias, G., & Friberg, J. (2017). Bilingual language assessment: Contemporary versus recommended
practice in American schools. Language, Speech, and Hearing Services in Schools, 48, 1–15.
Artiles, A., Harry, B., Reschly, D., & Chinn, P. (2002). Over-identification of students of color in
special education: A critical overview. Multicultural Perspectives, 4, 3–10.
Bedore, L., & Peña, E. (2008). Assessment of bilingual children for identification of language impair-
ment: Current findings and implications for practice. International Journal of Bilingual Education
and Bilingualism, 11, 1–29.
Benmamoun, E., Montrul, S., & Polinsky, M. (2013). Heritage languages and their speakers:
Opportunities and challenges for linguistics. Theoretical Linguistics, 39, 129–181.
Blake, H. L., McLeod, S., & Verdon, S. (2020). Intelligibility enhancement assessment and intervention:
A single-case experimental design with two multilingual university students. Clinical Linguistics and
Phonetics, 34(1-2), 1–20.
Bley-Vroman, R. (1990). The logical problem of foreign language learning. Linguistic Analysis, 20, 3–49.
Brady, K., Duewer, N., & King, A. (2016). The effectiveness of a multimodal vowel-targeted intervention
in accent modification. Contemporary Issues in Communication Science and Disorders, 43, 23–34.
Caesar, L., & Kohler, P. (2007). The state of school-based bilingual assessment: Actual practice versus
recommended guidelines. Language, Speech, and Hearing Services in Schools, 38, 190–200.
Cavanaugh, M. (1996). History of teaching English as a second language. The English Journal,
85, 40–44.
Cook, V. (2002). Background of L2 users. In V. Cook (Ed.), Portraits of L2 users (pp. 1–28). Clevedon,
UK: Multilingual Matters.
Damico, J. S. (1993). Synergy in applied linguistics: Theoretical and pedagogical implications. In F.
Eckman (Ed.), Confluence: Linguistics, L2 acquisition, and speech pathology (pp. 195–212).
Amsterdam: John Benjamins Publishing Company.
Derwing, T. M., Fraser, H., Kang, O., & Thomson, R. (2014). L2 accent and ethics: Issues that merit
attention. In A. Mahboob & L. Barratt (Eds.), Englishes in multilingual contexts. Berlin: Springer.
D’Souza, C., Kay-Raining Bird, E., & Deacon, H. (2009). Survey of Canadian speech-language pa-
thology service delivery to linguistically diverse clients. Canadian Journal of Speech-Language
Pathology and Audiology, 36, 18–39.
Duchan, J. (2006). The diagnostic practices of Speech-Language Pathologists in America over the last
century. In J. Duchan & D. Kovarsky (Eds.), Diagnosis as cultural practice (pp. 200–222). Berlin:
Mouton de Gruyter.
Flege, J. (2016). The role of phonetic category formation in second language speech acquisition. In
Eighth international conference on second language speech, Aarhus University, Denmark.
Freysteinson, M., Adams, D., Cesario, S., Belay, H., Clutter, P., Du, J., Duson, B., Goff, M.,
McWilliams, L., Nurse, R. P., & Allam, Z. (2017). An accent modification program. Journal of
Professional Nursing, 33, 299–304.
Ghazi-Saidi, L., Dash, T., & Ansaldo, A. (2015). How native-like can you possibly get: fMRI evidence
for processing accent. Frontiers in Human Neuroscience, 9, 1–12.
Gu, Y., & Shah, A. (2019). A systematic review of interventions to address accent-related
communication problems in healthcare. Ochsner Journal, 19, 378–396.
Guiberson, M., & Atkins, J. (2012). Speech-Language Pathologists’ preparation, practices, and per-
spectives on serving culturally and linguistically diverse children. Communication Disorders
Quarterly, 33, 169–180.
Gillam, R., & Peña, E. (2004). Dynamic assessment of children from culturally diverse backgrounds.
Perspectives on Communication Disorders and Sciences in Culturally and Linguistically Diverse
Populations, 11(2), 2–5.
Hedrick, J. (1922). A unique speech clinic. Position paper at Session – National Society for the Study and
Correction of Speech Disorders. Atlantic City, New Jersey.
Individuals with Disability Education Act Amendments of 2004 [IDEA]. (2004). Retrieved from https://
ideadata.org/
Kang, O., Thomson, R., & Moran, M. (2020). Which features of accent affect understanding?
Exploring the intelligibility threshold of diverse accent varieties. Applied Linguistics, 41, 453–480.
King, R. D. (1967). Functional load and sound change. Language, 43, 831–852.
Laing, S., & Kamhi, A. (2003). Alternative assessment of language and literacy in culturally
and linguistically diverse populations. Language, Speech, and Hearing Services in Schools, 34(1),
44–55.
Lippi-Green, R. (2012). English with an accent: Language, ideology, and discrimination in the United
States (2nd ed.). London: Routledge.
Marinis, T., & Armon-Lotem, S. (2014). Sentence repetition. In S. Armon-Lotem, N. Meir & J. de Jong
(Eds.), Assessing multilingual children: Disentangling bilingualism from language impairment
(pp. 116–143). Clevedon, UK: Multilingual Matters.
Maydosz, A., & Maydosz, D. (2020). Culturally and linguistically diverse students with disabilities:
Case law review. Multicultural Learning & Teaching, 8, 65–80.
Meisel, J. (2019). Bilingual children: A guide for parents. UK: Cambridge University Press.
Montrul, S. (2008). Incomplete acquisition in bilingualism. Re-examining the age factor. Amsterdam/
Philadelphia: John Benjamins Publishing Company.
Müller, N., Ball, M., & Guendouzi, J. (2000). Accent reduction programmes: Not a role for
speech-language pathologists? Advances in Speech-Language Pathology, 2, 119–129.
Munro, M. J., & Derwing, T. M. (1995). Foreign accent, comprehensibility, and intelligibility in the
Munro, M. J., & Derwing, T. M. (2001). Modelling perceptions of the comprehensibility and accent-
edness of L2 speech: The role of speaking rate. Studies in Second Language Acquisition, 23, 451–468.
Munro, M. J., & Derwing, T. M. (2006). The functional load principle in ESL pronunciation in-
struction: An exploratory study. System, 34(4), 520–531.
Nader, M., & Chapdelaine, C. (2022). Speech-Language Pathologists and linguistically diverse popu-
lations: Training, practice and future perspectives. [manuscript in preparation]
Nader, M., Simard, D., Fortier, V., & Molokopeeva, T. (2017). Étude de la contribution de la mémoire
de travail et de la mémoire phonologique dans la réalisation d’une tâche métasyntaxique chez des
enfants de langue d’origine. La revue canadienne de linguistiques appliquée / Canadian Journal of
language oral fluency gains. Studies in Second Language Acquisition, 29, 557–582.
Oelke, M., Sachet, L., Nagle, K., Bislick, L., Brookshire, E., & Kendall, D. (2015). Can intensive
phonomotor therapy modify accent? A phase I study. Speech, Language and Hearing, 18, 229–242.
Oller, J. W. Jr., Oller, S. D., & Badon, L. (2010). Cases: Introducing communication disorders across the
life span. San Diego, CA: Plural Publishing, Inc.
Orellana, C., Wada, R., & Gillam, R. (2019). The use of Dynamic Assessment for the diagnosis of
language disorders in bilingual children: A meta-analysis. American Journal of Speech-Language
Pathology, 28, 1298–1317.
Ortiz, S., & Ochoa, S. (2005). Cognitive assessment of culturally and linguistically diverse individuals:
An integrated approach. In R. Rhodes, S. H. Ochoa, & S. O. Ortiz (Eds.), Assessing culturally and
linguistically diverse students: A practical guide (pp. 168–201). New York: Guilford Press.
Oswald, D., Coutinho, M., Best, A., & Singh, N. (1999). Ethnic representation in Special Education:
The influence of school-related economic and demographic variables. The Journal of Special
Education, 32, 194–206.
Paradis, J., Genesee, F., & Crago, M. (2011). Dual language development and disorders: A handbook on
bilingualism and second language learning (2nd ed.). Baltimore, MD: Brookes.
Perez-Cortes, S., Putnam, M. T., & Sánchez, L. (2019). Differential access: Asymmetries in accessing
features and building representations in Heritage Language grammars. Languages, 4(4), 81.
Peña, E., Spaulding, T., & Plante, E. (2006). The composition of normative groups and diagnostic decision
making: Shooting ourselves in the foot. American Journal of Speech-Language Pathology, 15, 247.
Rapeer, L. (1916). Review: A speech symposium. The English Journal, 5, 519–520.
Rothman, J. (2009). Understanding the nature and outcomes of early bilingualism: Romance languages
as heritage languages. International Journal of Bilingualism, 13(2), 155–163.
Schmidt, A., & Sullivan, S. (2003). Clinical training in foreign accent modification: A national survey.
Contemporary Issues in Communication Science and Disorders, 30, 127–135.
Service, E., Maury, S., & Luotoniemi, E. (2007). Individual differences in phonological learning and
verbal STM span. Memory & Cognition, 35(5), 1122–1135.
Sikorski, L. (2005). Foreign accents: Suggested competencies for improving communicative pro-
nunciation. Seminars in Speech and Language, 26(2), 126–130.
Sorace, A. (1993). Incomplete vs. divergent representations of unaccusativity in non-native grammars
of Italian. Second Language Research, 9, 22–47.
Speech-Language and Audiology Canada [SAC]. (2016). Scope of practice for speech-language pa-
thology. Available from www.sac-oac.ca
Sprague, J. (1925, July 4). Better Speech, Better Business. Saturday Evening Post, pp. 40–42.
Sullivan, A. (2011). Disproportionality in special education identification and placement of English
Language Learners. Exceptional Children, 77(3), 317–334.
Thomson, R. (2014). Myth 6: Accent reduction and pronunciation instruction are the same thing. In L.
Grant (Ed.), Pronunciation myths: Applying second language research to classroom teaching
(pp. 160–187). Ann Arbor, MI: University of Michigan Press.
Thomson, R., & Foote, J. (2019). Pronunciation teaching: Whose ethical domain is it anyways? In J.
Levis, C. Nagle & E. Todey (Eds.), Proceedings of the 10th Pronunciation in Second Language
Learning and Teaching Conference, ISSN 2380-9566, Ames, IA, September 2018 (pp. 226–236).
Ames, IA: Iowa State University.
Valdés, G. (2005). Bilingualism, Heritage Language Learners, and SLA research: Opportunities lost or
seized? The Modern Language Journal, 89, 410–426.
Van Riper, C. (1954). Speech correction: Principles and methods (3rd ed.). New York: Prentice-Hall.
Wagner, R., Francis, D., & Morris, R. (2005). Identifying English language learners with learning
disabilities: Key challenges and possible approaches. Learning Disabilities Research and Practice,
20(1), 6–15.
Wagner, E., & Toth, P. (2017). The role of pronunciation in the assessment of second language listening
ability. In T. Isaacs & P. Trofimovich (Eds.), Second language pronunciation assessment:
Interdisciplinary perspectives (pp. 72–92). Bristol; Blue Ridge Summit: Multilingual Matters/Channel
View Publications.
Wallace, G. (1997). Infusing multicultural content into the traditional neurogenics framework. In G.
Wallace (Ed.), Multicultural neurogenics: A resource for Speech-Language Pathologists providing
services to neurologically impaired adults from culturally and linguistically diverse backgrounds
(pp. 115–127). Austin, TX: Pro-Ed.
Washington, J. A., & Craig, H. K. (1994). Dialectal forms during discourse of poor, urban, African
American preschoolers. Journal of Speech and Hearing Research, 37(4), 816–823.
Yan, R., & Oller, J. W. (2007). Processing-dependent measures as a failed solution to the assessment of
individuals from language and dialect minorities. Communicative Disorders Review, 1(3), 1–14.
29
CHILD L2 SPEAKERS WITH LANGUAGE AND COMMUNICATION DISORDERS
Johanne Paradis
DOI: 10.4324/9781003022497-35
Children who are dual language learners with language and communication disorders
(LCDs) are the focus of this chapter. In school contexts, these children are often referred to
as students with special education needs. Research on this population follows two main
branches. The first is more theoretical and concerns whether children with LCDs have the
capacity to learn two languages, and whether their dual language learning has unique
characteristics. The second branch is more applied and focuses on issues of referral and
assessment with dual language children, and on which language(s) should be used in
intervention, educational programming, and at home for dual language children with LCDs.
The term dual language learners (DLLs) or bilinguals will often be used here because,
unlike adult L2 speakers, child L2 speakers are in the process of acquiring their two
languages simultaneously, and consideration of both L1 and L2 acquisition is common in the
research with this population. Most research on DLLs with LCDs has been conducted with
children who speak a heritage-L1 at home and a majority, societal L2 at school and in the
wider community; that is, children who are primarily first- or second-generation children
from migrant families. Such children are not bilingual by choice but by necessity. To date,
there is limited research on child L2 speakers with LCDs who are learning their L2 by choice,
for example, through an immersion education programme, and so this will not be covered
here (see Kay-Raining Bird, Genesee et al., 2020 for information).
Children with LCDs have developmental disorders that cause impairment in language
development. Developmental disorders are different from acquired disorders in that children
are born with them. Developmental language disorder (DLD; formerly specific language
impairment or SLI) is one of the most common developmental disorders. Children with
DLD present with early language delay that does not resolve, and with difficulties in learning
language that persist until adulthood, but they do not have any other clinically significant
condition (Leonard, 2014). Thus, language disorder is their primary condition. Other
children present with LCDs as a consequence of another clinical condition. For example,
children with autism spectrum disorder (ASD) have core deficits in social interaction and
communication generally rather than in language learning specifically. Nevertheless, the
majority of children with ASD present with delay in onset of speaking, and for those who
become verbal, deficits in the pragmatic use of language are very common; some also exhibit
symptoms of language disorder in preschool and school (Schwartz, 2017). Children with
Down syndrome (DS) have moderate-to-severe intellectual disabilities affecting multiple
aspects of their development, including delayed onset and protracted speech–language de-
velopment which rarely exceeds their mental age (Schwartz, 2017). As the existing research
has focused on DLLs with these LCDs, these are the ones covered here.
Studies on DLLs and on children with LCDs were pursued separately and mainly in isolation
from each other until the 1990s. Consequently, the discussion begins with early research on
DLLs with typical development and then turns to the more recent intersectional research on
dual language development and disorders. (For historical perspectives on child LCDs, see
Leonard, 2014; Schwartz, 2017.)
Up until the 1970s, research with DLLs was primarily focused on whether early bi-
lingualism suppressed intelligence and therefore was a risk factor for development (Arsenian,
1945; Darcy, 1946). Most of this research was deeply flawed methodologically in that the
“bilinguals” were often beginner learners of the L2 from lower socioeconomic status (SES)
backgrounds than the monolingual comparison groups, which were the likely reasons for
their lower performance on tests conducted in the L2 (Hakuta, 1986). The landmark study of
Peal and Lambert (1962), with French–English bilinguals at a private school in Montreal,
Canada, showed that when SES and proficiency in the L2 were controlled, bilinguals actually
displayed cognitive advantages. The cognitive consequences of dual language learning in
children have been an active line of research ever since (Bialystok, 2011). In the 1970s,
systematic research on the nature and characteristics of child L2 acquisition (as opposed to
adult L2 acquisition) emerged. Developmental versus L1 transfer errors in the L2 speech of
young learners, interdependence and common underlying proficiencies between children’s
two languages, and timelines to native-like proficiency in the L2 were key topics (Cummins,
2000; Dulay et al., 1982). These are still active topics in the field of child L2 acquisition.
In the 1990s, research emerged that focused on issues in clinical practice with DLLs, for
example, challenges in the differential diagnosis of DLLs with typical and atypical
development given clinical protocols and testing materials based on monolingual mainstream
populations (Gutiérrez-Clellen Cole 1996; Westernoff, 1991). In the early 21st century, a
sharp increase in studies on DLD (formerly SLI) and dual language development appeared,
and the past decade has seen an extension of this research focus to DLLs with ASD and DS.
These studies focused not only on clinical issues but also on the potential developmental
costs for children with LCDs of learning two languages, which is somewhat reminiscent of the
early 20th-century view that bilingualism could be a risk factor for intellectual development.
Capacity for Dual Language Development in Children With LCDs

Parents, educators, clinicians, and researchers have asked whether DLLs with LCDs would
exhibit exceptional delays and difficulties in their language development due to the burden of
managing dual language input with a language learning disability, the so-called “cumulative
effects hypothesis” (CEH; Paradis, 2010). The CEH suggests that DLLs with LCDs would
show slower language growth and display unique profiles compared to monolinguals with
LCDs. The rationale for the CEH can be specific to the developmental disorder. For ex-
ample, because children with DLD show deficits in the perceptual, processing, and memory
systems implicated in language learning (Leonard, 2014), coping with dual language input
could limit uptake and/or overload the system. In the case of ASD, deficits in social inter-
action and pragmatics could limit linguistic input and uptake, and therefore impede language
learning and exacerbate early language delays. For children with DS, intellectual disability in
general, and deficits in auditory memory in particular (Schwartz, 2017), already place bar-
riers to language learning that could be increased through dual language input. Whatever the
rationale, the CEH supports the view that children with LCDs should not be exposed to two
languages, or that bilingualism should be discontinued post-diagnosis on the grounds that
this would be a risk factor in their already compromised development. Existing research does
not show evidence in favour of the CEH; similarly, it does not show evidence in favour of
bilingualism being a risk factor for intelligence.
Issues in Assessment With Dual Language Children

DLLs are often over-referred for special education and speech–language therapy, and in
turn, they are often over-identified as having language and learning disorders (Cummins,
2000; Kohnert, 2010). Over-identification refers to situations where a DLL is inappropriately
diagnosed with a language or learning disorder and receives unnecessary clinical or special
education services. Over-identification can have negative consequences for children’s self-
esteem, attitudes about schooling, and even future educational opportunities. A related
problem, under-identification, refers to situations where a DLL actually has a language or
learning disorder which goes unnoticed or undiagnosed because it is assumed that the child’s
poor performance in the majority L2, and in L2-related academic activities, is the result of
learning two languages. Under-identification can also have negative consequences because a
child’s language development and academic performance can suffer without intervention.
Much research has focused on the causes of misidentification of LCDs in DLLs and on
developing strategies that could reduce misidentification.
Issues in Intervention With Dual Language Children With LCDs

One over-riding issue in intervention planning with DLLs post-diagnosis is whether these
children should learn two languages. Even though evidence does not support the CEH,
beliefs that children with LCDs cannot learn two languages without costs to their overall
development persist among many professionals. One line of research has documented the
frequency of advice for parents to use only one language, parent reaction to this advice, and
the often unencouraging consequences for the child and family when the advice was fol-
lowed. Another line of research focuses on strategies for supporting both languages as a
component of educational and intervention programming for DLLs with LCDs.
Capacity for Dual Language Development in Children With LCDs

Early studies were conducted with French–English simultaneous bilinguals with DLD in
Montreal, Canada (Paradis, 2007). Paradis and colleagues examined morphosyntactic
structures considered “clinical markers” for DLD in one or both languages, as well as
structures that are not. Clinical markers are morphosyntactic structures that are particularly
difficult for children with DLD, so children show more profound delays in the acquisition of
clinical markers than what their general language delays would indicate; clinical markers
tend to be language specific in that what might be difficult for English speakers is not difficult
for French or Spanish speakers (Leonard, 2014). These studies found that the bilingual
children with DLD showed the same level of morphosyntactic abilities in each language as
their monolingual age peers with DLD; that they showed the same profiles of strengths and
weaknesses with clinical and non-clinical markers as monolinguals with DLD; and that their
morphosyntactic profiles with clinical markers were language specific, thus showing no
evidence of crosslinguistic transfer.
Because French and English are both prestige and majority languages in Canada, it is
possible that these findings would not generalize to other simultaneous or early sequential
bilinguals who speak a heritage-L1 with a majority-L2, since the heritage-L1 would receive
less community support. However, research with Spanish–English DLLs in the United States
(Morgan et al., 2013), with English-L2 DLLs from diverse L1 backgrounds in Canada
(Rezzonico, Chen et al., 2015), and with Dutch-L2 DLLs from diverse L1 backgrounds in the
Netherlands (Boerma 2016) shows results consistent with those of French–English children
with DLD: dual language learning in the early years does not create additional difficulties for
children with DLD.
Regarding sequential bilinguals who started learning their L2 at school, comparison with
their monolingual peers with DLD is complicated by their delay in the onset of L2
acquisition; in other words, child L2 speakers would always lag behind monolinguals, whether
they have DLD or not. Therefore, studies of sequential bilinguals with DLD tend to compare
them to their typically developing (TD) bilingual peers to ascertain whether the children with
DLD show exceptional delays or unique profiles in their L2 development. A clinical marker
of monolingual English-speaking children with DLD is their protracted development of tense
morphology (e.g., past tense [-ed], or third person singular [-s]; Leonard, 2014). DLLs with
DLD also exhibit protracted development of tense morphology in their English L2, as
compared to their TD DLL peers (Blom & Paradis, 2013; Jacobson & Yu, 2018).
Furthermore, DLLs with DLD can show similar abilities to their TD DLL peers in their
acquisition of morphosyntactic structures that are not clinical markers, also parallel to
monolinguals (Paradis, 2010). Thus, in their English L2 acquisition, children with TD and
with DLD display the same profiles as their monolingual peers with TD and with DLD; it is
just that these profiles extend longer through the elementary school years. Moving beyond
morphosyntax, researchers have found that DLLs with DLD show no exceptional delays,
and have similar profiles of strengths and weaknesses in their L2 lexical and narrative skills
as those of monolingual children with DLD (Boerma et al., 2016; Govindarajan & Paradis,
2019; Sheng et al., 2013). Taken together, studies on morphosyntax, lexical, and narrative
skills indicate that the weight of evidence is against the CEH for bilingual children with DLD
(but see Verhoeven et al., 2011).
Turning to children with ASD, most research has focused on the impact of dual language
exposure on language and communication abilities in the preschool years. Comparisons
between DLLs and monolinguals with ASD have consistently found that dual language
exposure in the preschool years does not negatively impact children’s overall development
beyond the expected impacts of having ASD (Drysdale et al., 2015; Ohashi, Mirenda, et al.,
2012; Wang et al., 2018). Specifically, dual language exposure did not cause later onset of
first words, weaker expressive and receptive language skills, disadvantages in communicative
functioning or an increase in non-linguistic ASD characteristics.
In contrast to the research on DLLs with DLD, there is less focus on examining profiles in
L2 acquisition of school-age children with ASD and their TD peers. This is due, in part, to the
variable expressive language trajectories of children with ASD – some are minimally verbal
during elementary school while others have language abilities that have normalized for the
most part (Schwartz, 2017). Expressive narrative skills are often examined in school-age
children with ASD because they implicate all linguistic domains, crucially including discourse
pragmatics, which is a known weakness for children with ASD. Studies with DLLs from
diverse L1 backgrounds have found that children with ASD had less coherent story structures
and less frequent use of mental state vocabulary than their TD peers in their L2 narratives,
which is consistent with the profile of monolinguals with ASD compared to their neurotypical
peers (Govindarajan, 2020; Hoang et al., 2018). Furthermore, Gonzalez‐Barrero and Nadig
(2018) found that DLLs with ASD who were receiving the majority of their language input in
their French L2 had similar vocabulary and morphological skills to their neurotypical DLL
peers (see also Paradis et al., 2018).
There is even less research on DLLs with DS than on DLLs with ASD. Studies with
adolescents, either French–English or Spanish–English bilinguals, indicate that they show
abilities in their dominant language, English, in line with their mental ages, parallel to the
profile of monolingual adolescents with DS (Kay-Raining Bird, Cleave et al., 2005; Edgin
et al., 2011; Trudeau et al., 2011). DLLs with DS have similar profiles to their monolingual
peers with DS for morphosyntactic acquisition and word learning skills (Cleave et al., 2014;
Feltmate & Kay-Raining Bird, 2008). While both DS and ASD have more profound con-
sequences for children’s overall development than DLD, the limited research suggests that
the weight of evidence is not in favour of the CEH for these populations either.
Issues in Assessment With Dual Language Children

Clearly, assessment in both languages using culturally appropriate tests normed with bilin-
guals would be the optimal strategy (Peña et al., 2017); however, appropriate testing ma-
terials are rare. Furthermore, teachers, educational assistants or speech–language
pathologists might not be able to administer a test in a language other than the majority L2.
Unfortunately, the use of monolingual, norm-referenced testing materials with DLLs is a
common practice (Caesar & Kohler, 2007; Gutiérrez-Clellen et al., 2006; Roseberry-
McKibbin, 2018), and a major source of over-referral and over-identification of language
and learning disorders in DLLs. The challenges that arise in assessment with DLLs and,
more briefly, strategies to overcome them are presented here.
The assessment process often begins with referrals from teachers who have observed that a
child’s L2 development does not seem to be progressing normally. In addition, teachers often
complete checklists for their students to identify those at risk, e.g., the Early Development
Instrument (https://edi.offordcentre.com). Therefore, the challenges and
strategies discussed here are relevant not only to speech–language pathologists and special
needs educators, but also to teachers, with implications for making referrals as well as
conducting assessments.
Bilingual Input and Development Factors

Due to different ages of L2 onset and the more complex language input and experience of
bilinguals (compared to monolinguals), individual variation in developmental trajectories is
greater in bilinguals and the shift from L1 to L2 dominance occurs at different ages (see
Paradis et al., 2021). Some DLLs are predominantly L2 speakers by kindergarten, whereas
others might remain dominant in the heritage-L1 for much longer due to stronger family,
community, or educational support. Tight relations between levels of L1 or L2 abilities and
chronological age cannot be expected with DLLs in the same way as they can with mono-
linguals, and yet speech–language assessment norms and benchmarks for language arts skills
are usually referenced by age or grade. Therefore, use of monolingual norms in the L2 can
lead to TD children being over-identified as having an LCD because their scores are low for
their age, as referenced to their monolingual peers (Paradis et al., 2013). Similar over-
identification can occur using tests in the L1 normed with monolingual rather than heritage
speakers of that language (Barragan et al., 2018).
Profile Effects in Child L2 Acquisition

Assessment protocols often consist of a test battery, or an omnibus test with numerous
subtests, to generate a comprehensive view of a child’s linguistic competence across different
linguistic domains. L2 children’s developmental trajectories do not approach monolingual
levels of proficiency in all linguistic and literacy domains at the same pace (Chondrogianni &
Marinis, 2011; Oller et al., 2007; Paradis et al., 2013). Instead, they can display highly uneven
performance across different tests, called profile effects. Sometimes, DLLs have scores two
standard deviations below the normal range on one test or subtest, but well within the
normal range on another.
Profile effects are likely the result of whether a test probes “language-general” versus
“language-specific” abilities. Language-general abilities include perceptual, processing and
memory systems underlying language learning and use, or linguistic–cognitive interface
skills. Neither depends entirely on accumulated language-specific knowledge, and thus,
abilities could potentially be shared between the two languages of a bilingual. Examples of
these abilities would be verbal short-term memory (Sorenson Duncan & Paradis, 2016),
narrative story coherence (Paradis et al., 2013; Rezzonico, Chen et al., 2015), or phonological
awareness (Oller et al., 2007). By contrast, language-specific abilities include phonological,
lexical, or morphosyntactic knowledge particular to the target language.
In addition to uneven profiles across linguistic domains in the L2, another issue is overlap
between the most difficult-to-learn structures for monolingual children with LCDs (i.e.,
clinical markers), and those for TD L2 learners. For instance, even school-age English-
speaking children with DLD make numerous errors with verb morphology, while their TD
peers do not (Leonard, 2014). English-L2 speakers also make numerous errors with verb
morphology across 1–4 years of L2 exposure in school (Blom & Paradis, 2013;
Chondrogianni & Marinis, 2011). Overlap in profiles complicates the use of clinical markers
as a litmus test for initiating referrals and for diagnostic decision-making. Furthermore,
because clinical markers are language-specific, quick translations of English L2 tests into the
L1 of a child would not achieve interpretable results. Beyond the fact that the tests would not
be properly linguistically or culturally adapted, they are designed to target structures that
could be entirely different from those that children with LCDs find difficult in their L1.
Unique profiles across linguistic domains in the L2 can lead directly to over-identification.
Paradis et al. (2013) found that, using monolingual norm-referencing, 24% of L2 children
aged 5–7 would be over-identified on a narrative task, but 78% would be over-identified on a
test of verb morphology.
Cultural and Psycho-Social Factors

Language use practices, and the values and beliefs that underlie them, vary across cultures
(see Paradis et al., 2021; Pesco & Crago, 2017). Children can grow up bicultural just as they
can grow up bilingual. How much children are spoken to directly, how much they are ex-
pected to speak with adults, as well as home literacy practices all vary across cultures;
moreover, cultural mismatches between home and school can influence classroom behaviour
and be a source of misunderstandings about children’s linguistic competence. In many
cultures, children are expected not to be talkative with adults, and parents do not do a lot of
book activities with their children or engage in question-and-answer routines to “check” the
child’s knowledge of colour words or names of farm animals. This means that many DLLs
not only need to learn the phonology, vocabulary and morphosyntax of the majority L2,
they also need to become familiar with school-based, mainstream cultural language use
practices and the value of literacy. A child who is quiet in class might be mistaken for a
struggling language learner when in fact he or she could be simply following the language use
practices of the home culture in not being talkative with adults (Pesco & Crago, 2017). Items
on a standardized test assume knowledge of foods, fairy tales, toys, and daily routines from
the mainstream culture that are not familiar to minority-culture children; therefore, the
mainstream cultural orientation of a test could make it biased against minority-culture
children.
Some differences between DLLs and their monolingual, middle-class counterparts are not
directly attributable to culture per se, but more to social context factors, such as family SES.
DLLs from some ethnolinguistic communities can be from disproportionately low SES
backgrounds (DeNavas-Walt & Proctor, 2015; Yoshida & Amoyaw, 2020). Children from
lower SES backgrounds can have under-resourced home-language environments compared
with children from higher SES backgrounds, which affects language and literacy
development and academic outcomes (Lesaux et al., 2007; Prevoo et al., 2014). Many enriching home
literacy practices, such as interactive book looking/reading between parents and children,
occur less often in lower SES families, as they do in some minority cultures. These activities
not only promote language development but also socialize children into school culture.
Additional psycho-social factors pertain to first-generation refugee children in particular.
Many of these children have had adverse pre- and post-migration experiences that can put
their overall development and well-being at risk, for example, interrupted schooling in their
L1, displacement and frequent transitions, and exposure to trauma and poverty (Graham et al.,
2016). These experiences can affect their acculturation to the new society, integration into the
school system, mental health, and language learning (Stewart et al., 2019;
Soto-Corominas et al., 2020; Yohani et al., 2019). Therefore, it is possible that children from
refugee backgrounds might exhibit behaviours rooted in adjustment and mental health
problems that would interfere with classroom functioning, test-taking performance and
progress in L2 acquisition.
Ruling in or ruling out cultural and psycho-social factors as sources of low L2
performance in the classroom or on a speech–language assessment is vital for decision-making
around referrals and diagnoses. A DLL may need additional support because of
psycho-social issues, but that support might be psychological rather than speech–language therapy.
Strategies for Assessment

The challenges posed by bilingual input and development factors, profile effects and cultural
and psycho-social factors are not insurmountable; several interrelated strategies can be
employed to achieve more accurate assessment. First, parent questionnaires can yield
information on current language input and use, home literacy practices, as well as
developmental history in the L1. This information is relevant for teachers when considering whether
to make a referral as well as for clinicians in deciding on assessment tools and interpreting
results. Regarding direct assessment of the L2, ensuring test batteries include
language-general and not just language-specific tests, and putting additional weight on the results of
the language-general tests, could reduce the risk of over-identification. Employing dynamic
instead of static testing procedures with L2 tests would aid in gauging learning capacity
separately from existing language knowledge. Because children with LCDs have reduced
language learning capacities, the dynamic component would assist in distinguishing them
from their TD peers regardless of existing knowledge of the language of testing or cultural
mismatches. A final strategy, alternative norm referencing, focuses on changing the
norm-referencing system for tests in the L2, rather than changing the tests themselves, to reduce
bias in assessment. The logic here is that the performance of DLLs should be benchmarked
against other DLLs, not against monolinguals. Re-norming tests for child L2 speakers could be feasible
in many school districts that conduct districtwide testing at regular intervals based on ages or
grades. Test scores from DLLs could be separated from the aggregate data and used to
create local bilingual norms for subsequent testing of DLLs in the district. DLLs with scores
below a certain threshold on the bilingual norms could be considered at risk for language or
learning disorders, prompting further assessment. For more details on these strategies, tools
for implementing them and the research base underlying them, see Paradis et al. (2021).
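The local-norming procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`is_dll`, `score`), the function names, the sample scores, and the cutoff of 1.25 standard deviations below the DLL mean are all hypothetical. The chapter specifies only "a certain threshold", and a real district would set the cutoff and minimum sample size with a psychometrician.

```python
from statistics import mean, stdev


def build_local_dll_norms(records):
    """Return (mean, sd) computed from the scores of DLL students only.

    `records` is a list of dicts with hypothetical keys 'is_dll' and 'score',
    standing in for a district's aggregate test data.
    """
    dll_scores = [r["score"] for r in records if r["is_dll"]]
    if len(dll_scores) < 2:
        raise ValueError("need at least two DLL scores to estimate a norm")
    return mean(dll_scores), stdev(dll_scores)


def is_at_risk(score, norm_mean, norm_sd, cutoff_z=-1.25):
    """Flag a score that falls below the cutoff on the local bilingual norms.

    cutoff_z is an assumed value chosen for illustration, not a recommendation.
    """
    return (score - norm_mean) / norm_sd < cutoff_z


# Invented district data: DLL and monolingual scores mixed together.
district_records = [
    {"is_dll": True, "score": 80},
    {"is_dll": True, "score": 90},
    {"is_dll": True, "score": 100},
    {"is_dll": True, "score": 110},
    {"is_dll": True, "score": 120},
    {"is_dll": False, "score": 110},
    {"is_dll": False, "score": 120},
    {"is_dll": False, "score": 130},
]

# Step 1: separate DLL scores from the aggregate and build the local norms.
norm_mean, norm_sd = build_local_dll_norms(district_records)

# Step 2: benchmark a newly tested DLL against other DLLs, not monolinguals.
needs_follow_up = is_at_risk(75, norm_mean, norm_sd)
```

A flagged score in this sketch would only prompt further assessment, in line with the text; it is not a diagnosis.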
Issues in Intervention With Dual Language Children With LCDs

A disconnect exists between the research discussed in the section “Capacity for dual language
development in children with LCDs” and the support for continued dual language development for children
post-diagnosis in the educational and clinical communities. This is because parents are often
advised to use the L2 only with their children. In contrast to this “elective monolingualism”
perspective, research grounded in support for affected children’s dual language development
has focused on examining the benefits of dual language intervention, as well as on the
challenges of supporting both languages when educators and clinicians do not speak a
child’s L1.
Switching to Monolingual Development in the L2

Heritage-L1 development in DLLs with LCDs is more precarious than for DLLs with TD.
First, DLLs with LCDs often experience more intense exposure to the L2 early in life than
their TD peers, through intervention programmes in the preschool years (Hampton et al.,
2017; Paradis et al., 2018), which might mean shifting to L2 dominance at a younger age
(Ebert et al., 2014; Simon-Cereijido et al., 2013). Second, children with LCDs might face less
support for dual language learning from educators because the societal L2 would be
prioritized. Therefore, if a bilingual child with an LCD shows rapid L1 attrition, it is possible
that the child’s capacity for bilingualism is intact, but his or her opportunities to become
bilingual have been compromised.
Studies examining parent experiences with advice from doctors, speech–language
pathologists, psychologists, teachers, family, and friends have found that it is relatively
common – although not universal – for parents of children with ASD to be discouraged from
continuing to use the heritage language with their child (Drysdale et al., 2015; Hampton
et al., 2017; Jegatheesan, 2011; Kay‐Raining Bird et al., 2012; Paradis et al., 2018; Yu, 2013).
Common reasons given for this advice included the following: not “confusing” the child, not
hindering the child’s language and communicative development, and promoting consistency
with the language of therapy and school. For parents who did stop speaking the heritage
language, some reported growing communication barriers between the child with ASD and
other family members. Parents who did not follow this advice, or who tried to but
returned to using both languages, did so for the following reasons: (1) they did not feel
natural speaking in their L2 to their children; (2) their L2 proficiency was insufficient for
doing language and communication exercises with their child, or for fully expressing
themselves more generally; and (3) they valued their child’s ability to communicate with
extended family members and to develop their ethnic identity.
A final point regards the benefits of bilingualism for heritage-L1 children.
Interdependence between the L1 and L2 that supports L2 development and the potential
cognitive advantages of bilingualism are widely recognized, and there is no reason to believe
they apply any less to heritage-L1 children with LCDs. Another key reason for supporting
continued development of the heritage-L1 is children’s socio-emotional well-being, as an
individual and within the family unit, which is dependent on parents and children being able
to communicate easily with each other, and thus is enhanced by children’s proficiency in the
heritage-L1 (Oh & Fuligni, 2010). Because parents of children with moderate-to-severe
LCDs are likely to be long-term caregivers, and the family context likely to be a primary
source of social interaction, elective monolingualism of one family member, the child with an
LCD, could be detrimental to that child’s overall development.
Supporting Both Languages of Children With LCDs

Studies of DLLs at risk and DLLs with LCDs have found evidence for the benefit of
L1-based or dual language intervention, including home- and school-support models (Cheatham
et al., 2012; Durán et al., 2016; Ebert et al., 2014; Lim et al., 2019; Simon-Cereijido et al.,
2013). Furthermore, evidence for cross-language interdependence was often found, mainly
from L1 to L2. However, the effects of dual language intervention on L1 development were
often less robust than those on L2 development.
While there is strong evidence for supporting both languages of children with LCDs,
achieving this support when educators and clinicians do not speak the heritage-L1 might
not seem straightforward. Research indicates that parents can be an effective source of
intervention support for the heritage-L1 at home (Drysdale et al., 2015; Johnson et al.,
2012; Kohnert et al., 2005; Lim et al., 2019). Kohnert et al. (2005) suggested using team
approaches to intervention involving partners, such as cultural brokers or interpreters,
educational assistants, siblings, and peers, in addition to parents. In their systematic
review, Durán et al. (2016) examined studies of parent intervention with DLLs at risk and
DLLs with LCDs. Home intervention programmes centred on increasing language
stimulation, such as parent–child book-reading activities, including question asking and
utterance expansion. Durán and colleagues found evidence for positive effects of parent
intervention programmes in both the L1 and the L2. However, two cautionary notes
regarding the use of parents in intervention stem from issues discussed earlier about cultural
and psycho-social context. First, home-language programming should be adapted to suit
the culturally rooted language use patterns particular to a family. Encouraging siblings and
peers to be actively engaged as communication partners with the affected child could fit
better with cultural norms (Kohnert et al., 2005). Second, when working with recently
arrived refugee families, it is important to realize that parents might be dealing with
war-related trauma, the stresses of poverty, and acculturation, so engaging fully with their
children could be a struggle (Stewart et al., 2019). In these cases, recruiting other family
members or community partners for a team approach should be considered.

• Children with LCDs should be expected to have the capacity to learn two languages.
• DLLs with LCDs should be expected to have similar profiles of strengths and
weaknesses in their L2 development as monolingual children with LCDs learning that
language.
• The following points should be considered in decisions to refer a student to specialists for
assessment, and for the administration and interpretation of speech–language assessments:
    • DLLs have unequal exposure to their languages and differences in age of L2 onset;
      these can be reasons for low L2 performance
    • DLLs display an uneven profile of L2 performance when compared to monolingual
      peers; they “catch up” on some linguistic domains faster than others
    • DLLs with TD can overlap in L2 speech errors with monolinguals with LCDs of the
      same age
    • DLLs can have cultural and psycho-social contexts that influence their language use
      patterns, their rate of L2 acquisition, and their school performance
• Counselling parents to raise their children with LCDs as monolingual speakers of the L2
should be avoided. It can have negative consequences for the children and their families.
• Both languages of DLLs with LCDs should be supported as part of general and special
educational programming and clinical intervention; this can be achieved even if
professionals do not speak both languages. Parents and other family members can be
resources for supporting both languages.
6 Future Directions
Research at the intersection of child bilingualism and developmental disorders has grown
immensely since the start of the 21st century and has increased our understanding of the
capacity for children with LCDs to learn two languages, the strategies to improve the
accuracy of speech–language assessment with DLLs, and the importance of and potential for
supporting both languages in intervention. Nevertheless, there are still gaps in knowledge
that future research should address. First, more information is needed on the
characteristics of L2 acquisition beyond the pre-school years for children with LCDs. Fully
understanding the capacity for bilingualism must include a long-term perspective,
especially to see if L2 outcomes meet expectations in line with the developmental disorder,
what the connections between language and reading development are, and if the
heritage-L1 is at greater risk of attrition than for TD DLLs. Second, more research on dual
language development and disorders beyond DLD is needed. More systematic
information on DLLs with ASD and DS, as well as on DLLs with hearing impairment,
attention-deficit-hyperactivity disorder and dyslexia, would be useful for educators and clinicians
because these additional developmental disorders can have an impact on language and
reading development in the school years.
Further Reading
Goldstein, B., & Conboy, B. (Eds.). (2021). Bilingual language development and disorders in
Spanish–English speakers (3rd edn). Baltimore, MD: Paul H. Brookes Publishing.
An edited volume covering topics in dual language development of Spanish-English children with
relevance for clinical practice. Chapters are organized around different language and literacy domains
providing a comprehensive research review and discussion of phonological, lexical, semantic,
morphosyntactic, narrative and reading development in Spanish-English TD children and children
with LCDs.
Paradis, J., Genesee, F., & Crago, M. (2021). Dual language development and disorders: A handbook on
bilingualism and second language learning (3rd edn). Baltimore: Brookes Publishing.
This comprehensive research review and discussion of dual language development in children with typical and
atypical development covers simultaneous bilinguals, child L2 acquisition, heritage language acquisition and
second language education, including chapters on language and reading disorders in DLLs.
Peña, E., Bedore, L., & Baron, A. (2017). Bilingualism and child language disorders. In R. Schwartz
(Ed.), Handbook of child language disorders (2nd edn, pp. 297–327). New York, NY: Routledge.
A concise overview of the issues in clinical practice with bilingual children, focusing on dual language
assessment and intervention with Spanish-English speaking children.
Roseberry-McKibbin, C. (2018). Multicultural students with special language needs: Practical strategies
for assessment and intervention (5th edn). Oceanside, CA: Academic Communication Associates.
This book has a strong applied orientation and includes practical strategies and resources for clinicians
and educators working with culturally and linguistically diverse children in preschool and elementary
school.
References
Arsenian, S. (1945). Bilingualism in the post-war world. Psychological Bulletin, 42, 65–85.
Barragan, B., Castilla-Earls, A., Martinez-Nieto, L., Restrepo, M., & Gray, S. (2018). Performance of
low-income dual language learners attending English-only schools on the Clinical Evaluation of
Language Fundamentals–Fourth Edition, Spanish. Language, Speech, and Hearing Services in
Schools, 49(2), 292–305. doi: 10.1044/2017_LSHSS-17-0013
Bialystok, E. (2011). Reshaping the mind: The benefits of bilingualism. Canadian Journal of Experimental
Psychology/Revue canadienne de psychologie expérimentale, 65(4), 229–235. doi: 10.1037/a0025406
Blom, E., & Paradis, J. (2013). Past tense production by English second language learners with and
without impairment. Journal of Speech, Language and Hearing Research, 56, 1–14.
Boerma, T., Leseman, P., Timmermeister, M., Wijnen, F., & Blom, E. (2016). Narrative abilities of
monolingual and bilingual children with and without language impairment: Implications for clinical
practice. International Journal of Language and Communication Disorders, 51(6), 626–638.
Caesar, L., & Kohler, P. (2007). The state of school-based bilingual assessment: Actual practice versus
recommended guidelines. Language, Speech and Hearing Services in Schools, 38, 190–200.
Cheatham, G. A., Santos, R. M., & Kerkutluoglu, A. (2012). Review of comparison studies
investigating bilingualism and bilingual instruction for students with disabilities. Focus on Exceptional
Children, 45(3), 1–12.
Chondrogianni, V., & Marinis, T. (2011). Asynchronous development of vocabulary, morphology and
complex syntax in successive bilingual children: Differential effects of internal and external factors.
Linguistic Approaches to Bilingualism, 1, 318–345.
Cleave, P., Kay-Raining Bird, E., Trudeau, N., & Sutton, A. (2014). Syntactic bootstrapping in children
with Down syndrome: The impact of bilingualism. Journal of Communication Disorders, 49, 42–54.
Cummins, J. (2000). Language, power and pedagogy: Bilingual children in the crossfire. Clevedon,
England: Multilingual Matters.
Darcy, N. (1946). The effect of bilingualism upon the measurement of the intelligence of children of
preschool age. Journal of Educational Psychology, 37, 21–44.
DeNavas-Walt, C., & Proctor, B. D. (2015). Income and poverty in the United States: 2014. Washington,
DC: U.S. Census Bureau.
Drysdale, H., van der Meer, L., & Kagohara, D. (2015). Children with autism spectrum disorder from
bilingual families: A systematic review. Review Journal of Autism and Developmental Disorders, 2,
26–38. doi: 10.1007/s40489-014-0032-7
Dulay, H., Burt, M., & Krashen, S. (1982). Language two. Oxford, UK: Oxford University Press.
Durán, L. K., Hartzheim, D., Lund, E. M., Simonsmeier, V., & Kohlmeier, T. L. (2016). Bilingual and
home language interventions with young dual language learners: A research synthesis. Language,
Speech, and Hearing Services in the Schools, 47, 347–371.
Ebert, K., Kohnert, K., Pham, G., Disher, J., & Payesteh, B. (2014). Three treatments for bilingual
children with primary language impairment: Examining cross-linguistic and cross-domain effects.
Journal of Speech, Language, and Hearing Research, 57, 172–186.
Edgin, J. O., Kumar, A., Spano, G., & Nadel, L. (2011). Neuropsychological effects of second language
exposure in Down syndrome. Journal of Intellectual Disability Research, 55, 351–356.
Feltmate, K., & Kay-Raining Bird, E. (2008). Language learning in four bilingual children with Down
syndrome: A detailed analysis of vocabulary and morphosyntax. Canadian Journal of Speech-
Language Pathology and Audiology, 32, 6–20.
Gonzalez‐Barrero, A. M., & Nadig, A. (2018). Bilingual children with autism spectrum disorders: The
impact of amount of language exposure on vocabulary and morphological skills at school age.
Autism Research, 11, 1667–1678. doi: 10.1002/aur.2023
Govindarajan, K. (2020). Narrative abilities of bilingual children with autism spectrum disorder,
developmental language disorder and typical development. Unpublished doctoral dissertation, University
of Alberta, Canada.
Govindarajan, K., & Paradis, J. (2019). Narrative abilities of bilingual children with and without
developmental language disorder (SLI): Differentiation and the role of age and input factors. Journal
of Communication Disorders, 77, 1–16.
Graham, H. R., Minhas, R. S., & Paxton, G. (2016). Learning problems in children of
refugee background: A systematic review. Pediatrics, 137(6), e20153994. doi: 10.1542/peds.2015-3994
Gutiérrez-Clellen, V. (1996). Language diversity: Implications for assessment. In K. Cole, P. Dale, & D.
Thal (Eds.), Assessment of communication and language (pp. 29–56). Baltimore: Brookes.
Gutiérrez-Clellen, V., Restrepo, A., & Simon-Cereijido, G. (2006). Evaluating the discriminant accu-
racy of a grammatical measure with Spanish-speaking children. Journal of Speech, Language and
Hampton, S., Rabagliati, H., Sorace, A., & Fletcher-Watson, S. (2017). Autism and bilingualism: A
qualitative interview study of parents’ perspectives and experiences. Journal of Speech, Language and
Hakuta, K. (1986). The mirror of language. The debate on bilingualism. New York: Basic Books.
Hoang, H., Gonzalez-Barrero, A. M., & Nadig, A. (2018). Narrative skills of bilingual children with
autism spectrum disorder. Discours, 23. Retrieved from http://journals.openedition.org/discours/9856.
doi: 10.4000/discours.9856
Jacobson, P. F., & Yu, Y. H. (2018). Age-related changes in English past tense by bilingual children
with and without developmental language disorders. Journal of Speech-Language and Hearing
Research, 61(10), 2532–2546.
Jegatheesan, B. (2011). Multilingualism and autism: Perspectives of South Asian Muslim immigrant
parents on raising a child with a communicative disorder in multilingual contexts. Bilingual Research
Journal, 34, 185–200.
Johnson, Y. U., Martinez-Cantu, V., Jacobson, A. L., & Weir, C.-M. (2012). The home instruction for
parents of preschool youngsters program’s relationship with mother and school outcomes. Early
Education & Development, 23(5), 713–727.
Kay-Raining Bird, E., Cleave, P., Trudeau, N., Thordardottir, E., Sutton, A., & Thorpe, A. (2005). The
language abilities of bilingual children with Down Syndrome. American Journal of Speech-Language
Pathology, 14, 187–199.
Kay-Raining Bird, E., Genesee, F., Sutton, A., Chen, X., Oracheski, J., Pagan, S., Squires, B., Burchell,
D., & Sorenson, T. D. (2020). Access and outcomes of children with special education needs in early
French immersion. Journal of Immersion and Content-Based Language Instruction.
doi: 10.1075/jicb.20012.kay
Kay‐Raining Bird, E., Lamond, E., & Holden, J. (2012). Survey of bilingualism in autism spectrum
disorders. International Journal of Language & Communication Disorders, 47, 52–64.
doi: 10.1111/j.1460-6984.2011.00071.x
Kohnert, K. (2010). Bilingual children with primary language impairment: Issues, evidence and
implications for clinical actions. Journal of Communication Disorders, 43, 456–473.
doi: 10.1016/j.jcomdis.2010.02.002
Kohnert, K., Yim, D. S., Nett, K., Kan, P. F., & Duran, L. (2005). Intervention with linguistically
diverse preschool children: A focus on developing home language(s). Language, Speech and Hearing
Services in Schools, 36, 251–263.
Leonard, L. (2014). Introduction (Chapter 1; pp. 3–35). Children with specific language impairment (2nd
edn). Cambridge, MA: MIT Books.
Lesaux, N. K., Rupp, A. A., & Siegel, L. S. (2007). Growth in reading skills of children from diverse
linguistic backgrounds: Findings from a 5-year longitudinal study. Journal of Educational
Psychology, 99(4), 821–834.
Lim, N., O’Reilly, M. F., Sigafoos, J., Ledbetter-Cho, K., & Lancioni, G. E. (2019). Should heritage
languages be incorporated into interventions for bilingual individuals with neurodevelopmental
disorders? A systematic review. Journal of Autism and Developmental Disorders, 49, 887–912.
doi: 10.1007/s10803-018-3790-8
Morgan, G., Restrepo, M., & Auza, A. (2013). Comparison of Spanish morphology in monolingual
and Spanish–English bilingual children with and without language impairment. Bilingualism:
Language and Cognition, 16(3), 578–596. doi: 10.1017/S1366728912000697
Oh, J. S., & Fuligni, A. J. (2010). The role of heritage language development in the ethnic identity and
family relationships of adolescents from immigrant backgrounds. Social Development, 19, 202–220.
doi: 10.1111/j.1467-9507.2008.00530.x
Ohashi, J. K., Mirenda, P., Marinova-Todd, S., Hambly, C., Fombonne, E., Szatmari, P., &
Thompson, A. (2012). Comparing early language development in monolingual- and
bilingual-exposed young children with autism spectrum disorders. Research in Autism Spectrum Disorders,
6(2), 890–897.
Oller, D. K., Pearson, B., & Cobo-Lewis, A. B. (2007). Profile effects in early bilingual language and
literacy. Applied Psycholinguistics, 28, 191–230.
Paradis, J. (2007). Bilingual children with SLI: Theoretical and applied issues. Applied Psycholinguistics,
28, 512–564.
Paradis, J. (2010). The interface between bilingual development and specific language impairment.
Keynote article for special issue with peer commentaries. Applied Psycholinguistics, 31, 3–28.
Paradis, J., Genesee, F., & Crago, M. (2021). Dual language development and disorders: A handbook on
bilingualism and second language learning (3rd edn). Baltimore: Brookes Publishing.
Paradis, J., Govindarajan, K., & Hernandez, K. (2018). Bilingual development in children with autism
spectrum disorder from newcomer families. Education and Research Archive. doi: 10.7939/R31V5BT9X
Paradis, J., Schneider, P., & Sorenson Duncan, T. (2013). Discriminating children with language
impairment among English language learners from diverse first language backgrounds. Journal of
Speech, Language and Hearing Research, 56, 971–981.
Peal, E., & Lambert, W. E. (1962). The relation of bilingualism to intelligence. Psychological
Monographs, 76, 1–23.
Peña, E., Bedore, L., & Baron, A. (2017). Bilingualism and child language disorders. In R. Schwartz
(Ed.), Handbook of child language disorders (2nd edn, pp. 297–327). New York, NY: Routledge.
Pesco, D., & Crago, M. B. (2017). Language socialization in Canadian indigenous communities. In P.
Duff & S. May (Eds.), Language socialization. Encyclopedia of language and education (3rd edn,
pp. 291–307). Springer, Cham.
Prevoo, M. J. L., Malda, M., Mesman, J., Emmen, R. A. G., Yeniad, N., Van Ijzendoorn, M. H., &
Linting, M. (2014). Predicting ethnic minority children’s vocabulary from socioeconomic status,
maternal language and home reading input: Different pathways for host and ethnic language.
Journal of Child Language, 41(5), 963–984.
Rezzonico, S., Chen, X., Cleave, P. L., Greenberg, J., Hipfner-Boucher, K., Johnson, C. J., Milburn, T.,
Pelletier, J., Weitzman, E., & Girolametto, L. (2015). Oral narratives in monolingual and bilingual
preschoolers with SLI. International Journal of Language and Communication Disorders, 50(6), 830–841.
Roseberry-McKibbin, C. (2018). Multicultural students with special language needs: Practical strategies
for assessment and intervention (5th edn). Oceanside, CA: Academic Communication Associates.
Schwartz, R. (2017). Handbook of child language disorders. New York, NY: Routledge.
Sheng, L., Bedore, L. M., Peña, E. D., & Taliancich-Klinger, C. (2013). Semantic convergence in
Spanish-English bilingual children with primary language impairment. Journal of Speech, Language,
and Hearing Research, 56(2), 766–777.
Simon-Cereijido, G., Gutiérrez-Clellen, V. F., & Sweet, M. (2013). Predictors of growth or attrition of
the first language in Latino children with specific language impairment. Applied Psycholinguistics,
34(6), 1219–1243. doi: 10.1017/S0142716412000215
Sorenson Duncan, T., & Paradis, J. (2016). English language learners’ nonword repetition
performance: The influence of L2 vocabulary size, length of L2 exposure and L1 phonology. Journal of
Speech Language and Hearing Research, 59, 39–48.
Soto-Corominas, A., Paradis, J., Al Janaideh, R., Vitoroulis, I., Chen, X., Georgiades, K., Jenkins, J.,
& Gottardo, A. (2020). Socioemotional wellbeing influences bilingual and biliteracy development in
Syrian refugee children. In M. Brown & A. Kohut (Eds.), Proceedings of the 44th Boston University
Conference on Language Development (pp. 620–633). Somerville, MA: Cascadilla Press.
Stewart, J., El Chaar, D., McCluskey, K., & Borgardt, K. (2019). Refugee student integration: A focus
on settlement, education, and psychosocial support. Journal of Contemporary Issues in Education,
14(1), 55–70.
Trudeau, N., Kay-Raining Bird, E., Sutton, A., & Cleave, P. (2011). Développement lexical chez les
enfants bilingues ayant le syndrome de Down. Enfance, 2011(3), 383–404.
Verhoeven, L., Steenge, J., & van Balkom, H. (2011). Verb morphology as a clinical marker of specific
language impairment: Evidence from first and second language learners. Research in Developmental
Disabilities, 32, 1186–1193.
Wang, M., Jegathesan, T., Young, E., Huber, J., & Minhas, R. (2018). Raising Children with
Autism Spectrum Disorders in Monolingual vs Bilingual Homes: A Scoping Review. Journal of
Developmental and Behavioral Pediatrics, 39(5), 434–446.
Westernoff, F. (1991). The assessment of communication disorders in second language learners. Journal
of Speech-Language Pathology, 15(4), 73–79.
Yohani, S., Brosinsky, L., & Kirova, A. (2019). Syrian refugee families with young children: An
examination of strengths and challenges during early resettlement. Journal of Contemporary Issues in
Education, 14(1), 13–32.
Yoshida, Y., & Amoyaw, J. (2020). Looking beyond labour market integration: Household conditions
surrounding refugee children in Canada. In A. Korntheuer, D. B. Maehler, P. Pritchard, & L.
Wilkinson (Eds.), Refugees in Canada and Germany: Responses in policy and practice. Köln: GESIS -
Leibniz-Institut für Sozialwissenschaften.
Yu, B. (2013). Issues in bilingualism and heritage language maintenance: Perspectives of
minority-language mothers of children with autism spectrum disorders. American Journal of Speech-Language
Pathology, 22(1), 10–24.
30
TRAINING INTERPRETERS
Jim Hlavac
Interpreter training is a younger area of research than second language acquisition (SLA).
The oldest schools were established in the 1940s and 1950s, and it was not until the 1980s
that interpreting pedagogy emerged as an area distinct from translation pedagogy. Similar to
SLA, interpreter training has been influenced by various approaches describing linguistic
production, but there has been relatively little cross-over. This is surprising as both
disciplines have a strong focus on the linguistic and extra-linguistic abilities of learners.
Learning and information processing strategies utilized by successful interpreter trainees are
those that many language learners employ (Zannirato, 2008). In interpreter pedagogy,
strategies that successful learners should develop are commonly found in SLA research:
self-motivation (Dörnyei, 1994), segmentation of input (Pica, 1994), anticipation and inferencing
(Laviosa, 2014), restructuring and paraphrasing (Nabei & Swain, 2002), use of prosodic and
non-verbal features (Jenkins & Parra, 2003), memorizing input (Gu & Johnson, 1996), and
monitoring output of production and repairing errors (Kormos, 2006). Reflecting on SLA
from an Interpreter Studies perspective, Dejean (2000, p. 9) considers that “methods used by
interpreting students to perfect a language can obviously be of interest to those who wish to
achieve a true command of a foreign language.”
Essentially, interpreting is the transfer of verbal or signed messages from one language
into another. There are also situational factors pertaining to how most interpreting is
performed: immediacy and finality. Thus, a widely accepted contemporary definition of
interpreting is: “Interpreting is a form of Translation in which a first and final rendition in another
language is produced on the basis of a one-time presentation of an utterance in a source
language” (Pöchhacker, 2016, p. 11. Original emphasis).
Terms such as “one-time presentation” and “first and final rendition” refer to the
mental and verbal dexterity that interpreters must have to readily understand, remember,
transfer, and re-produce speech from one language into another. This is the hallmark of
interpreting and why it is a “special” kind of speaking with at least six levels; components of
the first four are addressed here:
• the linguistic features of spoken language: phonology, lexicon, morphosyntax, prosody,
and pragmatics
DOI: 10.4324/9781003022497-36

Jim Hlavac
• presentation features: phonation, rhetoric, pragmatics and kinesics, discourse-management skills
• interactional features: setting and scene, participants, ends, act sequence, key, in-
strumentalities, norms, and genre (Hymes, 1974)
• inter-lingual transfer: comprehension, retention, conversion, re-organization, and re-
production
• inter-cultural communication: knowledge of the discourse and pragmatic norms of the
speech communities of both languages such that the illocutionary acts of the original
utterances are matched with equivalent illocutionary acts in interpreted speech
• role-based features: ethical principles (e.g., accuracy and impartiality), workplace pro-
tocols and normative notions of an interpreter’s role (e.g., when re-presenting others’
speech, when speaking as oneself).
Here, I examine interpreting between spoken languages only, not sign language interpreting;
almost all sign interpreters are L1 speakers of the spoken language. The interested
reader is referred to Nicodemus and Emmorey (2015).
The focus here is on individuals interpreting from their L1 into their L2. Interpreters with
three or more languages typically still work into their L1 or L2 only. Within Interpreting
Studies, A, B, and C are used to refer to interpreters’ L1, L2, and L3, respectively (with C
encompassing L3, L4, and L5) (AIIC, 2012). To align this chapter with the rest of this book,
the terms L1 and L2 are used, with the latter term employed as a hypernym for all non-
dominant languages.
Terms specific to interpreting are source and target. Source refers to the original speech
form that the interpreter hears, or source speech/text. By analogy, the language from which
they interpret is the source language. The term target refers to the interpreter’s spoken
output, that is, their target speech/text, and the language into which they interpret, the target
language. Target is to be understood in this way, and should not be confused with the use of
target in SLA where it refers to an optimal form that a learner typically aspires to.
Four modes of interpreting exist: dialogue interpreting of stretches of source speech of 1–50
words using short-term memory skills with minimal or no note-taking, working from and into both
languages (i.e., bidirectionally); consecutive interpreting of source speeches of >50 words
(usually between 150 and 1,000 words) using notes and memory skills and working either
monodirectionally or bidirectionally; simultaneous interpreting of source speech that is almost
contemporaneous, with a delay of 0.5–5 seconds (called décalage), usually working
monodirectionally, less often bidirectionally; and sight translation, which is the reading of a
written source text and giving a spoken interpretation of it, usually monodirectionally. All
modes of interpreting can require interpreters to work bidirectionally to some extent, thus
giving spoken output in their L2 as well as their L1.
Interpreting is commonly defined according to the field in which the interpreter works,
principally two main fields with some overlap: conference interpreting encompassing inter-
national meetings, high-level government and diplomats’ meetings, business and commerce,
media and press conferences; and public-service or community interpreting encompassing
healthcare, social welfare and other government services, police/legal/asylum/courtroom/
prison settings, education, housing and employment, family violence, sport, faith-based
organizations, humanitarian and emergency situations, retail and customer service,
and some media events. Conference interpreting modes include simultaneous and con-
secutive interpreting (very common), and dialogue interpreting and sight translation (less
common). The modes in public service interpreting are dialogue and consecutive interpreting
(very common), and sight translation and simultaneous interpreting (less common). While
Training Interpreters
many interpreters favour working in one or two modes only (most untrained interpreters can
undertake dialogue interpreting and sight translation only) and many specialize in one field
or thematic area or receive work mainly in this field, the spread of fields means that most
interpreters work into their L2 with some frequency.
The training of interpreters dates back to the mid-20th century; models of interpreting
performance are even more recent. Attempts to locate speaking in either pedagogically based
or practice-based descriptions of interpreting yield few instances in which it is overtly
mentioned. Speaking is conceived of as a feature of performance where attention is focused
on other things – usually the fidelity of transfer of the referential content from one language
into another. This is unsurprising for a discipline which by name implicitly refers to the
transfer of messages cross-linguistically.
Examining the professionalization of interpreting and comments received by the “first
generation” of professional interpreters about their performance, Baigorri-Jalón (2004, p. 82)
surmises that “the average listener appreciated more the rhetorical fluency than the presumed
accuracy of their interpretations.” This statement applies to almost all users of interpreting
services: it is the interpreter’s speaking skills that strongly determine others’ notions of their
performance rather than accuracy of translation; the latter is something that most users are
unable to ascertain (Kurz, 2001).
Interpreting into an L2 compared to an L1 has been a hotly debated issue in both training
and practice. Fluency, grammatical accuracy, and rhetorical skills are more easily displayed
in one’s L1, and interpreting into the L1 as the preferred direction became a principle ad-
vocated by one of the earliest interpreter educators, Danica Seleskovitch (1978). But an
equally influential school of interpreter training focuses on the source speech being the in-
terpreter’s L1. The reasoning behind this is that the interpreter must fully understand the
source speaker and this is most likely when the language of the source speaker is also the
interpreter’s L1. Thus, the interpreter’s interpretation into the L2 may contain phonological,
grammatical or other shortcomings, but the level of accuracy in the transfer of referential
content is likely to be higher because there is less chance of misunderstanding the source
speech (Denissenko, 1989).
Empirical studies on directionality were not undertaken until this century; the evidence
from these is mixed. Studies examining linguistic accuracy in target speech production
amongst the same cohorts of interpreters have found that interpreting into the L1 is superior
(Chang & Schallert, 2007). Other studies that tested for the variables of anticipation, that is,
predicting source-speech constituents not yet available for the interpreter’s output planning
(Kurz & Färber, 2003) or working memory taking in L1 input (Gorton, 2012) have found
interpreting into the L2 to be superior. Kalina (2005) and Gile (2009) argue that factors such
as the language–pair combinations, topic area, and familiarity with content can outweigh the
issue of directionality into one’s L1 or L2. Surveys of users such as conference delegates
reveal no strong preference for listening to interpreters speaking into the interpreters’ L1
compared to their L2 (Donovan, 2004). These developments are mirrored by similar ones in
SLA and teacher training that show that the existence of an L2 accent should not prevent a
person from working as an educator (Derwing et al., 2014).
The debate is now largely over; most university programmes teaching conference or si-
multaneous interpreting include training for and assessment of students to work into their L2
(Lim, 2005), albeit with a lower weighting of assessment than into the L1 (EMCI, 2018). The
debate has also been put to rest due to the emergence of public-service interpreting in which
interpreters work bidirectionally in settings that feature dyadic (dialogic) or multi-party exchanges
consisting mainly of turns that interpreters interpret consecutively using short-term
memory (i.e., 2–4 utterances). Generalist interpreter certification tests require demonstration
of all interpreting modes bidirectionally (NAATI, 2019).

Both the emergence of occupation-based benchmarks in regard to the skill level of inter-
preters and the popularization of interpreting courses in SLA in some countries are critical
issues.
As the language services industry expands in many countries and more people work in it
or are consumers of its services, industry-based bodies have set descriptions of interpreter
skill level. Supra-national, regional and national standards exist that describe (rather than
prescribe) behaviours and actions interpreters display. These standards often contain a mix
of descriptions of speaking in terms of competence and performance.
ASTM (American Society for Testing and Materials) International, a standards authority
for North America, provides a competence and performance-based description of an inter-
preter’s L2 speaking proficiency:
Full Functional Proficiency. Able to use language fluently and accurately on all
levels pertinent to professional needs. Examples—Understands the details and ra-
mifications of concepts that are culturally or conceptually different from one’s own.
Can set the tone of interpersonal, official, semi-official, professional, and non-
professional verbal exchanges with a representative range of native speakers (for all
audiences, purposes, tasks, and settings). Can play an effective role among native
speakers in such contexts as negotiations, conferences, lectures, and debates on
matters of disagreement. Can advocate a position at length, both formally and in
chance encounters, using sophisticated verbal strategies.
(ASTM International, 2007, p. 2).
Some certifying authorities identifying macro-skills include speaking as an embedded and/or
discrete aspect of test candidate performance. Assessing the performance of certification
candidates in tests, the National Accreditation Authority of Translators and Interpreters
(NAATI) in Australia has rubrics-based descriptions for performance. Language competence,
that is, “language proficiency enabling meaning transfer” in both English and a language
other than English (LOTE), is assessed; one of these is the test taker’s L2. The band for the
highest language level is as follows: “consistently uses spoken language and idiomatically,
demonstrated by accomplished use of pragmatics, lexicon, grammar, syntax, style and reg-
ister” (NAATI, 2019, p. 1).
The second critical issue relating to interpreter training is the procedure of learners un-
dertaking oral, inter-lingual transfer exercises with other SLA exercises to advance their L2
oral production skills. Thus, interpreting is a means to an end, and neither learner nor in-
structor has the desired learning outcome that the learner at a later point will be equipped to
work as an interpreter. This is analogous to a practice in SLA that lasted for decades: written
translation (Laviosa, 2014).
Interpreting exercises undertaken as an L2 learning activity have been advocated by
educators such as Cho (2007), while Park (1999) reports that such exercises enjoy popularity
amongst students too: “[u]ndergraduate students chose interpretation courses for developing
their communicative competence rather than interpretation skills” (p. 47). Interpreting as an
SLA activity has been most widely researched in relation to English as the L2: several such
studies have appeared in East Asian countries such as Japan, South Korea, and China (Lee,
2014). Interpreting exercises are employed to counter some students’ reticence to speak or
other students’ difficulties in “knowing what to say”; thus L1 input is used as a catalyst for
learners to speak in the L2 via inter-linguistic transfer. For instance, Lee (2014) reports the
use of sight translation and consecutive interpreting of sentence-long L1 segments that
learners recorded on their smartphones. Learners then compared their L2 target speech with
aspirational L2 target renditions provided by the educator. Students’ confidence levels are
reported to increase but no other outcomes in measuring spoken L2 are reported. In general
interpreter training, exercises in transferring utterance-length source speech into target speech
are restricted to the early stages of learning dialogue interpreting only.
Another development specific to some universities in East Asian countries is interpreting
streams (usually 2–6 semesters) designated as an English Language Major or Translation and
Interpreting Specialisation. In Japan, a national policy, the “Action plan to foster Japanese
who can use English,” was launched in 2003 which led to a large expansion of interpreting
and translation programmes in universities (Komatsu, 2016).
In a study of an interpreting programme at a Japanese university with data from 8 instructors
and 19 students in a 6-semester programme, Giustini (2020, pp. 8–11) identified a
sub-optimal level of speaking proficiency in English at the entry point (B2 on the CEFR) as
one factor in students’ performance. Great difficulties were encountered and reported
by students (and instructors) in interpreting consecutively and simultaneously into their L2,
English. Only dialogue interpreting was well-performed, providing learners with a sense of
improving their spoken proficiency in English (Giustini, 2020). These findings make sense
only when one considers that the overall goal of the programme was not to equip learners to
become interpreters. Trainers were unconcerned if learners could not adequately perform
consecutive and simultaneous interpreting. According to one instructor, “the aim of the
course is to provide language instruction through practice-related interpreting subjects…We
use this innovative training system so that students pursue a systematic acquisition of
communicative skills” (Giustini, 2020, p. 7). In a wider context, this statement also makes
sense when one considers that most practising interpreters in Japan are not graduates of
university interpreting programmes but graduates of private-sector vocational colleges af-
filiated with corporations or interpreting agencies (Komatsu, 2016). Those students in uni-
versity programmes who do hope to become interpreters are reported to “experience
disappointment and disheartenment over their training, with possibly more negative than
positive effects on their ultimate acquisition of communicative competence in English”
(Giustini, 2020, p. 12).

In the past 20 years, SLA spoken production studies have increased, identifying not only
speakers’ perceptions of intelligibility, but also those of listeners, both L2 and NS (Pickering,
2006). User perceptions of interpreting services have been a part of Interpreting Studies
pedagogy for some time, but research in this area is more recent. Some descriptions come not
from “naïve listeners” but from fellow interpreters, and the perspective of such “insiders” is
likely different from that of non-interpreters. An overview follows of features of interpreters’
spoken performance with data on L2 target speech where available.
Approaching the construct of interpreting quality from the perspective of users and how
an interpreter’s performance fulfils expectations of (conference) interpreting services, Bühler
(1986) proposed 16 features, seven of which relate to spoken production skills: accent,
pleasant voice, fluency of delivery, logical cohesion of utterance, correct grammatical usage,
use of correct terminology, and use of appropriate style. In regard to speaking skills, Bühler
(1986) reports that fellow interpreters rate logical cohesion as most important (75%) followed
by correct terminology, fluency of delivery and correct grammar with ratings of 50% or
more. In another survey of interpreters, “sense consistency” (i.e., full transfer of referential
content in a coherent way) was rated most important (Chiaro & Nocella, 2004). The same
survey showed that interpreter colleagues perceive other colleagues’ “foreign accent” in their
L2 to be the least important impediment to accomplished interpreting, but this view may not
be shared by non-interpreters.
Kurz and Pöchhacker’s (1995) survey of 19 lay clients listed a native accent, together with
a pleasant voice, and fluency of delivery as important qualities. Cheung’s (2013) survey of lay
clients’ perceptions also records preferences for interpretations to be in the interpreter’s L1.
When presented with a choice, users’ preferences are likely to yield higher ratings for L1 over
L2. This is but one feature of performance that is largely disregarded if other presentational
features are performed well. For example, Hale et al. (2011) found that a non-native accent
had no effect on how source speakers in courtroom settings were perceived.
Fluency, referring to a speaker’s ability to draw on a wide variety of alternative turns and
to deliver these with an appropriate speech rate and prosody with few “disfluencies,” is a
difficult quality to elicit specifically in users’ perceptions. Users sometimes identify pauses
and hesitations rather than fluency in an overall sense (Pradas Macías, 2006). Tissi (2000)
showed that source speakers’ disfluencies are not replicated in interpreters’ target speech; this
suggests that fluency is a characteristic of individual speech behaviour. Perceived lack of
fluency is reported to strain users’ comprehension of target speech (Ahrens, 2004).
Intonation and prosody are under-studied areas in Interpreting Studies research.
Intonation, pitch movement across an utterance, has been examined in terms of interpreters’
replication of source-speakers’ intonation and whether there are intonation patterns char-
acteristic of interpreting in general. One identified by Shlesinger (1994, p. 229) relating to
simultaneous interpreting is low-rise final pitch movement. Ahrens (2004) reports similar
findings regarding final pitch features and suggests that “interpreters do not know how the
source text will continue and therefore avoid intonational closure, in favour of a final pitch
movement that indicates continuation” (Ahrens, 2015, p. 213). By implication, interpreters
view a slightly rising pitch in utterance-final position as less infelicitous than falling in-
tonation at a juncture point that does not, in hindsight, mark the end of an utterance.
Prosody, the acoustic parameters of pitch, loudness, tempo and rhythm, can be a con-
spicuous feature of interpreters’ speech, especially when interpreting simultaneously.
Anomalies include hesitation pauses, changeable tempo, monotonous or “levelled out” in-
tonation, mismatches with the illocutionary force of the speech act, and prosodic features
inappropriate to the genre of the target speech (Lenglet & Michaux, 2020). Interpreters find it
easier to replicate prosodic features in consecutive interpreting but the processing requirements
of memory retrieval (or note-taking) can still result in target speech sounding “levelled out.”
Targeted training of L2 prosody patterns has an effect on interpreting students’ performance,
not only in recognizing and understanding the function of prosodic features when listening to
source speech, but also in target speech production when working into the L2. Yenkimaleki
and van Heuven (2018) show higher ratings for prosody-related features such as accentedness,
pace, and voice amongst students who received such training in their L2 compared to a control
group of students who did not. This finding is congruent with SLA research: the compre-
hensibility of L2 speakers is enhanced with prosodic instruction (Derwing & Rossiter, 2003).
Coherence refers to the underlying functional connectedness or identity of spoken text. It
is a feature noted usually by consumers of texts, who perceive a text’s coherence in terms of
their knowledge of the world, and the inferences and assumptions they make (Gernsbacher &
Givón, 1995). Textual coherence of source speech is crucial to the interpreter who listens and
makes sense of it, and in turn the coherence of their target speech is important to the au-
dience. In users’ judgements of the quality of interpreting, “logical coherence of utterances”
is consistently one of the top-ranking criteria (Grbić, 2008, p. 235). Using Rhetorical
Structure Theory (RST) as a framework, Peng (2009, p. 236) reports that a greater degree of
coherence in the target speech of trainee interpreters is observable when they work into their
L1 compared to their L2, but this contrast was less marked in the performance of profes-
sional interpreters.
Cohesion concerns linguistic features of speech signalling underlying concepts and rela-
tions which aid listeners in “making sense” of what they hear. The observation that trans-
lated texts tend to be more explicitly marked with cohesive devices than their source texts
(Pym, 2005) is also true for interpreted speeches compared to their source speeches although
the frequency of cohesive device markers decreases when the interpreter is working into their
L2 (Peng, 2009).
Pauses can be naturally occurring junctures in interpreted speech as they are in mono-
lingual speech. Depending on rhetorical or intonation features in their environment, they aid
and augment comprehension. But pauses, filled or unfilled, can be seen as disfluencies oc-
curring due to online production of (interpreted) talk. In qualitative studies of simultaneous
interpreters’ performance, Cecot (2001) found that rapidly delivered source speeches result in
equivalently rapidly delivered target speeches with a commensurately small number of pauses
in both, while Ahrens (2004) found that interpreters paused less often than the source
speakers, but their pauses were longer. These findings relate to interpreting into the L1. In
regard to differences in interpreting into one’s L2 compared to L1, Mead (2015) reports that
pauses are more frequent and longer.
Repairs are instantiations of speaker monitoring that function as corrections of speech
that is “erroneous.” Repairs also relate to disruptions in speech production that are probably
naturally occurring. Interpreting requires two levels of monitoring – that of checking the
fidelity of target speech produced in relation to its source speech and that of speech pro-
duction in general. An example of online monitoring in Polish-English simultaneous inter-
preting of interpreters working into their L2 is provided by Kopcyński (1980, p. 85), “our
common…eh…aims will come true…will be achieved.” Such an utterance is more an ex-
ample of “fine-tuning” than actual correction as suggested by Mead (2015, p. 349). It is also
hard to distinguish causality given that repairs in L2 speech are often indicators that en-
coding processes in the L2 have not become fully automatized.
Voice quality relates to the description of phonation types. Supralaryngeal features de-
termine whether a person’s voice sounds breathy, creaky or somehow conspicuous such that
it elicits an aesthetic-evaluative response in listeners. Voice training for interpreting students
remains an under-studied area (Flerov & Jacobs, 2016). A pleasant interpreter voice can have
a more persuasive effect on users than the actual referential content of the target speech and
conversely, an unpleasant voice can undermine good content (Shlesinger, 1994). Iglesias
Fernández (2013) reports that high pitch and nasality can be associated with an interpreter’s
perceived lack of maturity and competence, whereas lower pitch, wider pitch range and
higher resonance are associated with positive attributes such as credibility and reliability.
Specific comments on vocal quality commonly mention prosody and fluency, meaning that
distinctly phonational features such as timbre may be harder to tease out (Iglesias
Fernández, 2013). Although these studies relate mostly to interpreters working into their L1,
results from studies of phonational features of L2 show that L2 speakers have a narrower
pitch range than in their L1 (Zimmerer et al., 2014).
When interpreting, individuals are contemporaneously engaging in cognitive activities
that enable spoken production and those that perform other functions related to the
re-presentation of another’s speech. Early models (e.g., Kalina, 1998) seeking to map the
interpreting process contained step-by-step descriptions tracking spoken inter-lingual transfer
in a linear sequence of source speaker > encoder > interpreter > decoder >
receptor. Cokely’s (1992) sociolinguistically sensitive process model was the first to show
influence from sociolinguistics, namely, Hymes’s (1974) SPEAKING framework.
Beyond this, Setton and Dawrant’s (2016) processing model for simultaneous interpretation
is the only one drawing on a psycholinguistic speaking model, namely Levelt’s
(1999) model for L1 speech production. Their model is very similar to Levelt’s (see De Bot,
this volume). No models adopt elements of Kormos’s (2006) Integrated Model of Speech
Production in relation to interpreting into an L2 (or L1). However, elements of it can be
identified that are recognizable in L2 interpreting, for instance, production in the L2 is
characterized by a lack of automaticity in the encoding processes carried out by the for-
mulator. Kormos (2006, p. 166) writes, “as long as an encoding process requires conscious
attentional control, encoding can only work serially.” While L1 encoding more or less
“cascades,” in L2 encoding, four main stages are identified: syntactic properties of con-
ceptual chunks, phrase and clause-structure building, phonological encoding and monitoring
(Kormos, 2006). Elements of her work are relevant to outlining an Interpreting Studies
model to account for optimal and non-optimal spoken output.
The Effort Model, as the name suggests, is based on a quantification of cognitive capacity
and the premise that it is not infinite (Gile, 2009). Originally used to examine multi-task
performance in simultaneous interpreting, the Effort Model is concerned with the process of
inter-lingual transfer of referential content only. Gile (2009, p. 168) identifies four “efforts”
(or processes): (1) “listening and analysis effort”; (2) “memory effort”; (3) “production ef-
fort” which is spoken delivery into the target language; and (4) “coordination effort” re-
ferring to expenditure of cognitive resources to monitor all three efforts contemporaneously.
Curiously, conversion of meaning from the source language into the target language is not
identified as an effort.
The model assumes explanatory power when examining why, for instance, an interpreter’s
spoken production (in the L2 or L1) is non-optimal. The model predicts that either the
interpreter’s spoken production capacity itself is overwhelmed (i.e., the interpreter does not
have the proficiency to readily produce certain forms), or that another effort capacity (or
capacities) is overwhelmed (e.g., speedily delivered source speech beyond listening and
comprehension capacity) and, consequently, the extra effort expended elsewhere results in
less effort capacity for spoken production.
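The capacity argument in the preceding paragraph can be written compactly. What follows is only an illustrative sketch under assumed notation: the symbols L, M, P, and C (the capacity each effort requires at a given moment), T (total available capacity), and the A-terms (capacity available to each effort) are labels introduced here for convenience and are not taken from this chapter.

```latex
% Illustrative notation only; all symbols are assumptions made for this sketch.
% L, M, P, C : capacity required by the listening-and-analysis, memory,
%              production, and coordination efforts at a given moment
% R          : total capacity required; T : total capacity available
% A_L, ...   : capacity available to each individual effort
\begin{gather*}
  R = L + M + P + C \\
  R \le T \\
  L \le A_L, \qquad M \le A_M, \qquad P \le A_P, \qquad C \le A_C
\end{gather*}
```

On this reading, non-optimal spoken production arises in one of two ways: either P exceeds A_P directly (the interpreter cannot readily produce the required forms), or a rise in L or M pushes R towards T and so leaves less capacity available for P.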
The implication of Kormos’s (2006) Integrated Model of Speech Production for Gile’s
(2009, p. 167) Effort Model is that if production in the L2 is serial and “requires conscious
attentional control,” then this expenditure of effort when interpreting into the L2 means
there is commensurately less effort available for listening, memory, and coordination. At the
same time, if listening is performed in the L1, then the effort required to listen and com-
prehend is commensurately less. This is a widely reported occurrence amongst interpreters
who reflect on working from their L2 into their L1 compared to the opposite (Gorton, 2012).

A critical issue of interest is the relationship of SLA to interpreting training in a sequential
sense. Here, we consider L2 spoken proficiency as a pre-requisite to training and those as-
pects of entrance tests that relate to demonstrations of L2 speaking skills. Some earlier
training programmes set a high level of L2 proficiency as a condition for entry as Keiser
(1977) indicated: “interpretation courses are not language courses, in other words…the would-
be student must have mastered his (sic) language before entering into the course…he must
have the required mastery of his active and passive languages before starting the inter-
pretation course otherwise he will constantly stall and stumble under the tremendous pres-
sure of interpretation per se” (p. 13. Original emphasis). Few interpreter courses today set
such a high requirement for L2 spoken proficiency. Interpreter training has expanded
greatly; varying L2 levels and vocational expectations are now a feature of the student profile
of many training courses. I provide an overview of a cross-section of training courses, beginning
with short courses of approximately 50 contact hours and concluding with Master degree
courses. Levels of spoken proficiency may be ascertained via language tests for L2 learners,
such as IELTS for English, or levels aligned to the Common European Framework of
Reference for Languages (CEFR). Training institutions may conduct their own entrance
tests targeting L2 speaking skills, either with or without other formal tests.
Some courses in public-service interpreting have very general or even unspecified formal
linguistic prerequisites often because they target speakers of several L1s who undertake
language neutral training, or training with a common language of instruction, usually English
(the L2 for most learners). Where such courses have an entrance test, applicants’ L2 skills
may be assessed through spoken interviews and sight translation of a non-complex 100-word
text in a language other than English into spoken English (Hlavac et al., 2012).
Other vocationally focused courses with interpreting subjects specific to students’
language-pairs, such as the one-year full-time Diploma of Interpreting offered at
RMIT University (Melbourne), may have no formal academic prerequisites but do have L2 language
requirements. For English, these are an IELTS (Academic) overall score of 6.0 (speaking not
lower than 5.5), and where the L2 is a language other than English, the requirement is
passing a secondary school final year exam in that language, or completion of an entrance
test (RMIT, 2020).
For 3-year Bachelor degrees providing a grounding in interpreting, the entry-level re-
quirement for L2 spoken proficiency is typically C1 (e.g., University of Vienna, 2017). For
postgraduate Master courses in interpreting, a proficiency level of C2 in the L2 is commonly
required (e.g., University of Vienna, 2018). An equivalent MA in Interpreting at a university
in a predominantly Anglophone country requires an IELTS score of 7.0 (overall) with scores
of no less than 6.5 for speaking for English L2 applicants (Newcastle University, 2020).
The toughest admission requirements of L2 speaking skills are for a specialist 2-year
Master in Simultaneous Interpreting, where applicants experience a 10-week entrance course
testing students’ speaking skills through activities such as paraphrasing, summarising and
performing impromptu role-plays in English (the L2 of most trainees) (Moser-Mercer, 1985).
Spoken proficiency in English, followed by pronunciation/enunciation, are the first two criteria
for admission. Of the four remaining criteria, one of them is “assertiveness,” that is,
trainees’ confidence in their presentation skills and overall communicative competence
(Moser-Mercer, 1985, pp. 98–99).
In general, many Master courses have entrance-test requirements. In a survey of 18
postgraduate programmes, mainly in Europe, Timarová and Ungoed-Thomas (2008) list
spoken interviews and 2- to 5-minute oral presentations (with preparation time) in the L2 as
well as short consecutive interpreting and sight translation into the L2 as entrance-test tasks.
Interestingly, the authors do not identify shadowing as a task given to course applicants.
Shadowing is an exercise that has been used in the training of simultaneous interpreters since the 1960s. It
is a monolingual exercise performed in the L2 (or L1) and consists of the spoken repetition of
another’s speech with little lag time (phonemic shadowing) or at longer latencies (phrase
Jim Hlavac
shadowing) (Riccardi, 2015). As a dual-task exercise requiring the trainee to follow and
replicate the form of the source speaker’s speech, shadowing draws trainees’ attention to the
form of others’ speech.
In the past decade, diagnostic tools have been trialled to gauge potential trainees’ spoken
ability in L2 (and L1). The syncloze test was developed based on a recording of a 660-word
information text on a well-known topic (e.g., mobility and health) played to entrance-test
candidates. Gaps at regular intervals require candidates to utter forms in the L2 to make the
sentence thematically and/or grammatically complete. Their spoken performance is recorded
and their insertions in the gaps are assessed for lexical, collocational, and grammatical
accuracy (Pöchhacker, 2011).
Coming back to the issue of where interpreter training stands sequentially vis-à-vis SLA,
the variation in L2 proficiency for interpreting courses follows a general pattern: the shorter
and less demanding the course, the lower the level of required L2 skills. Still, in the content of
the courses surveyed above, there are no components focusing on language acquisition. The
rapid expansion of interpreting courses at pre-university level, and at undergraduate and
postgraduate levels suggests that many more students undertake training, but it is likely that
their L2 speaking skills vary substantially. This is a feature identified by Angelelli and
Degueldre (2002) who bemoan the paucity of bridging or superior-level courses addressing
the need to further develop the L2 speaking skills of prospective interpreting trainees.
Interpreting trainees typically report their training leads to development of L2 skills, for
instance, learning specialist vocabulary, knowledge of forms used in specific discourse genres,
command of multiple registers, and so on, but these are byproducts of the training. In the
post-training period and in the language services industry, interpreters typically report their
L2 skills continue to advance together with work-related skills such as discourse
management and the development of business acumen.
6 Future Directions
The geographical spread of English, globalization, and the increasing numbers of speakers of
English as an L2 have led to the term “World Englishes” as a hypernym for all varieties of
English, whether L1 or L2. In many work interactions, English is no one’s L1 and functions
as a lingua franca (see Llurda, this volume). The growing use of English as a lingua franca
(ELF) has had three consequences for interpreting: less work for interpreters in bi- or
international settings where interlocutors increasingly communicate via English; interpreters
receiving source speech input in English now more often work with input coming from L2
users of English; and interpreters providing target speech output into English are doing so for a
target audience that includes more L2 users of English. The last scenario is the focus here, where
interpreters working into English as their L2 know that many recipients of their interpretations
are English L2 users. In a descriptive model of conference interpreter competence containing
processes congruent with those identified by Gile (2009) in his Effort Model and others,
including pre- and post-interaction skills, Albl-Mikasa (2013a, p. 19) identifies two
production skills relating to the above situation: “balancing between high fidelity and audience
design” and “ELF accommodation” (Figure 30.1).
The term “high fidelity,” referring to production, is described by Albl-Mikasa (2013a,
p. 28) as an “ultra-completed rendition.” This metaphor refers to an English L2 rendition
fully reflective of the high register and lexically and/or phraseologically complex structure of
the source speech. Quoting conference interpreters working into English as their L2, Albl-Mikasa
(2013b, p. 10) lists anecdotes such as, “What is the use of throwing in expressions like,
‘that’s a sticky wicket’ when no one understands them?” Experienced interpreters
Para-process skills
- business know-how, customer relations, professional standards
- lifelong learning predilection
- meta-reflection
Peri-process skills
- teamwork, cooperation
- unimposing extrovertedness
- instinct and realism
- pressure resistance
Pre-process skills
- high-level command of languages
- low-level terminology management
- informed semi-knowledge
- streamlined preparation
In-process skills
Comprehension skills
- below-expert scanning, identifying, matching
- contextualization
- ELF compensation
Transfer skills
- simultaneity
- capacity relief measures
Production skills
- synchronicity and décalage modulation
- reduction
- balancing between high fidelity and audience design
- ELF accommodation
- performance, presentation, prosody
Post-process skills
- terminology wrap-up
- quality control
Figure 30.1 Process- and experience-based model of interpreter competence. Reprinted from Albl-Mikasa
(2013b, p. 10) with permission
consciously avoid target speech constructions matching the source speech but which are
unlikely to be understood by the recipients of their target speech.
Further, Albl-Mikasa (2013b) invokes Kalina’s (1998) justification for “changing
registers” in target speech for non-native audiences, and refers to sociolinguistic models describing shift
in style according to audience design (Bell, 1984) and communicative accommodation theory
(Giles & Coupland, 1991). In the case of audience design, interpreters modify their language
style to L2 audiences particularly in dialogic, public-service interpreting settings (i.e., to
“addressees”), and in monologic, conference interpreting settings (i.e., to “auditors”). In the
case of communicative accommodation theory, the means that Giles and Coupland (1991, p. 88)
describe to “modify the complexity of speech (e.g., by decreasing diversity of
vocabulary) or simplifying syntax and increase clarity (by changing pitch, loudness [and]
tempo)…” are strategies that interpreters working into their L2 for a non-native audience
reportedly engage in (Albl-Mikasa, 2013b). Data on interpreters discussing their behaviour
show this, with 72% (n = 23) of a sample of mainly conference interpreters reporting that
they “adjust [their] English (consciously or unconsciously) to [their] listener/addressee”
particularly in instances “when they had evidence that no native speakers were in the audience”
(Albl-Mikasa, 2010, p. 132).
A feature of the profile of interpreters in general is that relatively few have English as
their L1. Because the linguistic profile of users of interpreting services increasingly
encompasses English L2, Albl-Mikasa calls for interpreting into ELF to be a component
of training in terms of trainees’ spoken production, and a research area with empirical data on
interpreters’ style and register used in L2 interpreting and on L2 users’ reception of their
interpretations. Such research could benefit from cross-fertilization from SLA where some
traditional target models are being re-evaluated, such that features of production like pronunciation
are re-conceived according to international intelligibility (Pickering, 2006).
Further Reading
Albl-Mikasa, M. (2013b). Teaching Globish? The need for an ELF pedagogy in interpreter training.
International Journal of Interpreter Education, 5(1), 3–16.
Conference interpreters report on working into English as their L2 and the reception of their
interpretations amongst English L2 users.
Pöchhacker, F. (2016). Introducing interpreting studies (2nd edn). Abingdon, Oxon: Routledge.
Chapter 4 contains an overview of models, while chapters 6 and 7 deal with cognitive processes and
speech production respectively.
Setton, R., & Dawrant, A. (2016). Conference interpreting. A trainer’s guide. Amsterdam: John Benjamins.
Chapter 7 outlines language enhancement in the interpreting curriculum and discusses challenges and
strategies when interpreting into an L2 with a focus on quality of production and different speech and
event types.
References
Ahrens, B. (2004). Prosodie beim Simultandolmetschen. Frankfurt: Peter Lang.
Ahrens, B. (2015). Intonation. In F. Pöchhacker (Ed.), Routledge encyclopedia of interpreting studies
(pp. 212–214). Abingdon, Oxon: Routledge.
AIIC. International Association of Conference Interpreters. (2012). Working languages.
https://aiic.net/node/6/working-languages/lang/1 (accessed 1 April 2020).
Albl-Mikasa, M. (2010). Global English and English as a Lingua Franca (ELF): Implications for the
Interpreting Profession. Trans-Kom, 3(2), 126–148.
Albl-Mikasa, M. (2013a). Developing and cultivating expert interpreter competence. The Interpreters’
Newsletter, 18, 17–34.
Albl-Mikasa, M. (2013b). Teaching Globish? The need for an ELF pedagogy in interpreter training.
International Journal of Interpreter Education, 5(1), 3–16.
Angelelli, C., & Degueldre, C. (2002). Bridging the gap between language for general purposes and
language for work: An intensive Superior-level language/skill course for teachers, translators, and
interpreters. In B. Leaver & B. Shekhtman (Eds.), Developing professional-level language proficiency
(pp. 77–95). Cambridge, UK: Cambridge University Press.
ASTM International. (2007). Standard guide for Language interpretation services. (F2089). West
Conshohocken, PA: ASTM International.
Baigorri-Jalón, J. (2004). Interpreters at the United Nations: A history [Trans. by A. Barr] Salamanca:
Ediciones Universidad de Salamanca.
Bell, A. (1984). Language style as audience design. Language in Society, 13(2), 145–204.
Bühler, H. (1986). Linguistic (semantic) and extra-linguistic (pragmatic) criteria for the evaluation of
conference interpretation and interpreters. Multilingua, 5(4), 231–235.
Cecot, M. (2001). Pauses in simultaneous interpreting: A contrastive analysis of professional
interpreters’ performances. The Interpreters’ Newsletter, 11, 63–85.
Chang, C.-C., & Schallert, D. (2007). The impact of directionality on Chinese/English simultaneous
interpreting. Interpreting, 9(2), 137–176.
Cheung, A. (2013). Non-native accents and simultaneous quality perceptions. Interpreting, 15(1), 25–47.
Chiaro, D., & Nocella, G. (2004). Interpreters’ perception of linguistic and non-linguistic factors
affecting quality: A survey through the World Wide Web. Meta, 49(2), 278–293.
Cho, S. (2007). Curriculum development in the undergraduate interpretation and translation program.
The Journal of Translation Studies, 8(2), 163–191.
Cokely, D. (1992). Interpretation: A sociolinguistic model. Burtonsville, MD: Linstok Press.
Dejean, L. (2000). Perfecting active and passive languages. Conference Interpretation and Translation,
2, 7–23.
Denissenko, J. (1989). Communicative and interpretative linguistics. In L. Gran & J. Dodds (Eds.), The
theoretical and practical aspects of teaching conference interpretation (pp. 155–158). Udine:
Campanotto.
Derwing, T. M., & Rossiter, M. J. (2003). The effects of pronunciation instruction on the accuracy,
fluency and complexity of L2 accented speech. Applied Language Learning, 13, 1–18.
Derwing, T. M., Fraser, H., Kang, O., & Thomson, R. I. (2014). L2 accent and ethics: Issues that merit
attention. In A. Mahboob & L. Barratt (Eds.), English in a multilingual context (pp. 63–80). New
York: Springer.
Donovan, C. (2004). European Masters Project Group: Teaching simultaneous interpretation into a B
language. Interpreting, 6(2), 205–216.
Dörnyei, Z. (1994). Motivation and motivating in the foreign language classroom. The Modern
EMCI [European Masters in Conference Interpreting] (2018). Examinations: Admission and diploma
tests. https://www.emcinterpreting.org/examinations (accessed 2 April 2020).
Flerov, C., & Jacobs, M. (2016). Improving the interpreter’s voice. Morrisville, NC: Lulu Press.
Gernsbacher, M., & Givón, T. (1995). Coherence in spontaneous text. Amsterdam: John Benjamins.
Gile, D. (2009). Basic concepts and models for interpreter and translator training (2nd edn). Amsterdam:
John Benjamins.
Giles, H., & Coupland, N. (1991). Language: Contexts and consequences. Milton Keynes, UK: Open
University Press.
Giustini, D. (2020). Interpreter training in Japanese higher education: An innovative method for the
promotion of linguistic instrumentalism. Linguistics and Education, 56, 100792.
Gorton, A. (2012). ‘B’ language interpreting: The interpreter’s perspective. Forum, 10(2), 61–88.
Grbić, N. (2008). Constructing interpreting quality. Interpreting, 10(2), 232–257.
Gu, Y., & Johnson, R. (1996). Vocabulary learning strategies and language learning outcomes.
Hale, S., Bond, N., & Sutton, J. (2011). Interpreting accent in the courtroom. Target, 23(1), 48–61.
Hlavac, J., Orlando, M., & Tobias, S. (2012). Intake tests for a short interpreter-training course: design,
implementation, feedback. International Journal of Interpreter Education, 4(1), 21–45.
Hymes, D. (1974). Foundations in sociolinguistics: An ethnographic approach. Philadelphia: University of
Pennsylvania Press.
Iglesias Fernández, E. (2013). Unpacking delivery criteria in interpreting quality assessment. In D.
Tsagari & R. Van Deemter (Eds.), Assessment issues in language translation and interpreting
(pp. 51–66). Frankfurt: Peter Lang.
Jenkins, S., & Parra, I. (2003). Multiple layers of meaning in an oral proficiency test. The
complementary roles of nonverbal, paralinguistic and verbal behaviors in assessment decisions. The
Kalina, S. (1998). Strategische Prozesse beim Dolmetschen. Theoretische Grundlagen, empirische
Fallstudien, didaktische Konsequenzen. Tübingen: Gunter Narr.
Kalina, S. (2005). Quality assurance for interpreting processes. Meta, 50(2), 768–784.
Keiser, W. (1977). Selection and training of conference interpreters. In D. Gerner & W. Sinaiko (Eds.),
Language interpretation and communication (pp. 11–24). New York: Plenum Press.
Komatsu, T. (2016). A brief history of interpreting and interpreter training in Japan since the 1960s.
In Y. Someya (Ed.), Consecutive notetaking and interpreter training (pp. 15–38). London:
Routledge.
Kormos, J. (2006). Speech production and second language acquisition. Mahwah, NJ: Lawrence
Erlbaum.
Kopcyński, A. (1980). Conference interpreting: Some linguistic and communicative problems. Poznań:
Adam Mickiewicz University Press.
Kurz, I. (2001). Conference interpreting: Quality in the ears of the user. Meta, 46(2), 394–409.
Kurz, I., & Färber, B. (2003). Anticipation in German-English simultaneous interpreting. Forum, 1(2),
123–150.
Kurz, I., & Pöchhacker, F. (1995). Quality in TV interpreting. Translation. FIT Newsletter, 14(3/4),
350–358.
Laviosa, S. (2014). Translation and language education. Pedagogic approaches explored. Abingdon,
Oxon: Routledge.
Lee, T. (2014). Using computer-assisted interpreter training methods in Korean undergraduate English
classrooms. The Interpreter and Translator Trainer, 8(1), 102–122.
Lenglet, C., & Michaux, C. (2020). The impact of simultaneous-interpreting prosody on
comprehension. Interpreting, 22(1), 1–34.
Levelt, W. (1999). Speaking: From intention to articulation. Cambridge, MA: MIT Press.
Lim, H.-O. (2005). Working into the B language: The condoned taboo? Meta, 50(4), CD-ROM.
Mead, P. (2015). Pauses. In F. Pöchhacker (Ed.), Routledge encyclopedia of interpreting studies,
Mead, P. (2015). Repairs. In F. Pöchhacker (Ed.), Routledge encyclopedia of interpreting studies,
(pp. 348–350). Abingdon, Oxon: Routledge.
Moser-Mercer, B. (1985). Screening potential interpreters. Meta, 30(1), 97–100.
NAATI. (2019). Certified Interpreter Test Assessment Rubrics. Available at:
https://www.naati.com.au/media/2245/ci_spoken_assessment_rubrics.pdf.
Nabei, T., & Swain, M. (2002). Learner awareness of recasts in classroom interaction: A case study of
an adult ESL student’s second language learning. Language Awareness, 11(1), 43–63.
Newcastle University. (2020). Interpreting MA. entry requirements.
https://www.ncl.ac.uk/postgraduate/courses/degrees/interpreting-ma/#entryrequirements (accessed 1 April 2020).
Nicodemus, B., & Emmorey, K. (2015). Directionality in ASL-English interpreting: Accuracy and
articulation quality in L1 and L2. Interpreting, 17(2), 145–166.
Park, H. (1999). A study on developing an interpretation track for undergraduate students. Conference
Interpretation and Translation, 1, 47–74.
Peng, G. (2009). Using rhetorical structure theory (RST) to describe the development of coherence in
interpreting trainees. Interpreting, 11(2), 216–243.
Pica, T. (1994). Research on negotiation: What does it reveal about second-language conditions,
processes and outcomes. Language Learning, 44(3), 493–527.
Pickering, L. (2006). Current research on intelligibility in English as a lingua franca. Annual Review of
Pöchhacker, F. (2011). Assessing aptitude for interpreting: The Syncloze test. Interpreting, 13(1),
106–120.
Pöchhacker, F. (2016). Introducing interpreting studies (2nd edn). Abingdon, Oxon: Routledge.
Pradas Macías, M. (2006). Probing quality criteria in simultaneous interpreting: The role of silent
pauses in fluency. Interpreting, 8(1), 25–43.
Pym, A. (2005). Explaining explicitation. In K. Károly & Á. Fóris, (Eds.), New trends in translation
studies: In honour of Kinga Klaudy (pp. 29–34). Budapest: Akadémiai Kiadó.
Riccardi, A. (2015). Shadowing. In F. Pöchhacker (Ed.), Routledge encyclopedia of interpreting studies
RMIT. (2020). Diploma of interpreting (LOTE-English).
https://www.rmit.edu.au/study-with-us/levels-of-study/vocational-study/diplomas/diploma-of-interpreting-loteenglish-c5364
Seleskovitch, D. (1978). Interpreting for international conferences [Trans. by S. Dailey & E. McMillan]
Leesburg, VA: Pen and Booth.
Setton, R., & Dawrant, A. (2016). Conference interpreting. A trainer’s guide. Amsterdam: John
Benjamins.
Shlesinger, M. (1994). Intonation in the production and perception of simultaneous interpretation. In S.
Lambert & B. Moser-Mercer (Eds.), Bridging the gap: Empirical research in simultaneous
interpretation (pp. 225–236). Amsterdam: John Benjamins.
Timarová, Š., & Ungoed-Thomas, H. (2008). Admission testing for interpreting courses. The Interpreter
and Translator Trainer, 2(1), 29–46.
Tissi, B. (2000). Silent pauses and disfluencies in simultaneous interpretation: A descriptive analysis.
The Interpreters’ Newsletter, 10, 103–127.
University of Vienna. (2017). Bachelorstudium Transkulturelle Kommunikation.
https://transvienna.univie.ac.at/fileadmin/user_upload/z_translationswiss/Studium/Curricula/Curriculum_Bachelorstudium_Transkulturelle_Kommunikation_2016_Stand2017.pdf (accessed 3 April 2020).
University of Vienna. (2018). Zulassung zum Masterstudium Translation.
https://transvienna.univie.ac.at/studium/masterstudium-translation/voraussetzungen/ (accessed 3 April 2020).
Yenkimaleki, M., & van Heuven, V. (2018). The effect of teaching prosody awareness on interpreting
performance: An experimental study of consecutive interpreting from English into Farsi.
Perspectives, 26(1), 84–99.
Zannirato, A. (2008). Teaching interpreting and interpreting teaching: A conference interpreter’s
overview of second language acquisition. In J. Kearns (Ed.), Translator and interpreter training:
Issues, methods and debates (pp. 19–38). London: Continuum.
Zimmerer, F., Jügler, J., Andreeva, B., Möbius, B., & Trouvain, J. (2014). Too cautious to vary more?
A comparison of pitch variation in native and non-native productions of French and German. In
Proceedings of the 7th speech prosody conference (pp. 1037–1041). Ireland: Trinity College Dublin.
https://ids-pub.bsz-bw.de/files/5917/Zimmerer_Juegler_Andreeva_Too_cautious_to_vary_more_A_comparison_of_pitch_variation_2014.pdf
31
FIRST LANGUAGE ATTRITION
Monika Schmid
This chapter not only concludes the “Emerging issues” part of this volume on second
language acquisition (SLA) and speaking, but is also the final chapter overall. In many ways
this is appropriate: First language (L1) attrition has struggled for a long time to emerge from
its status as a niche subject and is often regarded as an afterthought. Many researchers
assume that substantial traffic and interference from the second language (L2) to the L1 will
affect only a small number of bilinguals under very specific conditions such as long-term,
immersed bilingualism, dominant use of and near-native proficiency in the L2 and extremely
limited use of the L1. With such a view, effects of the second language on the first are not
necessarily relevant to all research on L2 development, and, in particular, to research on
instructed L2 learning. However, the cumulative evidence from research over the past
decades indicates that measurable changes in the L1 become established very early in both
immersed and instructed late bilinguals. The bidirectional nature of cross-linguistic
interaction should therefore be a consideration in most, if not all, contexts of bilingualism.
The view of attrition as a phenomenon on the periphery of bilingual development is
evident from the fact that in the 1990s and early 2000s, most reference works on L2
acquisition and bilingual development did not include chapters dedicated to this phenomenon
(e.g., Bhatia & Ritchie, 2004; Doughty & Long, 2003; Kroll & de Groot, 2005). This changed
only gradually (see part on encyclopedia chapters in Schmid, 2020), starting with Cook’s
(2003) theory of linguistic multicompetence, which is based on the notion of an “integration
continuum” in which all languages of the multilingual language user exist somewhere
between the extreme points of entire separation and entire integration. Where exactly the
languages are situated on this continuum may vary – across individuals, across linguistic
levels, and across time – but some form of connection, however tenuous, will always exist,
and the L2 will therefore also always affect the L1 (see Lowie & Verspoor, this volume).
The differences observed in the L1 between bilingual and monolingual speakers are often
much more subtle than differences between (monolingual) natives and L2 users. They relate
mainly to issues of processing and activation rather than representation, and therefore rely
on fine-grained, online measures for their detection. This has led to an increase in interest not
only in language attrition from the perspective of psycho- and neurolinguistics, but also in
the area that is most reliant on and most characteristic of online and naturalistic language
DOI: 10.4324/9781003022497-37
processing: speaking. In this chapter, I follow the terminology established in Schmid and
Köpke (2017) in that I assume the term “attriter” to broadly refer to any individual who
became bilingual or multilingual after the onset of puberty. The term most commonly, but
not uniquely, refers to a late bilingual who has been living in an L2 environment for some
time and, where not otherwise indicated, that is how I will be using it here.
In the early phases of language attrition research in the 1980s and early 1990s, attrition
phenomena were almost exclusively understood and investigated in terms of representational
changes, that is, changes to underlying “competence” (e.g., Seliger & Vago, 1991; Sharwood
Smith & van Buren, 1991). In this context, any kind of “interference” or speech error
produced by attriters was taken as an indication that attrition had taken place, that is, that the
underlying lexical or grammatical system had been changed or simplified so that obligatory
rules were no longer fully applied correctly or words were used inappropriately (e.g.,
Olshtain & Barzilay, 1991). The same interpretation was given to erroneous responses in
experimental tasks, such as grammaticality judgements (e.g., Altenberg, 1991). The
assumption that non-attrited native speech is, by default, homogenous and error-free was so
strong that, more often than not, these studies did not establish a control group to assess
whether error rates among attriters were higher than among monolinguals (see Köpke &
Schmid, 2004). This is spelled out, for example, by Vago (1991):
On the assumption that the subject’s first language dialect agreed in essential detail
with her parents’ dialect from the period of initial acquisition in Hungary through
the onset of attrition in Israel, the standard dialect of Hungarian may reasonably be
identified as the base-line grammar with which the subject’s grammar may be
compared. On this basis, any structural deviation from the standard may be identified
as an attrition phenomenon. (p. 241f., emphasis added)
This prevailing view that attrition is anything that deviates from an idealized monolingual
norm, which, in turn, is tacitly assumed to be 100% accurate, was challenged by Köpke
and Schmid (2004) and Schmid (2004), who highlighted a number of conceptual and
methodological difficulties, among them establishing what an “error” is in the first place
(as “right” and “wrong” in free speech are not necessarily discrete categories), what the
unattrited baseline looks like in terms of error-rates and other features, and what the
distributional properties are of the feature under investigation in both attrited and
unattrited data. In particular, Schmid (2004) drew attention to the topic of avoidance
strategies and the fact that speakers who have lost confidence in their grammatical
intuitions may construct their utterances in a manner allowing them not to use those
structures about which they are uncertain, potentially leading to superficially error-free
but simplified discourse.
In the late 1990s and early 2000s, technological and methodological advances as well as an
increase in research networks led to more empirical rigour and larger sample sizes in attrition
research (e.g., Köpke & Schmid, 2004). The small-group case studies with relatively
simplistic experimental tasks that had largely prevailed until then were replaced by larger
investigations typically comprising 20–50 participants, an unattrited control group, and a
variety of measures, usually a combination of elicited/free speech and controlled
experimental tasks, as suggested in the Language Attrition Test Battery (Schmid, 2011; see also
https://languageattrition.org).
The findings that arose from these studies paint a rather more complex picture of L1
attrition than was originally assumed, suggesting that what is at stake is not so much the
erosion of underlying knowledge leading to inaccuracies and a “deviant” grammar, but the
target-like application of fundamentally intact knowledge under the cognitive and time
pressures incurred in online speech production. All speakers experience these pressures (and
occasionally make mistakes or slips of the tongue), but for bilinguals they increase due to the
higher cognitive load incurred by managing two language systems (Simard, this volume). In
language attrition, a relatively low level of resting activation of the language due to lack of
exposure and use may further contribute to the cognitive load. That being the case, it makes
sense to assume that attriters will develop strategies to alleviate such pressures in real-time
speech production by avoiding more costly operations or using time-buying strategies, which
may in turn influence the complexity and fluency, alongside the accuracy, of their speech
output. Such phenomena were anticipated at the earliest stages of attrition research, for
example, by Andersen (1982), who predicted that attrited speech would be characterised not
only by a “lack of adherence to the linguistic norm” (p. 91) but also by reductions affecting
lexicon, phonology and grammar and by “linguistic insecurity” (p. 111), leading to slowed-
down linguistic interactions and the overuse of compensatory strategies and searches for
words or phrases.

The insights and developments emerging in the field of language attrition studies briefly
described earlier necessarily led to a re-framing of fundamental questions relating to the
scope of language attrition effects. The most controversially debated issue in this context is
probably whether attrition can ever affect a mature linguistic system at the representational
level, that is, cause a restructuring of the underlying grammar, or whether the phenomena we
witness remain confined to online processing and do not extend to the knowledge that
underpins it. To date, virtually all available evidence points to the latter interpretation, which
recently prompted Schmid and Köpke (2017) to suggest that the search for representational
attrition in healthy late bilinguals may be futile, as native language knowledge likely becomes
impervious to erosion around puberty, even in the absence of any further input.
This finding is in stark contrast with empirical evidence that, up until late
childhood, L1 knowledge is extremely vulnerable and may be lost largely or even entirely if
input ceases (e.g., Bylund, 2019).
The issue at stake here is closely related to the debate about the qualitative versus
quantitative nature of the difference between (monolingual) native speakers and advanced
late L2 learners. Here, too, the question of representational differences and maturational
constraints has long loomed large (e.g., Granena & Long, 2013). Language attrition has
a big role to play in this debate, as clear evidence for a fundamental change in the stability
and flexibility of the L1 system around puberty would go a long way towards explaining why,
after this stage of life, it becomes more difficult for an L2 to fully establish itself (e.g., Bylund,
2019; Schmid, 2014).
The shift in research questions from representational deficits to differences in online
processes (which originally were dismissed as “mere” performance and thus uninteresting,
e.g., Seliger & Vago, 1991) has engendered a similar shift in research methods towards more
subtle aspects of the linguistic repertoire: while investigations of qualitative differences
between monolinguals and bilinguals (irrespective of whether they are second language learners
or attriters) tend to focus mainly on accuracy, investigations of how these populations deal
with cognitive load and cross-linguistic influence tend to also take into account strategies of
simplification – that is, reductions in lexical or grammatical complexity – and hesitation – a
reduction in fluency – which these speakers may employ to mitigate the cognitive load
associated with the real-time production of a language that has become less accessible.
Complexity
The concept of complexity has been widely studied in bilingualism research, most often in the
context of L2 writing (e.g., Bulté & Housen, 2012; Ortega, 2012), but also in L1 attrition and
spoken language. For example, a range of studies have attempted to measure the lexical
complexity of attrited naturalistic speech production (see Jarvis, 2019 for an overview), in
line with Andersen’s prediction that attriters would have a smaller, less accessible productive
lexicon consisting of common, highly frequent and unmarked lexical items (Andersen, 1982).
In the early stages of attrition research, studies tended to focus on the specificity of particular
lexical items. For example, Olshtain and Barzilay (1991) collected a corpus of spoken English
from a group of American attriters in Israel by means of two picture-story booklets popular
in child language acquisition research (the “Frog stories”). They focus on how various items
occurring in the story are named, for example, pond, deer and gopher, and conclude that
there is more variability in the Israel-based than in the US-based population (it should be
said here, however, that the attriting group was 2.5 times the size of the monolingual one),
with attriters often preferring more general terms (e.g., “body of water” instead of “pond”).
A similar approach was adopted in studies by Pavlenko (e.g., 2004; 2010; Pavlenko & Malt,
2011), who assessed how entire semantic fields, such as verbs of emotion or motion and
household objects, may shift their overall meaning through cross-linguistic influence and
semantic extension from the L2. Based on these and other findings, Schmid (2011, Chapter 3)
provides an overview and taxonomy of the different types of cross-linguistic influence that
may occur in L1 lexical attrition.
Andersen’s (1982) position paper also makes a number of predictions for syntactic at-
trition: he hypothesizes that attriters will preserve and overuse those syntactic constructions
that more transparently reflect underlying semantic and syntactic relations, and furthermore,
where applicable, they will tend to collapse different surface structures into one, except where
such a collapse would lead to informational loss (Andersen, 1982, p. 99). Empirical studies on
such syntactic simplifications in free speech are much rarer than investigations of lexical
complexity and sophistication, but there are some tentative findings indicating that the most
complex options may indeed become dispreferred in speech production by attriters when less
complex alternatives exist. For example, Yılmaz (2011) investigated the use of
five types of Turkish complex embedding constructions
among a population of attriters in the Netherlands and controls in Turkey (matched for age,
education, and region of origin) in a naturalistic interview. Yılmaz concluded that the most
complex of these constructions, postpositional clauses, had decreased to some extent among
the Turkish–Dutch bilinguals relative to their use by the reference population, but that the
four other types remained unaffected. Similarly, Jackson et al. (2011) found that film retellings
from German attriters, on average, contained fewer constituents in the inner field of the
German “verbal bracket” (formed by the finite and the nonfinite part of the verb), but that
this tendency to extrapose information outside the Verb Phrase was modulated by the
typological proximity between the languages in that it was stronger for those attriters whose
second language was Dutch than for German–English bilinguals. Finally, Karayayla (2020)
conducted an investigation of the extremely productive Turkish inflectional morphology
system. Turkish agglutinative morphology is characterised by the use of frequently
co-occurring “suffix templates”, that is, formulaic strings of up to four individual suffixes
which, due to their association, develop relationships that are similar to lexical collocations.
For example, the suffix chain A3pl + P2pl + Abl as in kitap-lar-ınız-dan (“from your books”)
appears to be stored in a separate mental representation and accessed with the same speed as
a single suffix (Bilgin, 2016). Karayayla found that, while verb phrase templates remained
unaffected by attrition, nominal suffix templates were used less productively in language
attrition. In particular, the less frequent templates were not applied to as large a range of
lexical lemmas of different frequencies by the attriters as by the monolingual controls.
Interestingly, the frequency of the lemma itself also played a role, with less frequent nouns
being less productive in terms of the suffix templates they were paired up with. This again
hints at a possible effect of a high cognitive load incurred by retrieving less accessible items
from memory that takes its toll on other parts of the computational system.
Taken together, these findings suggest that, in terms of syntactic strategies, there are no
sweeping or dramatic simplification effects to be observed even in long-term attrition (all of
the studies mentioned earlier investigated attriting populations with a minimum of 15 years
and an average of over 35 years of residence in an L2 environment), as was suggested by
Andersen (1982), but that some more subtle shifts may occur. The effect of these simplifi-
cations is to relieve pressure on the computational system through stronger reliance on more
frequent constructions or a distributional realignment of certain constructions that is more
closely in line with that supplied by the L2. In general, morphological features with a limited
range of values – for instance, case, tense, and evidentiality – do not become reduced overall
in the attritional process, as attriters tend to continue to make use of all available forms (i.e.,
there is no evidence of the nominative case overall replacing the oblique, or one tense
supplanting another). However, as the example from Turkish shows, when a very large range of
formulaic suffix sequences is available, their use may become somewhat less productive in the
process of L1 attrition, leading to surface-level simplification.
Fluency
Attrition effects with respect to speech fluency (Kahng, this volume) were also anticipated by
Andersen (1982), with the prediction that attriters “will be less capable […] of being quick and
easy and of being expressive in the language”, leading to slowed-down linguistic interactions and
the overuse of compensatory strategies, searches for words or phrases, etc. (Andersen, 1982,
p. 111). This “linguistic insecurity” was first examined by Schmid and Beers Fägersten (2010),
who investigated the distribution of filled and empty pauses as well as repetitions and self-
corrections within a corpus of film retellings collected from German and Dutch attriters and
controls. With the exception of filled pauses, all disfluency markers had increased in the attriting
populations, and their placement had also changed in that the attriting populations made more
use of empty pauses immediately preceding nouns, articles, and pronouns. Schmid and Beers
Fägersten ascribed these patterns not only to a reduction of the accessibility of lexical items in
themselves, but also to a potential weakening of other properties of the lemma which feed into lexical
retrieval and sentence building (such as the grammatical gender of nouns), leading to increased
insecurity about which article or pronoun to select. For filled pauses, on the other hand, Schmid
and Beers Fägersten found a change in distribution among the attriters that appeared to re-
semble the statistical and distributional properties of such items in the L2.
A number of other studies have since corroborated these findings, confirming that
monolingual-like fluency is one of the most vulnerable factors in L1 attrition (e.g.,
Badstübner, 2011; Bergmann et al., 2015; Dostert, 2009; Opitz, 2011; Varga, 2012; Yılmaz &
Schmid, 2012). Interesting results are reported by Schmid and Keijzer (2009), who concluded
that the increase in disfluency markers and the reduction in lexical accessibility found among
attriters is similar to what happens in healthy aging, to the extent that older monolinguals
appear to “catch up” with their attrited age-matched peers around age 70, while the bilin-
guals do not show any further age-related patterns of increased disfluency. There also ap-
pears to be a clear effect of re-exposure: Stolberg and Münch (2010) report on a longitudinal
single-case study in which an elderly native German speaker living in an L2 English en-
vironment (with a length of residence of over 50 years) was interviewed repeatedly over a
period of four years, and showed a marked recovery in terms of the number of hesitation
markers (as well as semantic accessibility and morphosyntactic accuracy) across this time.
Accentedness and Accuracy of Pronunciation

A final aspect of speech production that is important in the context of L1 attrition relates to
target-like pronunciation. The development of a foreign accent is possibly the symptom of
L1 attrition that is least expected by immigrants, since having been dominantly or exclusively
exposed to a language in childhood is usually seen as the guarantor not only of becoming a
native speaker but also of being perceived as such by other speakers of the language (e.g.,
Schmid, 2019). However, several studies have shown that some of the acoustic characteristics
of spoken language that feed most strongly into perceived nativeness, such as Voice Onset
Time, may shift away from native norms in the L1 of bilinguals. Such a shift was first
reported in French–American English and American English–French bilinguals by Flege
(1987), and this observation forms an important cornerstone of his Speech Learning Model,
which assumes that speech sound categories remain adaptive throughout the lifespan and
that L1 and L2 sounds exist in a shared phonetic space. In this space, those speech sounds
that are perceived to be sufficiently similar to each other are linked by “equivalence classi-
fication,” resulting in phonetic convergence (e.g., Flege, 2002). Similar findings on the
adaptability of the phonetic space have been reported in other studies (e.g., Chang, 2012;
Flege & Eefting, 1987; Major, 1992; Mayr et al., 2012; see de Leeuw, 2019 for an overview).
In addition to such investigations of the acoustic properties of particular speech sounds as
produced by bilinguals and monolinguals, there are also studies focusing more broadly on
the phenomenon of perceived nativeness (see de Leeuw, 2019, for an overview). These typically
demonstrate that populations of L1 attriters contain a subset of
speakers that monolingual natives classify as non-native, or about whose native status they
are no longer sure, alongside other speakers who are perceived to be unambiguously native.
The first such investigation was presented by de Leeuw, Schmid & Mennen (2010), and their
findings have been replicated multiple times for other populations (e.g., Hopp & Schmid,
2013; Karayayla & Schmid, 2019; Schmid & Yılmaz, 2018; Varga, 2012). However, a
comparison of the foreign accent ratings given to individual speakers and acoustic mea-
surements of their productions of sounds most similar in the L1 and L2 (and therefore
expected to be most strongly affected by equivalence classification) did not show any cor-
respondences between the two measures (Bergmann et al., 2016). It thus remains unclear
which characteristics of attrited speech native speakers base their perceptual ratings on, al-
though suprasegmentals are likely to play a role (e.g., Mennen, 2004; Mok, this volume).

The findings summarized earlier demonstrate both the extent to which a native language may
remain flexible and adaptive under conditions of intensive contact with another language and
the limits of this malleability. While it has become clear that early predictions about re-
presentational restructuring of an L1 are, in practice, rarely or never substantiated,
cumulative findings from a range of empirical studies of attrited speech production show
consistent, albeit subtle, differences in complexity, accuracy and fluency between attrited and
non-attrited populations. To date, there is only one single-case study (Iverson, 2012) of a
user of two typologically closely related languages (Spanish and Portuguese) which has
presented convincing evidence of actual restructuring, though under highly unusual
circumstances.
The growing realization that language attrition does not usually lead to contact-
induced changes or underlying restructuring of the linguistic system has brought about a
change in both research paradigms and attempts to integrate findings into theories of
language development. With respect to the former, empirical research has shifted from
using mainly behavioural tasks (such as offline grammaticality judgements) to tap into
shifts in representational knowledge, to online and neurocognitive research technologies,
alongside a greater reliance on data collected from naturalistic speech production. In
terms of theoretical approaches, questions about the vulnerability versus the stability of
different aspects of the L1 linguistic system have been shown to have important im-
plications for the broader view of language development and the architecture of the
human capacity for language and bilingualism.
Research Paradigms
Language attrition research has often found that in targeted behavioural research designs
and elicitation paradigms which allow the language user to focus attention entirely on one
specific linguistic process (e.g., lexical retrieval, offline grammaticality judgements, and
picture naming), attriting populations do not differ, or differ only very slightly, from
monolingual controls. More consistent differences have been found through methods
capable of capturing not so much the deterioration of a grammatical system and its rules
as the way in which the additional cognitive load incurred by the competition between languages
on the one hand and lack of exposure on the other feeds into accessibility and processing.
Neurocognitive methods such as eyetracking (see Dussias et al., 2019) and EEG (see
Steinhauer & Kasparian, 2019) allow fine-grained insights into shifts in language pro-
cessing caused by these pressures even in the absence of more overt changes, such as a
decrease in the ability to detect grammatical violations. In particular, these studies have
shown how bilinguals and attriters may parse sentences differently from monolinguals, for
instance, in areas such as relative clause attachment, or how other features of processing
and evaluation may change (for a more detailed discussion of online and neurocognitive
methods in studies of L2-to-L1 influence see Schmid & Köpke, 2017 as well as the
part on psycholinguistic and neurolinguistic approaches to L1 attrition in Schmid &
Köpke, 2019).
A further window into the more subtle aspects of change usually witnessed in language
attrition is provided by online, real-time speech production. This process forces the speaker
to rapidly integrate information from all linguistic levels under pressure from time con-
straints and limited cognitive resources. In-depth comparisons between speech produced by
monolinguals and attriters almost invariably show differences in terms of the phenomena
discussed in Part 3 of this chapter. While the analysis of naturalistic spoken data presents a
range of methodological challenges in terms of the quantification of features and items which
are more easily controlled in studies employing targeted elicitation methods (Iwashita, this
volume), it has been proposed that a combination of both elicited (online or offline) and
naturalistic data is better suited to detecting and describing the full range of attrition phe-
nomena (e.g., Schmid, 2011) than is one of these paradigms alone.
Theoretical Frameworks
The findings described earlier present a clear explanatory challenge for theoretical frame-
works of bilingualism: any theory of how the configuration of the human mind allows
speakers to become proficient in one or more languages, and how pre-established languages
influence and constrain the acquisition of languages acquired later, should also be able to
account for the deterioration of proficiency and facility through the cognitive pressure and
competition incurred by learning and using another.
Some researchers favour explanatory frameworks based largely on the accessibility and
the deterioration of unused knowledge, viewing attrition as the result of a long-term lack of
stimulation (e.g., Paradis, 2007), which affects less frequent items and forms more than
highly frequent ones. A related account, also linked to the domain-general operation of
memory and cognition, assumes that those grammatical features that were learned earliest
and/or learned best will be the ones that are most resilient in the attrition process (the
Regression Hypothesis, e.g., Keijzer, 2007). A more recent development of this approach is
presented in MacWhinney’s (2019) extension of his Competition Model to the attrition
context, which takes into account psycho- and neurolinguistic processes such as entrench-
ment, transfer and resonance to arrive at a model more capable of predicting specific out-
comes of the attritional process.
Other theoretical approaches attempting to account for the selectiveness of attritional
processes have their roots less in the distributional properties of morphosyntactic forms
and more in the role they play within a grammatical theory. The model that has been
empirically tested most often in language attrition research is probably the Interface
Hypothesis (IH, e.g., Sorace, 2005). This hypothesis has evolved somewhat since it was
first proposed, but in its essence, it predicts that grammatical features representing core
syntactic properties will be less vulnerable to language attrition than structures situated at
the interface with other cognitive domains (e.g., pragmatics). Evidence supporting this
hypothesis comes from, among others, a study of subject–verb inversion in Spanish
(Perpiñán, 2011). While Spanish main clauses follow a rigid subject–verb word order (La
maestra escribió un libro vs. *Escribió un libro la maestra, “The teacher wrote a book,”
Perpiñán, 2011, her example 3) subject–verb inversion is obligatory in questions (¿Qué dijo
Juan? vs. *¿Qué Juan dijo? “What did John say?” Perpiñán, 2011, her example 1). In
embedded relative clauses, however, inversion is optional and regulated by pragmatic
factors (Pedro no leyó el libro que la maestra escribió vs. Pedro no leyó el libro que escribió
la maestra, “Pedro did not read the book that the teacher wrote”, Perpiñán, 2011, her
example 4). Perpiñán’s investigation shows that purely syntactic inversion (in wh-
questions) is unaffected by attrition, while pragmatically licensed inversion differs be-
tween attriters and monolinguals. Furthermore, a number of studies on the use of null vs.
overt pronouns (e.g., Tsimpli et al., 2004) and Differential Object Marking (e.g.,
Chamorro et al., 2016) provide evidence for the IH. However, criticisms have been raised
about the lack of specificity of the IH, that is, the inability to predict a hierarchy of
vulnerability to attrition among interface phenomena (e.g., Gürel, 2011). Other syntactic
accounts, most notably the Feature Reassembly Hypothesis (e.g., Lardiere, 2009), have
recently been invoked in this discussion in search of more fine-grained models (e.g., Hicks
& Domínguez, 2020; Putnam et al., 2019). Within these frameworks, it is assumed that
morphological forms may represent language-specific “feature bundles,” which are only
gradually adjusted to fully native-like settings in L2 acquisition, and which may then also
shift and change in a language that is already established, leading to non-target-like use
and intuitions (e.g., Putnam & Sánchez, 2013).
5 Recommendations for Practice and Future Directions

As this chapter has tried to show, fundamental insights into the underlying nature of lan-
guage attrition, alongside methodological and statistical advances, have led to an important
evolution in the sophistication and rigour with which attrition studies have been conducted.
This has allowed further insights and a better understanding of the scope and limits of the
flexibility of a mature L1 system. It has become clear that fine-grained online methods of
data elicitation are more suited to capturing changes in L1 accessibility and processing, and
that a careful analysis of freely spoken data can complement such findings.
The growth in the scale and reliability of the data collected in language attrition research
has been paralleled by an increase in the overall interest in this development: a search on
Google Scholar for the terms “Language Attrition” and “Second Language Acquisition”
shows that attrition studies accounted for only about 2.5% of the total publications in the
five years between 1981 and 1985. This proportion has increased steadily and consistently,
reaching 10% between 2016 and 2020 (Figure 31.1).
While both of these developments have led to a much sounder empirical basis for insights
into the developmental process that is L1 attrition, two substantial shortcomings should be
urgently addressed in future research. The first of these is a lack of theoretical sophistication: it
has often been pointed out that, in comparison with SLA research, attrition studies tend to be
theory-poor, adopting a largely descriptive approach (e.g., Schmid & Köpke, 2019). In this
context, several theoretical proposals have suggested how theoretical frameworks might be
productively applied to a fuller understanding of the linguistic and extralinguistic factors that
drive and constrain the range of attrition phenomena observed in the empirical practice, such
as the Competition Model (MacWhinney, 2019), the Feature Reassembly Hypothesis (Hicks &
Domínguez, 2020; Putnam et al., 2019) and Complex Dynamic Systems (Opitz, 2019). It is to
be hoped that future studies will avail themselves of the groundwork laid in these important
contributions to fully integrate our understanding of what goes on in language attrition with
the wider perspective on bilingual development.
Figure 31.1 Number of references on Google Scholar to “Second Language Acquisition” and
“Language Attrition,” 1981–2020 (chart of counts per five-year period, 1981–1985 to 2016–2020; vertical axis from 0 to 40,000)
The second gap in knowledge relates to the puzzling dichotomy noted above between the
extreme vulnerability to forgetting of childhood languages and their apparent stability post
puberty. A fuller understanding of how, when and why this change takes place has the potential
to contribute very substantially to our understanding of how age of learning and ultimate success
interact, and thus provide an answer to questions about maturational constraints and critical or
sensitive periods in bilingual development. To gain such an understanding, studies are needed
which assess the full range of ultimate proficiency in second language learners and native lan-
guage forgetters across all ranges of ages of learning and forgetting.
6 Conclusion
Since the 1980s, a well-established body of research on the spoken L1 use of immigrants who
use their L2 in daily life has demonstrated that such speakers come to show cross-linguistic
influence, a phenomenon known as language attrition. Such speakers differ from mono-
linguals with respect to the complexity, accuracy, and fluency of their native language, and
they may also develop a foreign accent. However, the differences tend to be minor and
subtle, indicative more of the online pressure of managing two highly active linguistic sub-
systems under time pressure than of any underlying representational changes to the native
system. As such, language attrition research presents a valuable added perspective to in-
vestigations of bilingualism and second language acquisition, as it allows researchers to se-
parate out the effects of online cross-linguistic influence from those of any potential failure to
establish native-like underlying representations.
Further Reading
Hicks, G., & Domínguez, L. (2020). A model for L1 grammatical attrition. Second Language Research,
36(2), 143–165.
Describes a model of language attrition based on the Feature Reassembly Hypothesis, which seeks to
explain under what conditions language attrition can be integrated into the developmental process
sustained by the language faculty.
Schmid, M. S., & Köpke, B. (2017). The relevance of first language attrition to theories of bilingual
development. Linguistic Approaches to Bilingualism, 7(6), 637–667.
Addresses the long-held distinction between attrition at the level of underlying structure (‘competence’)
and online processing (‘performance’) and examines whether it is methodologically possible and the-
oretically appropriate to distinguish the two.
Schmid, M. S., & Köpke, B. (Eds.) (2019). The Oxford handbook of language attrition. Oxford: Oxford
University Press.
Presents the state of the art of research in language attrition, showcasing different theoretical and
methodological approaches, and features of attrition at different phases of the lifespan, on different
linguistic levels and under different contexts and circumstances.
References
Altenberg, E. P. (1991). Assessing first language vulnerability to attrition. In H. W. Seliger & R. M.
Vago (Eds.), First language attrition (pp. 189–206). Cambridge: Cambridge University Press.
Andersen, R. W. (1982). Determining the linguistic attributes of language attrition. In R. D. Lambert &
B. F. Freed (Eds.), The loss of language skills (pp. 83–118). Rowley, MA: Newbury House.
Badstübner, T. (2011). L1 attrition: German immigrants in the U.S. PhD dissertation, University of
Arizona at Tucson.
Bergmann, C., Nota, A., Sprenger, S. A., & Schmid, M. S. (2016). L2 immersion causes non-native-like
L1 pronunciation in German attriters. Journal of Phonetics, 58, 71–86.
Bergmann, C., Sprenger, S., & Schmid, M. S. (2015). The impact of language co-activation on L1 and
L2 speech fluency. Acta Psychologica, 161, 25–35.
Bhatia, T. K., & Ritchie, W. C. (Eds.) (2004). The handbook of bilingualism. Oxford: Blackwell.
Bilgin, O. (2016). Frequency effects in the processing of morphologically complex Turkish words. Master's
thesis, Boğaziçi University, Istanbul. Retrieved from
http://st2.zargan.com/public/resources/turkish/frequency_effects_in_turkish.pdf
Bulté, B., & Housen, A. (2012). Defining and operationalising L2 complexity. In A. Housen, F. Kuiken,
& I. Vedder (Eds.), Dimensions of L2 performance and proficiency (pp. 21–46). Amsterdam: John
Benjamins.
Bylund, E. (2019). Age effects in language attrition. In M. S. Schmid & B. Köpke (Eds.), The Oxford
handbook of language attrition (pp. 277–286). Oxford: Oxford University Press.
Chamorro, G., Sturt, P., & Sorace, A. (2016). Selectivity in L1 attrition: Differential object
marking in Spanish near-native speakers of English. Journal of Psycholinguistic Research, 45(3),
697–715.
Chang, C. B. (2012). Rapid and multifaceted effects of second-language learning on first-language
speech production. Journal of Phonetics, 40(2), 249–268.
Cook, V. (2003). The changing L1 in the L2 user’s mind. In V. Cook (Ed.), Effects of the second
language on the first (pp. 1–18). Clevedon: Multilingual Matters.
de Leeuw, E. (2019). Phonetic attrition. In M. S. Schmid & B. Köpke (Eds.), The Oxford handbook of
language attrition (pp. 204–217). Oxford: Oxford University Press.
de Leeuw, E., Schmid, M. S., & Mennen, I. (2010). The effects of contact on native language pro-
nunciation in an L2 migrant setting. Bilingualism: Language and Cognition, 13(1), 33–40.
Dostert, S. (2009). Multilingualism, L1 attrition and the concept of ‘native speaker’. PhD thesis,
Heinrich-Heine Universität Düsseldorf.
Doughty, C. J., & Long, M. H. (Eds.) (2003). The handbook of second language acquisition. Oxford:
Wiley Blackwell.
Dussias, P. E., Valdés Kroff, J. R., Johns, M., & Villegas, Á. (2019). How bilingualism affects
syntactic processing in the native language: Evidence from eye-movements. In M. S. Schmid & B.
Köpke (Eds.), The Oxford handbook of language attrition (pp. 98–107). Oxford: Oxford
University Press.
Flege, J. E. (1987). The production of ‘new’ and ‘similar’ phones in a foreign language: Evidence for the
effect of equivalence classification. Journal of Phonetics, 15, 47–65.
Flege, J. E. (2002). Interactions between the native and second-language phonetic systems. In P.
Burmeister, T. Piske & A. Rohde (Eds.), An integrated view of language development: Papers in honor
of Henning Wode (pp. 217–244). Trier, Germany: Wissenschaftlicher Verlag.
Flege, J. E., & Eefting, W. (1987). Cross-language switching in stop consonant perception and pro-
duction by Dutch speakers of English. Speech Communication, 6(3), 185–202.
Granena, G., & Long, M. H. (2013). Age of onset, length of residence, language aptitude, and ultimate
L2 attainment in three linguistic domains. Second Language Research, 29(3), 311–343.
Gürel, A. (2011). In search for a unified model of L2 acquisition and L1 attrition: A commentary for the
Interface Hypothesis. Linguistic Approaches to Bilingualism, 1(1), 39–42.
Hicks, G., & Domínguez, L. (2020). A model for L1 grammatical attrition. Second Language Research,
36(2), 143–165.
Hopp, H., & Schmid, M. S. (2013). Perceived foreign accent in first language attrition and second
language acquisition: The impact of age of acquisition and bilingualism. Applied Psycholinguistics,
34(2), 361–394.
Iverson, M. (2012). Advanced language attrition of Spanish in contact with Brazilian Portuguese. PhD
thesis, University of Iowa.
Jackson, C. N., McDermott, L., & Schmid, M. S. (2011). Changing syntactic preferences in L1 attriters
of German. Paper presented at the 7th International Symposium on Bilingualism, Oslo, June 2011.
Jarvis, S. (2019). Lexical attrition. In M. S. Schmid & B. Köpke (Eds.), The Oxford handbook of lan-
guage attrition (pp. 241–250). Oxford: Oxford University Press.
Karayayla, T. (2020). Effects of first language attrition on heritage language input and ultimate at-
tainment: Two generations of Turkish immigrants in the UK. In B. Brehmer, J. Treffers-Daller, &
D. Berndt (Eds.), Lost in transmission: The role of attrition and input in heritage language develop-
ment (pp. 34–69). Amsterdam: John Benjamins.
Karayayla, T., & Schmid, M. S. (2019). First language attrition as a function of age at onset of bi-
lingualism: First language attainment of Turkish–English bilinguals in the United Kingdom.
Keijzer, M. (2007). Last in first out? An investigation of the Regression Hypothesis in Dutch emigrants in
anglophone Canada. PhD thesis, Vrije Universiteit, Amsterdam.
Köpke, B., & Schmid, M. S. (2004). Language attrition: The next phase. In M. S. Schmid, B. Köpke,
M. Keijzer, & L. Weilemar (Eds.), First language attrition: Interdisciplinary perspectives on metho-
dological issues (pp. 1–43). Amsterdam: John Benjamins.
Kroll, J. F., & de Groot, A. M. B. (Eds.) (2005). Handbook of bilingualism. Oxford: Oxford University
Press.
Lardiere, D. (2009). Some thoughts on the contrastive analysis of features in second language acqui-
sition. Second Language Research, 25(2), 173–227.
MacWhinney, B. (2019). Language attrition and the competition model. In M. S. Schmid & B. Köpke
(Eds.), The Oxford handbook of language attrition (pp. 7–17). Oxford: Oxford University Press.
Major, R. C. (1992). Losing English as a first language. The Modern Language Journal, 76(2), 190–208.
Mayr, R., Price, S., & Mennen, I. (2012). First language attrition in the speech of Dutch-English
bilinguals: The case of monozygotic twin sisters. Bilingualism: Language and Cognition, 15(4),
687–700.
Mennen, I. (2004). Bi-directional interference in the intonation of Dutch speakers of Greek. Journal of
Phonetics, 32, 543–563.
Olshtain, E., & Barzilay, M. (1991). Lexical retrieval difficulties in adult language attrition. In H. W.
Seliger & R. M. Vago (Eds.), First language attrition (pp. 139–150). Cambridge: Cambridge University Press.
Opitz, C. (2011). First language attrition and second language acquisition in a second language en-
vironment. PhD thesis, Trinity College Dublin.
Opitz, C. (2019). A complex dynamic systems perspective on personal background variables in L1
attrition. In M. S. Schmid & B. Köpke (Eds.), The Oxford handbook of language attrition
Ortega, L. (2012). Interlanguage complexity: A construct in search of theoretical renewal. In B.
Kortmann & B. Szmrecsanyi (Eds.), Linguistic complexity: Second language acquisition, in-
digenization, contact (pp. 127–155). Berlin: Walter de Gruyter.
Paradis, M. (2007). L1 attrition features predicted by a neurolinguistic theory of bilingualism. In
B. Köpke, M. S. Schmid, M. Keijzer, & S. Dostert (Eds.), Language attrition: A theoretical
perspective (pp. 121–134). Philadelphia/Amsterdam: John Benjamins.
Pavlenko, A. (2004). L2 Influence and L1 attrition in adult bilingualism. In M. S. Schmid, B. Köpke,
M. Keijzer, & L. Weilemar (Eds.), First language attrition: Interdisciplinary perspectives on metho-
dological issues (pp. 47–59). Amsterdam: John Benjamins.
Pavlenko, A. (2010). Verbs of motion in L1 Russian of Russian-English bilinguals. Bilingualism:
Pavlenko, A., & Malt, B. C. (2011). Kitchen Russian: Cross-linguistic differences and first-language
object naming by Russian-English bilinguals. Bilingualism: Language and Cognition, 14(1), 19.
Perpiñán, S. (2011). Optionality in bilingual native grammars. Language, Interaction and Acquisition,
2(2), 312–341.
Putnam, M. T., & Sánchez, L. (2013). What’s so incomplete about incomplete acquisition?: A prolego-
menon to modeling heritage language grammars. Linguistic Approaches to Bilingualism, 3(4), 478–508.
Putnam, M. T., Perez-Cortes, S., & Sánchez, L. (2019). Language attrition and the feature reassembly
hypothesis. In M. S. Schmid & B. Köpke (Eds.), The Oxford handbook of language attrition
Schmid, M. S. (2004). First language attrition: The methodology revised. International Journal of
Bilingualism, 8(3), 239–255.
Schmid, M. S. (2011). Language attrition. Cambridge: Cambridge University Press.
Schmid, M. S. (2014). The debate on maturational constraints in bilingual development: A perspective
from first-language attrition. Language Acquisition, 21(4), 386–410.
Schmid, M. S. (2019). Language attrition as a problem for LADO. In P. L. Patrick, K. Zwaan, & M. S.
Schmid (Eds.), Language analysis for the determination of origin (pp. 155–165). Cham: Springer.
Schmid, M. S. (2020). First language attrition. Oxford bibliographies in linguistics. Oxford: Oxford
University Press.
Schmid, M. S., & Beers Fägersten, K. (2010). Disfluency markers in L1 attrition. Language Learning,
60(4), 753–791.
Schmid, M. S., & Keijzer, M. (2009). First language attrition and reversion among older migrants.
International Journal of the Sociology of Language, 200, 83–101.
Schmid, M. S., & Köpke, B. (2017). The relevance of first language attrition to theories of bilingual
development. Linguistic Approaches to Bilingualism, 7(6), 637–667.
Schmid, M. S., & Köpke, B. (2019). Introduction. In M. S. Schmid & B. Köpke (Eds.), The Oxford
handbook of language attrition (pp. 1–4). Oxford: Oxford University Press.
Schmid, M. S., & Köpke, B. (Eds.) (2019). The Oxford handbook of language attrition. Oxford: Oxford
University Press.
Schmid, M. S., & Yılmaz, G. (2018). Predictors of language dominance: An integrated analysis of first
language attrition and second language acquisition in late bilinguals. Frontiers in Psychology,
9, 1306.
Seliger, H. W., & Vago, R. M. (1991). The study of first language attrition: An overview. In H. W.
Seliger & R. M. Vago (Eds.), First language attrition (pp. 3–15). Cambridge: Cambridge University
Press.
Sharwood Smith, M. A., & van Buren, P. (1991). First language attrition and the parameter setting
model. In H. W. Seliger & R. M. Vago (Eds.), First language attrition (pp. 17–30). Cambridge:
Cambridge University Press.
Sorace, A. (2005). Selective optionality in language development. In L. Cornips & K. Corrigan (Eds.),
Syntax and variation: Reconciling the biological and the social (pp. 46–111). Amsterdam: John
Benjamins.
Steinhauer, K., & Kasparian, K. (2019). Electrophysiological approaches to L1 attrition. In M. S.
Schmid & B. Köpke (Eds.), The Oxford handbook of language attrition (pp. 146–165). Oxford: Oxford
University Press.
Stolberg, D., & Münch, A. (2010). “Die Muttersprache vergisst man nicht”–or do you? A case study in
L1 attrition and its (partial) reversal. Bilingualism: Language and Cognition, 13(1), 19–31.
Tsimpli, I. M., Sorace, A., Heycock, C., & Filiaci, F. (2004). First language attrition and syntactic
subjects: A study of Greek and Italian near-native speakers of English. International Journal of
Bilingualism, 8(3), 257–277.
Vago, R. (1991). Paradigmatic regularity in first language attrition. In H. W. Seliger & R. M. Vago
(Eds.), First language attrition (pp. 241–252). Cambridge: Cambridge University Press.
Varga, Z. (2012). First language attrition and maintenance among Hungarian speakers in Denmark. PhD
thesis, Aarhus University, Denmark.
Yılmaz, G. (2011). Complex embeddings in free speech production among late Turkish-Dutch bilin-
guals. Language, Interaction and Acquisition, 2(2), 251–275.
Yılmaz, G., & Schmid, M. S. (2012). L1 accessibility among Turkish-Dutch bilinguals. Mental Lexicon,
7(3), 249–274.
INDEX
Page numbers followed by “n” indicate a note
Abe, M. 178
Abercrombie, D. 148, 162
accent discrimination 165
accentedness 147, 201
acoustic analysis 378
acoustic processor 27
ACTFL Oral Proficiency Interview (ACTFL OPI) 132
ACT theory 190
Adams, C. 202
Adolphs, S. 222
age of onset of L2 learning (AOL) 71
Ahmadian, M. J. 35n2
Ahrens, B. 432, 433
Akiyama, Y. 234, 237, 305
Albl-Mikasa, M. 436, 437, 438
Alexander, R. 316, 324, 325
Alexander, S. T. 250, 251
Al-Hoori, A. H. 50
Al Masaeed, K. 60
Altenberg, B. 288
American Association of Corpus Linguistics (AACL) 122
American Council on the Teaching of Foreign Languages (ACTFL) 331
Ammar, A. 233
Analysis of Speech (AS) units 193
Andersen, R. W. 444, 445, 446
Anderson, J. R. 190
Anderson-Hsieh, J. 175
Angelelli, C. 436
Anya, U. 61
aptitude and individual differences 68; attention, individual differences in 75; auditory processing, individual differences in 75–76; critical issues and topics 71–73; future directions 78–79; historical perspectives 70–71; inhibitory control, individual differences in 75; L2 speech learning, individual differences in 73–74; L2 speech measures 76; L2 speech production mechanisms and processes 69–70; recommendations for practice 77–78; working memory (WM), individual differences in 74
Arevart, S. 276
Arshad 337
Arslan, L. M. 205
Artemeva, N. 62
articulatory gestural scores 27
articulatory score 27
artificial intelligence (AI) 310
ASTM (American Society for Testing and Materials) International 430
Atkins, J. 402
attention 31; individual differences in 75; management 29; shifting 29
attentional capacity 33
attention–control system 33
Attitude Motivation Test Battery 85
Audiolingual Method 148
auditory processing, individual differences in 75–76
Austin, J. L. 246
authenticity and native-speaker standard 218
autism spectrum disorder (ASD) 413
automatic speech recognition (ASR) technology 302, 381
Automatization in Communicative Contexts of Essential Speech Segments (ACCESS) 196
automatized processes 29
Aw, H. T. 317
Azuma, T. 373

Bachman, L. F. 131
Backman, N. 202
Baddeley, A. 74
Baddeley, A. D. 28, 29, 31, 34
Baigorri-Jalón, J. 429
Baker, A. 169
Baker, W. 149, 204, 205, 375, 376
Ballinger, S. 333
Baran-Lucarz, M. 88
Barcomb, M. 306
Barcroft, J. 279
Bardovi-Harlig, K. 105, 246, 247, 248, 287
Barzilay, M. 445
Bayes, S. 254
Bayyurt, Y. 351
Beers Fägersten, K. 446
Benrabah, M. 168
Bent, T. 163, 164
Bergeron, A. 176
Biber, D. 216, 217, 220, 221, 288
bilingualism 9
bilingual models of speaking 9, 12–14; code-switching (CS) 17–18; critical issues and topics 14–17; current contributions and research 17–19; future directions 20; historical perspectives 9–14; individual differences (ID), role of 18–19; Levelt model 10–12; research methods 19
bilingual production model 16, 30
Bilingual Syntax Measure (BSM) 100
bimodal bilinguals 17
Björkman, B. 266, 349
Blueprint of the Speaker; and L2 oral production 29–30; working memory and attention in 28–29
Boers, F. 287, 288, 289, 292, 293
Bohlke, D. 309
Bolander, M. 286
Bond, Z. S. 203, 204, 205
Bongaerts, T. 13, 30
Borden, G. 380
Borg, S. 324
Bosker, H. R. 191, 192
Botes, E. 89
Bourdieu, P. 62
Bradlow, A. 163, 164
Bradlow, A. R. 375, 380
Braun, B. 205
Brazil, D. 215, 216
Brezina, V. 120
British National Corpus (BNC) 277
Broersma, M. 17
Brofenbrenner, U. 91
Browman, C. P. 41
Brown, A. 135, 136, 169
Brown, R. 100
Brunner, M-L. 349
Bruton, A. 276
Bühler, H. 431, 432
Bui, T. 319
Burdelski, M. J. 64
Burkhauser, S. 332
Burnham, D. 207
Burns, A. 244, 317, 324, 344
Burt, M. K. 100
Buysse, L. 116

Cadierno, T. 286
Calderón, M. 280
CALF (complexity, accuracy, lexis, and fluency) research 196
CALL tools 300, 303
CAL Oral Proficiency Exam (COPE) 332
Cambridge and Nottingham Corpus of Discourse in English (CANCODE) 277
Cameron, L. 41
Canagarajah, S. 346
CAPA model 339–340
Cardoso, W. 304, 306, 307
Caregiver Speech (CS) 101
Carless, D. R. 317
Carlet, A. 381
Carroll, J. B. 70
Carter, R. 216, 217, 222
Caspi, T. 48
Cecot, M. 433
Cedergren, H. 308n3
Center for Applied Linguistics (CAL) 332
Center for Applied Second Language Studies (CASLS) 332
central executive 28
chalk talk 62
Chan, H. 43
Chan, K. Y. 153
Chang, L. Y. 207
Chang, S.-Y. 263
Change Point Analyses 47
Chanier, T. 269
Chapelle, C. 299, 300, 305, 308, 309
CHAT (Codes for Human Analysis of Transcripts) transcription 120
Chen, F. 163
Chen, H. C. 205, 209
Chen, Q. 317
Cheung, A. 432
Chien, S. 307
child L2 speakers with LCDs see language and communication disorders (LCDs), children with
Chiu, T.-L. 308
Cho, S. 430
Chomsky, N. 112, 189
Chopin, K. 352
classroom-based face-to-face learning 57
Clifford, R. 132
Cobb, T. 280
code-blending 18
code-switching (CS) 9, 12, 14, 17–18, 361; control mechanisms in 17
cognate 279
cognitive fluency, utterance fluency and 194–195
cognitive psychology research 279
cognitive resources 33
Cogo, A. 347, 348, 349, 350
Cohen, A. D. 102, 244, 253
coherence 432–433
cohesion 433
Cokely, D. 434
Collins, L. 108, 219, 223
Common European Framework of Reference for Languages (CEFR) 152, 332, 435
communication apprehension 86
communication strategies 261
communication units (CUs) 88
communicative demands of the workplace 360–362
communicative language teaching (CLT) 217, 287, 316, 317
community of practice 266
Complex Adaptive Systems 41
Complex Dynamic Systems Theory (CDST) 39; critical issues and topics 42–43; current contributions and research 43–45; future directions 49; historical perspectives 40–42; recommendations for practice 48–49; research methods 45–48
complexity 445–446
Complexity, Accuracy, and Fluency (CAF) 43, 44
complex memory tasks 31
comprehensibility 147, 162, 174, 202; critical issues 175–178; current contributions and research 178–180; dynamic 179; future directions 183; historical perspectives 174–175; and (imagined) interlocutor 177–178; and linguistic content of speech 176; pedagogically relevant 178–179; and processing fluency 176–177; recommendations for practice 181–182; research methods 180–181; socially flexible 179–180; and understanding 175–176
comprehension system 27–28
computer-assisted pronunciation teaching (CAPT) 151
computer-mediated communication (CMC), speaking and 305–306
concept-based instruction 56
concept–lemma mappings 13
conceptualization 11, 15, 29
consonants 163
construct irrelevant trait 135
content and language integrated learning (CLIL) 328
context-sensitive “stress-deafness” 204
contextualization, awareness, practice, and autonomy (CAPA) 337–339
controlled processes 29
control mechanisms in code-switching 17
conversational analysis (CA) 101
conversational interaction studies 229; corrective feedback (CF) 232–233; corrective feedback (CF), L2 learning via 233–234; future directions 237; historical perspectives 230–231; input, negotiated interaction, and output 231–232; language aptitude 235–236; noticing 233; oral synchronous computer-mediated communication (SCMC) 234; task complexity 234–235; working memory (WM) 235
conversation analysis (CA) method 136, 244
Cook, V. 402, 442
corpus-based descriptions 221
corpus building and research design 118–119
Corpus of Collaborative Oral Tasks (CCOT) 117–118
Corpus of Contemporary American English (COCA) 277
corpus revolution 276
corrective feedback (CF) 232–233; L2 learning via 233–234
Costa, A. 15
Cots, J. M. 353
Council of Europe 152, 332
Coupland, N. 437
covert monitoring 27
covert self-repairs 28
COVID-19 pandemic 64, 155, 306
Crago, M. 58
Craig, D. A. 137
Crawford, W. 118
Crossley, S. A. 289
cross-linguistic influence (CLI) 9, 12, 16, 17, 18, 388, 389–390
Crowther, D. 152
Cucchiarini, C. 191
culturally and linguistically diverse (CLD) individuals 401–402
culture and workplace 362–363
Cummins, J. 275
cumulative effects hypothesis (CEH) 414
curriculum issues in teaching L2 speaking 314; speaking lesson, story of 315–318; teacher research on L2 speaking 318–324; way forward 324–325
Cutler, A. 204

Daller, H. 275
Daly, N. 361, 363
Darcy, I. 74, 75, 209
Davis, L. 136, 137
Dawrant, A. 434
Day, E. 335
debilitative anxiety 84
de Bot, K. 12, 13, 15, 16, 17, 30, 41
declarative memory 30
Declerck, M. 17
De Cock, S. 287
De Costa, P. 62
Degueldre, C. 436
Dejean, L. 427
De Jong, N. H. 191, 193, 194, 195, 197
De la Fuente, M. J. 275
Delais-Roussarie, E. 205
de Leeuw, E. 103, 447
Delft, L. E. Van. 204
Dellwo, V. 204
Dénes, M. 191
De Ruyter, J. P. 20
Derwing, T. M. 3, 4, 105, 108, 149, 150, 152, 154, 155, 162, 167, 174–175, 176, 177, 178, 180, 190, 191, 192, 193, 195, 196, 197, 204, 253, 254, 344, 352, 353, 364, 365, 366, 374, 375, 380, 381
descriptive corpus analyses 221–222
designer methods 148
Desimone, L. M. 324
de Souza, H. K. D. 381
Deterding, D. 346
developmental language disorder (DLD) 413
Dewaele, J.-M. 85, 87, 178
Dewey, M. 347, 348, 349, 350
Diao, W. 59
Diemer, S. 349
Diepenbroek, L. 254
digital games, speaking and 306
digital tools 119–121
Ding, S. 151
discourse completion tasks (DCTs) 105, 246
discourse focus 25
discourse management 220
Disner, S. 160
domain-general auditory processing skills 75
Dörnyei, Z. 18, 41, 92, 135, 287, 318
Doughty, C. J. 102
Douglas Fir Group (DFG) 56, 57
Down syndrome (DS) 414
dual language children, issues in assessment with 415, 417; bilingual input and development factors 417–418; cultural and psycho-social factors 418–419; profile effects in child L2 acquisition 418; strategies for assessment 419–420
dual language children with LCDs 414–417, 420; supporting both languages of children with LCDs 421; switching to monolingual development in the L2 420–421
dual language learners (DLLs) 413, 414, 415, 419–420
Ducasse, A. M. 136
Duff, P. 365
Dulay, H. C. 100
Dupoux, E. 204
Durán, L. K. 421
Dutch-English bilinguals 17
Dykstra-Pruim, P. 43
dynamic assessment (DA) 407
dynamic comprehensibility judgements 179
Dynamic Systems Theory (DST) 83
Dynamic Turn 40, 41

Echevarría, J. 336
Effort Model 433
EFL learners 205, 208
Egbert, J. 300, 309
Ehrenreich, S. 266
ELAN (EUDICO Linguistic Annotator) 120
Electromagnetic Articulograph (EMA) 208
Elgort, I. 280
ELIZA 301, 302
Ellis, R. 43, 233, 286
embedded subsystems 40
Emmorey, K. 428
Eng, K. 207
English as a foreign language (EFL) learners 314, 350
English as a Lingua Franca (ELF) interactions 263, 266, 344, 346–347, 348
English–Hungarian–Persian children 16
English language teaching (ELT) 346
English-medium university settings 60
English sounds, vowel diagram for 45
episodic buffer 28, 29
episodic memory 30
ergodicity 46
Eriks-Brophy, A. 58
Erman, B. 288
error correction 12
Escamilla, K. 331
Eskildsen, S. W. 286
ESL learners 205, 209
ESP (English for Specific Purposes) class 314–315
event-related potentials (ERP) 376

face-to-face communication 131
face-to-face interviews 136–137
face-to-face learning 57
facilitative anxiety 84
Fafulas, S. 219
Felker, E. R. 194
Ferguson, C. A. 101
Field, J. 203, 279
Fillmore, C. 188
Fillmore, C. J. 102
first language (L1) speaking 69
first language attrition 442; accentedness and
accuracy of pronunciation 447; complexity 445–446; fluency 446–447; future directions 450–451; historical perspectives 443–444; recommendations for practice 450–451; research paradigms 448; theoretical frameworks 449
Firth, A. 41, 266, 347, 349
Flege, J. E. 42, 147, 202, 374, 377, 379, 447
fluctuating 29
fluency 188, 446–447; critical issues and topics 190–191; current contributions and research 191–196; future directions 197; grammar supporting 220–221; historical perspectives 189–190; L1–L2 relationship in 195–196; recommendations for practice 196–197; see also utterance fluency
fluency training, output activities and 293
Fogerty, D. 163
Fokes, J. 204
Folse, K. 223
Forced Choice Identification (FCID) task 375, 379–380
foreign accent 399
foreign accent modification/management/reduction (FAM) 399; assessment issues 405–406; training issues 406–407
foreigner talk (FT) 101
foreign language anxiety 83
foreign language classroom anxiety (FLCA) 83
Foreign Language Classroom Anxiety Scale (FLCAS) 86, 88–89, 91
foreign-language instruction 78
Foreign/Second Language Anxiety 85
Foreign Service Institute (FSI) 132
formal learning 365–367
form-focused instruction (FFI) 334
formulaic sequences (FSs) 285; current contributions and research 291; difficulties 290–291; future directions 294; historical perspectives 286–288; in oral fluency 289–290; output activities and fluency training 293; provide authentic input authentic input 292; recommendations for practice 292–293; research methods 291–292; and speaking proficiency 289; teaching explicitly 292–293; ubiquity of 288
formulator 25
Fortkamp, M. B. M. 32, 33
Fortune, T. W. 332, 333
Fouz-González, J. 155, 305
Fox, J. 62
fractals 40
Freed, B. F. 190, 194
French Learner Language Oral Corpora (FLLOC) 225
frequent and useful constructions 223–224
Friginal, E. 115
Frost, D. 209
Frota, S. N. 101
Fulcher, G. 131, 136
functional load principle 169
Fung, H. S. H. 206

Gabriel, C. 209
Galaczi, E. D. 135
Galante, A. 155
Galts, T. 205
Gananathan, R. Y. 206
Garcia, C. 151
García, O. 347
Gardiner, I. A. 346
Gardner, R. C. 83, 85
Garner, J. R. 114
Gass, S. 102, 150
Gatbonton, E. 308n3
Georgiadou, E. 33
gestures and speaking in L2 learning 386; critical issues and topics 388–389; current contributions and research 389–391; future directions 392–393; historical perspectives 387–388; recommendations for practice 392; research methods 391–392
Ghanem, R. 121
Ghazi-Saidi, L. 408
Gile, D. 429, 434, 436
Giles, H. 437
Gilquin, G. 114, 221
Giustini, D. 431
Gkonou, C. 91
Godfroid, A. 207
Goh, C. 244
golden speaker 151
Goldinger, S. D. 373
Goldman-Eisler, F. 189
Goldstein, L. 41
Goldstein, T. 361
Gonzalez-Barrero, A. M. 417
Goo, J. 233, 237
Gordon, J. 209
Götz, S. 114, 117
Gradman, H. L. 286
grammar 215; authenticity and native-speaker standard 218; descriptive corpus analyses 221–222; exploring grammatical choices in context 224; frequent and useful constructions 223–224; future directions 225–226; historical perspectives 216–217; multidimensionality and language change 217–218; oral practice, different types of 224; pedagogical approaches 225–226; perception and judgement studies 222–223; and phonology 221; research methods 221–223; and sociolinguistic competence 225; for speaking across languages 225; speaking and assessment 218–219; speaking goals, selecting features that support 223; of speech 219–220; speech events, differentiating 216;
spoken grammar as non-standard 218; spoken registers, grammatical features and 220; supporting fluency 220–221
grammar supporting fluency 220–221
grammatical encoding 26
Granena, G. 70
Granger, S. 294
graphical tools 47
Green, D. W. 14, 15, 18, 30
Gregersen, T. 90, 91, 92, 100
Grenon, I. 373
Grice, H. P. 246
Griffin, G. F. 280
Grosjean, F. 15, 16, 20
Gu, Y. 408
Gudmestad, A. 116, 219
Guiberson, M. 402
Guion, S. G. 205
Gullberg, M. 20, 269
Gumperz, J. J. 169, 202, 203, 247, 362
Gut, U. 117, 204

Hahn, L. 150, 161, 167, 168, 169
Hakuta, K. 286
Hale, S. 432
Hall, M. D. 153
Hamman, L. 334
Hanania, E. A. 286
Hanau, C. 167
Handford, M. 266
Hansen-Edwards, J. H. L. 156, 163, 164, 205
Hao, Y. C. 205
Hardison, D. M. 208
Harley, B. 329, 330, 334, 335
Harley, T. A. 280
Hartsuiker, R. 16, 17
Hassani, K. 304
He, X. 206
Hepford, E. A. 44
Her (2013) 299
Herd, W. 381
Herdina, P. 41
heritage-language (HL) individuals 402
Hermans, D. 16
Hernández, A. M. 333
hesitation markers 11
Higgs, T. 132
High-Level Language Aptitude Battery (Hi-LAB) 70
high-variability phonetic training (HVPT) 151, 155, 207
High Variability Pronunciation Training (HVPT) 380
Hilton, H. E. 220, 273
Hirano-Cook, E. 209
Hirata, Y. 207
Hirst, D. 204
Hiver, P. 50
Hoang, H. 288
Hoenkamp, E. 25
Hong Kong English (HKE) 206
Hooper, J. 286
Horst, M. 279, 280
Horwitz, E. K. 83, 85, 86, 88
House, J. 244, 247, 248, 287, 347, 349
Howard, K. M. 64
Hu, C.-F. 279
Hu, X. 74
Huang, B. H. 204, 205
Huensch, A. 108, 193, 195
Hulstijn, J. 42
human judges versus acoustic measures 378–379
Humes, L. 163
Humphries, S. 317, 324
Hutchinson, S. P. 202

IELTS 218
Iglesias Fernández, E. 433
illocutionary force indicating device (IFID) 249, 251
immersion and dual language (ImDL) classrooms, oral language use in 333–334
immersion and dual language (ImDL) education 328–329
incidental acquisition 275
individual differences (ID) 83; in attention 75; in auditory processing 75–76; in inhibitory control 75; in L2 speech learning 73–74; measures 76–77; role of 18–19; in working memory (WM) 74
inhibitory control, individual differences in 75
Inhibitory Control Model (ICM) 14, 15
Initiate-Respond-Follow-up (IRF) sequences 317
inner speech 56
intelligent personal assistants (IPAs) 302
intelligibility 147, 160, 162, 163, 399; defined 162; listener-based account of 160; measuring 165–167; speech see speech intelligibility
intelligibility judgements, linguistic variables in 163–165
interactional competence (IC) 134, 136, 137; current studies on 135–138
intercultural Communicative Language Teaching (iCLT) 322, 323
inter-language (IL) 233
interlanguage speech intelligibility benefit (ISIB) 163
interlocutor, comprehensibility and 177–178
internal speech 27
International Corpus of Learner English (ICLE) corpus 116
International Journal of Learner Corpus Research (IJLCR) 114
interpreter training 427; critical issues and topics
430–431; current contributions and research 431–434; future directions 436–438; historical perspectives 429–430; recommendations for practice 434–436
intersubjectivity 54, 56
intonation 202–203, 250–251, 432; in L2 agreements and disagreements 250; learners’ use of 251–252
intra-learner variability 43
In’nami, Y. 178
Isaacs, T. 175, 181, 221
Ishihara, N. 244, 253
Iwashita, N. 193

Jackson, C. N. 445
Jamieson, J. 305, 308
Jenkins, J. 161, 168, 218, 346, 350, 351
Jessner, U. 41
Joe, A. 279, 367
Jonasson, J. 202
Ju, Z. 332
Jun, S. A. 204, 205

K-12 classroom research 59
Kahng, J. 192, 193, 195
Kainada, E. 205
Kaiser, D. J. 209
Kalina, S. 429, 437
Kalmar, T. 361
Kang, O. 153, 176, 244, 245, 252, 253, 255
Karayayla, T. 445, 446
Kartchava, E. 233, 236
Kasper, G. 101, 134, 135, 136, 243
Keijzer, M. 447
Keiser, W. 435
Kempen, G. 25
Kerekes, J. 367
Kermad, A. 244, 245
Key Word In Context (KWIC) analyses 120
Kim, J. 137
Kim, Y. 44, 235
King, J. 317
Kinginger, C. 61
Kitano, K. 89
Kobayashi, M. 60
Koester, A. 360
Kohn, K. 352
Kohnert, K. 421
Kopcynski, A. 433
Köpke, B. 443, 444
Kormos, J. 12, 14, 17, 18, 25, 30, 74, 135, 191, 235, 434
Kramsch, C. 134
Krashen, S. D. 2, 230
Kroll, J. 16, 20
Kroll, J. F. 15
Kurz, I. 432

L1–L2 relationship in fluency 195–196
L1 phonetics 148
L1 speech perception and production: development 373; relationship between 373–374
L2 prosody: perception 206–207; production 206; training 207–208
L2 speaking 2–3
L2 speech learning, individual differences in: cognitive aptitude sources of 74; experience-related sources of 73; sociopsychological sources of 73–74
L2 speech perception and production, relationship between 372; acoustic analysis 378; critical issues and topics 374–375; development 374–375; future directions 382; historical perspectives 373–374; human judges versus acoustic measures 378–379; L2 speech perception and production at a fixed point in time 379–380; L2 speech perception training and its impact on L2 production 380–381; measuring perception 375–376; measuring production 377–379; nature of L2 speech learning 374; recommendations for practice 381–382; speaking tasks 377–378; stimulus characteristics 376; stimulus characteristics 378; task type 375–376
L2 willingness to communicate (L2 WTC) 87
Labov, J. 167
Labov, W. 218
Ladefoged, P. 160
La Heij, W. 15
Lam, D. M. K. 135
Lambert, W. E. 414
language and communication disorders (LCDs), children with 413; capacity for dual language development in 415–417; dual language children, issues in assessment with 415, 417–420; dual language children, issues in intervention with 415; dual language children with LCDs, issues in intervention with 420–421; dual language development in 414–415; future directions 422; historical perspectives 414; recommendations for practice 421–422
language anxiety (LA) 83; causes and correlates of LA 86–87; current contributions and research 89–91; dynamic approach 90–91; early models 84; future directions 92; historical perspectives 84–85; and L2 achievement 89; nature and conceptualization of LA 85–86; and oral performance 87–89; phases of development of 84–85; pronunciation anxiety, concept of 89–90; recommendations for practice 91–92
language aptitude 235–236
language change, multidimensionality and 217–218
language control 17
language-general abilities 418
language marginal 361
language programmes 365–367
language-related episodes (LREs) 320
languages, grammar for speaking across 225
Languages and Social Networks Abroad Project (LANGSNAP) 116
language selective models 15
language tags 15
language teaching, spoken vocabulary in 274–276
languaging 56
Lantolf, J. P. 63
Larsen-Freeman, D. 41, 220, 224
Laufer, B. 281
Laury, R. 225
LeaP corpus 117
learner corpora for spoken SLA enquiry 113
learners’ use of intonation 251–252
Lee, A. H. 380, 381
Lee, A. R. 88
Lee, C. Y. 203, 205
Lee, E. J. 237
Lee, T. 431
Lengeris, A. 205
Lennon, P. 188, 189, 193
Leopold, W. 100
Levelt, W. J. 194, 262, 434
Levelt, W. J. M. 1, 9, 10–12, 14, 15, 24, 25, 26, 27, 28, 29, 30, 32, 34, 262
Levine, G. 41
Levis, J. M. 149, 150, 161
Lewis, M. 287, 292
lexemes 27
lexical items 11, 30
lexical selection 15
lexical tone, L2 203
lexicon 30
Li, D. 55, 364
Li, S. 235, 236
Liakin, D. 381
Liebert, R. M. 84
Lima, E. D. 207
Lima Júnior, R. M. 41
Lin, A. 63
Lin, C. Y. 207
Lin, P. M. S. 292
Linck, J. A. 18
LINDSEI corpus 116–117
linearization 25
Lingua Franca Core (LFC) 346
Linguistic Coding Differences Hypothesis 86
linguistic content of speech, comprehensibility and 176
linguistic insecurity 444
linguistic variables in intelligibility judgements 163–165
Linn, A. 217
Lintunen, P. 189
listener-based account of intelligibility 160
Littre, D. 115
Liu, M. 265
Liu, Y. 263
LLAMA subtests 236
Llanes, A. 353
Llurda, E. 351, 353
Loewen, S. 234
logistic regression 119
Lombard, E. 164
London-Lund corpus 112
Long, M. H. 2, 101, 102, 230, 232, 235, 237
loudness 201
Louvain Corpus of Native English Conversation (LOCNEC) 116
Louvain International Database of Spoken English Interlanguage (LINDSEI) 225
Low, E. L. 169
Lowie, W. M. 44, 48
Lu, P-Y, & Corbett, J. 364
Luk, J. 164
Lumley, T. 135
Lynch, T. 352
Lyster, R. 232, 333, 335, 336, 337, 338, 339, 380, 381

MacIntyre, P. D. 83, 84, 86, 102
Mackey, A. 102, 230, 232, 233
macroplanning 25
MacWhinney, B. 449
Main Language Frame model 14
Makoni, S. 346
Man, E. 63
Mandelbrot sets 40
Mann–Whitney U tests 89
manual foreign accent 388
Marcel, F. 307
Martin, I. A. 155
Martin–Beltrán, M. 57
Matsumoto, Y. 349
Mauranen, A. 349
May, L. 136
Maydosz, A. 402
Maydosz, D. 402
McAllister, R. 202
McCarthy, M. 216, 217, 222
McEnery, T. 115
McGory, J. T. 204
McKay, S. L. 350
McNamara, T. F. 131, 135
Mead, P. 433
mediation 56
Meierkord, C. 349
Mennen, I. 203, 205, 447
mental lexicon 25
metalinguistic function 231
metalinguistic talk 56
Meunier, F. 115
microplanning 25
Milton, J. 281
misidentification 403
Mitchell, R. 286
Miyake, A. 18
Mocanu, V. 351
modern language aptitude test (MLAT) 70
modularity 24–25
modular models 25
Mojavezi, A. 35n2
Mok, P. P. K. 204, 206
Mompean, J. 305
monolingual development, switching to 420–421
monolingualism 9
Monte Carlo simulations 47
Moorman, C. 380
Mora, J. C. 74, 193, 194, 292
Morett, L. M. 207
morpho-phonological information 12
Morris, L. W. 84
Mosca, M. 17
motor instructions 27
Moussalli, S. 304
multidimensionality and language change 217–218
Multilingual Corpus of Assignments – Writing and Speech (MACAWS) 122
multilingual processing 9
multilingual resources 56; and repertoires 57
multiple morphological forms of a word 274
multi-word chunks 277–278
Münch, A. 447
Munro, M. J. 3, 4, 108, 149, 150, 151, 152, 153, 154, 162, 164, 165, 167, 168, 174–175, 176, 177, 178, 180, 192, 193, 196, 204, 253, 344, 353, 374
Munro, R. R. 202
Murakami, A. 48
Murphy, J. 169
Myers-Scotton, C. 14
My Fair Lady 148
Myles, F. 113, 286

Nadig, A. 417
Nagle, C. L. 178, 179, 382
naive listener ratings 104
Nakatsuhara, F. 137
Nation, I. S. P. 196, 274, 276, 277, 278, 288, 293
National Accreditation Authority of Translators and Interpreters (NAATI) 430
native speaker (NS) 101, 248, 399
native-speaker standard, authenticity and 218
Natural Language Processing (NLP) tools 43
Navracsics, J. 16
negative evaluation, fear of 86
Newbold, D. 352
Newgarden, K. 265
Newton, J. 196, 275, 320
Nguyen, B. T. T. 320
Nicodemus, B. 428
Niebuhr, O. 207
Nilsen, A. P. 149
Nilsen, D. L. F. 149
non-modular models 25
non-native speakers (NNSs) 101, 399
non-target language items 15
non-verbal gestures 40
Non-Word Repetition Task (NWRT) 407
Norman, D. A. 29
norm-referenced tools 403
noticing 233
noun phrases (NPs) 390
Nyugen, T. T. M. 247

O’Brien, I. 193
O’Brien, M. G. 155, 176
Observer Paradox 102
O’Dell, F. 276
O’Keeffe, A. 277
Olshtain, E. 445
oral fluency, role of FSs in 289–290
oral language development 328; contextualization, awareness, practice, and autonomy (CAPA) 337–339; future directions 339–340; historical perspectives 329–331; immersion and dual language (ImDL) classrooms, oral language use in 333–334; immersion and dual language (ImDL) education 328–329; quasi-experimental research 334–335; recent assessments of 331–333; research on 329–331; scaffolding 336–337
oral practice, different types of 224
oral production (OP) 24
oral proficiency interview (OPI) 132
oral synchronous computer-mediated communication (SCMC) 234
Orellana, C. 407
Ortega-Llebaria, M. 204
ostensible apologies 250–251
outlanders 148
Output Hypothesis 231
over-identification 403
overt self-repairs 28
Oxford, R. 91

paired speaking tests 63
Palmer, A. S. 131
Paradis, J. 415, 418
Paradis, M. 30
paradoxical language effect 17
parallel activation of L1 and L2 languages 16
Park, G. P. 86
Park, H. 88, 430
Parlak, Ö. 234
Parmaxi, A. 306
PAROLE (PARallèle, Oral en Langue practice 253–255; research methods 253;

Etrangère) 276 sincere and ostensible apologies, perception of
parser 27 250–251; and speaking 244–245; speech acts
Participatory Action Research (PAR) project 322 and speech rate 249–250
pauses 433 PREFER approach 254
Pavlenko, A. 445 prelexical representation 27
Pawlak, M. 34 presentation–practice–production (PPP)
Peal, E. 414 approach 318–319
pedagogically oriented investigations of preverbal message 13–14
comprehensibility 178–179 preverbal plans 25
Pekrun, R. 84 processing-dependent measures 407
Pellicer-Sánchez, A. 292 processing fluency, comprehensibility and
Peng, G. 433 176–177
Pennington, M. C. 303 production system 25–27
Pennycook, A. 346 proficiency, utterance fluency and 192–194
perceived fluency, utterance fluency and 191–192 pronunciation 147, 399; accentedness and
perception and judgement studies 222–223 accuracy of 447; critical issues and topics
perceptual theory 27 149–150; current contributions and research
Pérez Castillejo, S. 73 150–152; future directions 154–156; historical
performance drive approach (PDA) 136 perspectives 148–149; interventionist studies
Perpiñán, S. 440 149; naturalistic studies 149; recommendations
personal and interpersonal anxieties 86 for practice 153–154; research 155–156;
perspective taking 25 research methods 152–153; teaching 154–155
Petraki, E. 254 pronunciation anxiety (PA) 89–90
Phillips, E. 88 Pronunciation in Second Language Learning and
phonetic encoding 27 Teaching (PSLLT) conference 122, 409
phonetic plan see articulatory score pronunciation instruction (PI) 399
Phonetics Learning Anxiety Scale 88 prosody 201, 432; critical issues and topics
phonological encoding 27 202–205; current contributions and research
phonological loop 28 206–208; factors contributing to the acquisition
phonological memory 29 of 205; future directions 209–210; historical
phonological processing differences 69 perspectives 202; L2 intonation 202–203; L2
phonological score 27 lexical tone 203; L2 speech rhythm 204; L2
phonological short-term memory 18 stress 203–204; recommendations for practice
phonology, grammar and 221 208–209; research methods 208 see also L2
Pica, T. 231 prosody
Picavet, F. 209 psycholinguistic processes in L2 oral production
Pickering, L. 203, 250, 253, 254, 255 24; Blueprint of the Speaker and L2 oral
Pickering, M. 16 production 29–30; comprehension system
picture–word Stroop tasks 19 27–28; critical issues and topics 25–29; future
Piechurska-Kuciel, E. 89 directions 33–34; historical perspectives 24–25;
Pine, D. S. 84 production system 25–27; working memory
pitch 201, 250–251 and attention during L2 oral production 31–33;
Pöchhacker, F. 432 working memory and attention in the Blueprint
Poehner, M. E. 63 of the Speaker 28–29
Polat, B. 44 Pujiasututi, A. 365
post-focus compression (PFC) 206 Pygmalion 148
Potowski, K. 333 Pytlyk, C. 203
Poulisse, N. 13, 30
Praat 120 Qin, Z. 206
pragmatics 243–244; critical issues and topics 249; Quené, H. 204
data collection, means of 246–247; future
directions 255; historical perspectives 245–249; Rallon, G. 288
intonation in L2 agreements and disagreements Ranta, L. 232
250; L2 pragmatics research, speaking in Rapeer, L. 400
247–249; learners’ use of intonation as a cue of Rassaei, E. 233
illocutionary force 251–252; proficiency and Received Pronunciation (RP) 164
study abroad 252; recommendations for
Reform Movement of 19th century Britain and self-monitoring 27, 32
Europe 217 self-organization 39
relation management 220 self-repairs 27, 28, 33
relative contribution model (RCM) 132 semantic memory 30
repairs 433 Setter, J. 204
reparandum 28 Setton, R. 434
reparatum 28 Shah, A. 408
researched pedagogy 318 Shahrokni, S. 300, 309
retrieval-induced forgetting (RIF) tasks 77 Shallice, T. 29
Révész, A. 234 Shapson, S. 335
Rhetorical Structure Theory (RST) 433 Shaw, G. B. 148
rhythm 201, 204 Sheldon, A. 374, 380
Riddiford, N. 367 Sheppard, B. E. 179
Riggenbach, H. 189 Shin, D. 278
Robinson, P. 234 Shively, R. 248
Rodgers, M. P. H. 277 Shlesinger, M. 432
Roehr-Brackin, K. 33 short-term phonological memory (PSTM) 74
Roever, C. 134, 135, 136 Sifakis, N. 351
Rojczyk, A. 155 Silent Way 148
Römer, U. 114 Silpachai, A. 207
Rosansky, E. J. 100 Simard, D. 28, 31, 33
Rose, K. R. 243 Simon task 77
Rosen, A. 117 simple memory tasks 31
Ross, S. J. 135 Simsek, E. 92
Rossiter, M. J. 197 sincere and ostensible apologies 250–251
Rothman, J. 404 Sinclair, J. 285
Rubin, D. L. 179 Sippel, L. 155
Rubio-Alcala, F. D. 91 Siyanova-Chanturia, A. 292, 294
Rühlemann, C. 216, 220 Skehan, P. 131
Ruivivar, J. 108, 219, 223 SketchEngine 120
Slobin, D. I. 25
Saffran, J. R. 373 Smiljanic, R. 164
Saito, K. 73, 178, 192, 209, 234, 274, 305 Smit, N. 50
Sakai, M. 380, 381 Smith, B. 149
Salaberry, M. R. 303, 308 Smith, K. A. 179
Santiago, F. 205 socially and contextually flexible construct,
Sapon, S. 70 comprehensibility as 179–180
scaffolding 56, 336–337 Social Turn 40, 41
Scarcella, R. C. 287 sociocultural approaches to speaking in SLA 54;
Schachter, J. 101 critical issues and topics 56–58; future
Schmid, M. S. 443, 444, 445, 446, 447 directions 64–65; historical perspectives 55–56;
Schmidt, R. W. 101, 286 K-12 classroom research 59; recommendations
Schmitt, N. 277 for practice 62–64; research methods 60–62;
Schmitz, J. 379 study abroad (SA) research 58–59; university
Schneider, K. P. 246 programme research 60
school-based additive bilingual programmes 328 sociocultural research, design of 62
Schreuder, R. 13 Sociocultural Theory (SCT) 41
Scovel, T. 84, 85, 88, 374 sociolinguistic competence, grammar and 225
Searle, J. R. 246 Song, W. 332
second language acquisition (SLA) 54, 56, 68, 83, Sorace, A. 404
189, 215, 334, 344 Soto, I. 280
Second Language Research Forum (SLRF) 122 speaker, short-term developmental process of 40
Segalowitz, N. 30, 188, 189, 194 speaking 10, 54, 99, 244, 299; bilingual models of
segmentation 120 see bilingual models of speaking; and
Seidlhofer, B. 346, 347, 348, 350, 351, 352 computer-mediated communication 305–306;
selection of language 15 critical issues and topics 303–304; current
selective attention 31 contributions and research 304–307; and digital
Seleskovitch, D. 429 games 306; future directions 309–310; historical
perspectives 301–303; pragmatics and 244–245; assessment issues 403–404; L2-pronunciation
recommendations for practice 308–309; instruction 404–405; L2-related education and
research methods 307–308; and speech training 402–403; ongoing misidentification
technologies 305; and virtual reality 306–307 and disproportionality 403; recommendations
speaking and English as a lingua franca 344; for practice 408–409; SLP-FAM lead research,
critical issues and topics 347–348; current focus on 407–408; terminology considerations
contributions and research 348–349; English as 401–402
a Lingua Franca (ELF) interactions 346–347; Speech Learning Model (SLM) 42, 68
future directions 353; historical perspectives speech perception 372
345–347; recommendations for practice speech production process 10, 11, 12, 373
350–353; research methods 349–350 speech recognition systems 136
speaking assessment 130; critical issues and topics speech rhythm, L2 204
134; current contributions and research speech sound disorders (SSDs) 399
134–138; future directions 139; historical speech technologies, speaking and 305
perspectives 131–134; interactional spoken corpora 112; corpus building and research
competence, current studies on 135–138; design 118–119; Corpus of Collaborative Oral
recommendations for practice 138–139; Tasks (CCOT) 117–118; critical issues and
research methods 138 topics 113–116; current contributions and
speaking goals, selecting features that support 223 research 116–118; current gaps in the literature
Speaking Model 9, 11, 14 115; digital tools 119–121; future directions
speaking proficiency, formulaic sequences (FSs) 122; historical perspectives 112–113;
and 289 Languages and Social Networks Abroad
speaking research methodologies 99; analysis, Project (LANGSNAP) 116; LeaP corpus 117;
approaches to 104–105; critical issues and learner corpora for spoken SLA enquiry 113;
topics 105–106; current contributions and LINDSEI corpus 116–117; recommendations
research 106–107; future directions 108–109; for practice 121–122; research methods
historical perspectives 100–102; 118–121; research questions using corpora to
recommendations for practice 107–108; investigate spoken SLA 113–115
research methods 102–105; speech data, spoken grammar 221
collection of 102–104 spoken grammar as non-standard 218
speaking strategies 261 spoken registers, grammatical features and 220
speaking tasks 377–378 spoken vocabulary in language teaching 274–276
speech: grammar of 219–220; linguistic content “spontaneous” speech 103
of 176 Sridhar, K. K. 344
speech acts and speech rate 249–250 Sridhar, S. N. 344
speech comprehensibility see comprehensibility Standards-Based Measurement of Proficiency
speech correctionists 400 (STAMP) 332–333
speech errors 19 Staples, S. 118
speech events, differentiating 216 Stengers, H. 289
speech intelligibility 160; current contributions stimulus characteristics 378
and research 162–165; future directions stimulus onset asynchrony (SOA) 19
169–170; historical perspectives 162; Stolberg, D. 447
intelligibility, measuring 165–167; linguistic Strange, W. 374, 380
variables in intelligibility judgements 163–165; strategies, L2 speaking 261; critical issues and
recommendations for practice 167–169 topics 263–265; current contributions and
Speech-Language Pathologists (SLPs) and L2 research 265–266; future directions 269;
speakers 399; accent modification, focus on historical perspectives 261–263;
400–401; differential diagnosis, focus on 401; recommendations for practice 267–269;
foreign accent modification/management/ research methods 266–267
reduction (FAM) assessment issues 405–406; stress 201, 203–204
foreign accent modification/management/ stress-deafness 204
reduction (FAM) training issues 406–407; Strömmer, M. 364
future directions 409; historical perspectives stronger language 14
400–401; incomplete acquisition and attrition Stroop tasks 77
in L2 clinical assessment 404; issues related to Student Oral Proficiency Assessment (SOPA) 332
SLP-FAM providers 404–405; L2 clinical study abroad (SA) research 58–59
assessment, focus on 407; L2 clinical Subasi, S. 89
Sullivan, A. 403 transcription 119
Sundberg, R. 307 Trebits, A. 235
Sundqvist, P. 308 Tremblay, A. 206
supralaryngeal features 433 Trofimovich, P. 149, 176, 179, 204, 205, 221,
surface structure 26 308n3, 375, 376
Surtees, V. 59 Tsuchiya, K. 266
Suzuki, R. 225 Turco, G. 206
Swain, M. 2, 231, 275, 330, 333, 334 typically developing (TD) bilingual peers 416
Swan, M. 149
Sweet, H. 217 Uchihara, T. 274, 290
Sylvén, L. 308 Ullman, M. T. 16
syncloze test 436 under-identification 403
Szyszka, M. 87, 88 understanding, comprehensibility and 175–176
Ungoed-Thomas, H. 435
Taguchi, N. 244, 248, 249, 253 unintelligibility 160
Tajima, K. 204 university programme research 60
TalkBank system 120, 121 Ushioda, E. 318
Talmy, S. 59 utterance fluency: and cognitive fluency 194–195;
Tao, L. 203, 205 and perceived fluency 191–192; and proficiency
target language (TL) 90 192–194
Tarone, E. 333
task-based language teaching (TBLT) Vago, R. 443
275–276, 317 Valls-Ferrer, M. 193
task complexity 234–235 van Batenburg, E. 136
Tateyama, Y. 247, 248 Van Compernolle, R. A. 63
Tauroza, S. 164 van Heuven, V. 432
Tavakoli, P. 189, 290 Van Lancker-Sidtis, D. 288
Taylor Reid, K. 179, 180 Van Moere, A. 135
teacher professional development (TPD) 324 Van Riper, C. 401
teaching for intelligibility 168 Van Zeeland, H. 277
teaching L2 speaking: in a Vietnamese high school Varonis, E. M. 150
320–322; in a Vietnamese primary school Vasa, R. A. 84
318–320; in a Vietnamese University 322–324 verbal scaffolding 336
teaching vocabulary 273; corpus revolution 276; Verspoor, M. H. 41, 49
future directions 282; increasing lexical access Vettorel, P. 349
speed 280–281; multi-word chunks 277–278; video-conference technologies 136, 137
recommendations for practice 281–282; spoken virtual reality (VR) 303; speaking and 306–307
vocabulary in language teaching 274–276; visual sketch pad 29
teaching the spoken form of words 278–280; visual-spatial sketch pad 28
words needed for speaking and listening 277 vocabulary knowledge 30, 275
Tedick, D. J. 332, 333, 336, 338 voice onset time (VOT) 104
Teimouri, Y. 89 voice quality 433
test anxiety 86 vowel diagram for English sounds 45
text-to-speech synthesizers (TTS) 302 Vygotskian-inspired L2 studies 60, 62
Thai, C. 288 Vygotskian theory 54, 55
Thir, V. 350
Thomson, R. I. 105, 154, 155, 167, 375, 376, 379, Wade, L. 151
380, 381 Wagner, J. 41, 101
Timarová, Š. 435 Walker, N. 308n3
Timmis, I. 223, 224 Walker, R. 351–352
Timpe-Laughlin, V. 367 Wallace, G. 402
Tissi, B. 432 Wan, K. 251, 252
Tortel, A. 204 Wang, X. 205
Tóth, Z. 89 Warren, B. 288
Towell, R. 190, 193 Watts, P. 161, 168
Tracy-Ventura, N. 108, 193, 195 Waugh, E. 253
Trail Making Test 32 Wayland, R. P. 375, 376
training interpreters see interpreter training
weaker language 14 365–367; future directions 367; historical
Webb, S. 277, 293 perspectives 359; learning to speak effectively
Wei, L. 347 at work 363–365
Weizenbaum, J. 301, 302 Wright, C. 189, 317
Welsh-English conversational speech 17 Wright, R. 335
Wennerstrom, A. 203 Wu, X. 209
White, J. 279
Widdowson, H. G. 345, 352 Xu, X. 333
Wiener, S. 207 Xue, H. 275
Wigham, C. R. 269
Willems, N. 202 Yan, X. 290
Wingate, U. 316, 317 Yates, L. 154, 244, 253, 254, 363, 364
Wisniewska, N. 292 Yenkimaleki, M. 432
Wolff, D. 234 Yilmaz, G. 445
Wolfram, W. 165 Yilmaz, Y. 236
Wood, D. 289, 293 Young, D. J. 86, 88
word family 274 Yu, H. 44
word-level intelligibility 169 Yuan, F. 43
words needed for speaking and listening 277
working memory (WM) 18, 235, 377; and
attention during L2 oral production 31–33; and Zhang, W. 265
attention in the Blueprint of the Speaker 28–29; Zhang, Y. 204
individual differences in 74 Zheng, D. 265
workplace, culture and 362–363 Ziegler, N. 234
workplace communication 359; communicative Zielinski, B. 154, 167, 203
demands of the workplace 360–362; culture Zimmerman, C. B. 275
and the workplace 362–363; formal learning
