how teachers and students perceive, respond to, and make use (or not) of assessment. Assessment literacy is universal in that all systems and levels implement assessment to evaluate instruction, learning, or curriculum and, just as frequently, to inform improved teaching and learning; so language education researchers, not just second language researchers, will find the book useful.
The chapters in this volume help move language assessment more into multi-
faceted data collection about competencies for the sake of improving the
quality of learning and teaching. It’s nice to see language assessment research
catching up with the world of classroom assessment theory and practice. The
volume provides access to research and thinking about the topic from some
relatively under-represented perspectives, including Turkey, Tunisia, Oman,
Ukraine, the UAE, Saudi Arabia, and Japan, as well as Europe and the UK.’
Professor Gavin T L Brown
Associate Dean Postgraduate Research (ADPG),
The University of Auckland
‘Language assessment literacy (LAL) is a critical topic in the field of language testing and assessment; see, for example, the recently established (April 2019) Language Assessment Literacy Special Interest Group (LALSIG) within the International Language Testing Association. Perspectives on Language Assessment Literacy comprises chapters by authors from traditionally less represented regions of the world and
thus represents an important contribution to the field. The volume also helps advance
the scholarship of LAL. Authors pay special attention to how language assessment
theory and practice can better synergize with teaching to improve students’ language
learning and to better document students’ learning outcomes.’
Micheline Chalhoub-Deville, Ph.D.
Professor, University of North Carolina at Greensboro
Perspectives on Language
Assessment Literacy
Edited by
Sahbi Hidri
First published 2021
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
52 Vanderbilt Avenue, New York, NY 10017
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2021 selection and editorial matter, Sahbi Hidri; individual chapters, the
contributors
The right of Sahbi Hidri to be identified as the author of the editorial
material, and of the authors for their individual chapters, has been asserted
in accordance with sections 77 and 78 of the Copyright, Designs and
Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced
or utilised in any form or by any electronic, mechanical, or other means,
now known or hereafter invented, including photocopying and recording, or
in any information storage or retrieval system, without permission in
writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or
registered trademarks, and are used only for identification and explanation
without intent to infringe.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record has been requested for this book
Typeset in Bembo
by Taylor & Francis Books
Contents
List of figures ix
List of tables x
List of contributors xii
Preface xiv
PART 1
Language assessment literacy: Theoretical foundations 1
1 Language assessment literacy: Where to go? 3
SAHBI HIDRI
PART 2
Students’ language assessment literacy 67
5 Enhancing assessment literacy through feedback and feedforward:
A reflective practice in an EFL classroom 69
JUNIFER A. ABATAYO
PART 3
Teachers’ language assessment literacy 133
8 Language assessment literacy of novice EFL teachers: Perceptions,
experiences, and training 135
AYLIN SEVIMEL-SAHIN
PART 4
Language assessment literacy: Interfaces between
teaching and assessment 197
11 To teach speaking or not to teach? Biasing for the interfaces
between teaching and assessment 199
DIANA AL JAHROMI
12 Planning for positive washback: The case of a listening proficiency test 220
CAROLINE SHACKLETON
Introduction
The main points in traditional assessment are ensuring reliability and validity in assessment instruments. These aspects are consistent with the goals of quantifying and measuring learning and collecting information. The fact that these aspects are the concerns in assessing learners establishes the final grade as the product of traditional assessment (Brown, 2004). As pointed out by Alderson (1999), Alderson et al. (1995), and Alderson and Banerjee (2001), once learning is characterized as a number, it is of utmost significance that the number be as reliable and valid as possible; otherwise it has no meaning. The philosophy of traditional assessment allows assessment instruments to rank students against each other, which is termed norm-referenced testing (NRT). According to Dunn et al. (2004), and Dunn and Dunn (1997), norm-referenced assessment can be inaccurate or unreliable in some cases and is consequently regarded as less valid than criterion-referenced testing (CRT). According to Fulcher and Davidson (2007), the shift towards criterion-referenced assessment is indicative of the necessity to develop more precise representations of students’ learning than can be provided through norm-referenced assessment. Shepard (2000) pointed out that this viewpoint on learning is based on the notion that learning is a procedure of gathering information and knowledge in separate pieces, with restricted application or transfer, and that assessment under this paradigm attempts to determine whether learners have retained the information which is basically given to them by their instructors.
The absence of assessment literacy has the potential to cripple the quality of a program as well as the purpose of teaching. It is for this reason that it is seen ‘as a sine qua non for today’s competent educator’ (Popham, 2006, 2009). In addition, Nawab (2012) argues that second language teachers must be trained exclusively under separate programs because language teaching and assessment demand something different from other disciplines. They demand diligence and sensitivity towards the learning needs of the stakeholders. It is through language that learners reach the stock of knowledge for other subjects. This highlights the significance of addressing the issues of assessment literacy in second language programs.
In the early part of the 21st century, concerns regarding assessment literacy started to appear around the globe, and an effective definition of the term was needed in order to relate it to second language curriculum design. This points towards the significance of the knowledge and practice of assessment in the context of its utility in updating language learning materials (Fulcher, 2012). In this way, assessment training among language teachers is considered of prime importance in the delineation of successful language teaching courses.
However, this calls into question any language teaching and assessment criteria that have been unsuccessful in yielding acceptable results from learners’ performance. In addition, it lightens the burden of responsibility and accountability on the shoulders of learners, because language learning contexts are now packages whose foundation is built on the assessment skills of the assessors. The rest unfolds later and encapsulates the traits of other stakeholders. Nevertheless, educational organizations are still responsible for arranging pre- and in-service courses and workshops to train assessors in order to remedy the weak standards of assessment around them.
A study carried out by Vogt and Tsagari (2014) reveals that language teachers, overall, expressed the need for training in issues of assessment because they continuously have to come to terms with standardized testing. Moreover, more and more people are getting involved in analyzing scores and deciding about assessment issues. In such a scenario, systems that do not move ahead with updates are bound to barely meet, or completely fail to bring about, the desired learning outcomes among students. It is actually time to reengineer the whole system of assessment and its objectives (Taylor, 2009), seeking to answer the most significant question: Why do we need to assess our students? The answer takes into account professional assessment practices; effective, assessment-literate teachers; and conscientious organizations that are capable of providing concrete opportunities for their assessors to learn how to effectively assess, what to assess, and what not to assess. Too much assessment points towards the rote learning present in language learning programs in many educational systems.
Growing professionalization has lent new meanings to the term ‘assessment literacy’, and in the field of applied linguistics, it is now related to something more than setting standards for developing and administering tests using certain techniques. It is something that goes beyond the traits of tests to include the ethics of testing in policy and practice. It engages assessors at a level which is much more than just the technical characteristics of tests (Taylor, 2009),
encapsulating the philosophy and ethics of testing. It is noteworthy that due
to globalization, an ethical milieu has appeared in the realm of assessment
literacy for assessors that functions on the standards of accountability and
responsibility. An assessor is not only a stakeholder whose responsibility starts
on the day of a test and finishes after an assigned duration. The modern
assessor is a significant learning partner whose effective techniques and conceptions of assessment can produce effective learning outcomes among learners. Today’s assessor has to exercise, on an almost daily basis, analytical and formative assessment in theory and practice. The primary reason behind such a significant shift in assessment literacy is that the 21st century has so far been all about promoting global connections among different populations of the world. This purpose can only be achieved through understanding and expressing ideas, for which languages are the pivotal tools.
Another significant factor that can make or break assessment is the assessors’
understanding of the validity and reliability of tests. Assessors have to be fully
vigilant to look into the standardization of testing. This also includes the ability
to spot fake standardization in any testing context. This is too critical to be the work of a naïve teacher without any prior training in assessment. It is for this
reason that the expert language teaching community has been working to
develop explicit standards in testing while expanding the concepts of validity
and reliability (Messick, 1996). This has updated the components of assessment
literacy. There has been a paradigmatic shift from merely relying upon the
assessment material, which is appropriate for testing a certain level, to increasing
accountability and analytically understanding the learners’ needs in a particular
context. Therefore, the shape of the present assessment paradigm is, according to Davies (2003, 2008, 2013, 2014), ‘Knowledge + Skills + Principles’. The updated assessment frameworks function on the shoulders of teachers who reflect the relevant ethical involvement in the process of assessment. This further indicates a major change in the definition of assessment literacy in the age of information.
Davies (2008, 2014) continues to discuss what constitutes the elements of
Knowledge, Skills, and Principles in assessment. Knowledge is the relevant background that is ready to function at the time of practice; Skills are the abilities of an assessor to standardize tests and use related methodologies to conduct meaningful assessment; and Principles concern the suitable use of language tests with fairness and professional expertise.
We frequently come across reports of training and workshops arranged by
stakeholder organizations for updating teachers’ assessment literacy. We need to
approach this with much caution, as there is a big difference between arrangements that bring about positive changes in teachers’ assessment conceptions and counterfeit ones arranged merely for the sake of publicity.
Part Three of the book, Teachers’ Language Assessment Literacy, tackles teachers’
LAL. In Chapter One of this part, Sahin stresses the necessity for language teachers to be assessment literate so that they can assess students in an objective way as per their instructional context. The study was carried out on 22 Turkish EFL teachers and was aimed at investigating their conceptions and practices of assessment. Results of the study revealed that teachers still needed to work more on their assessment beliefs so that they could test students in a fair way. In the second chapter, Ahmad accentuates the need for teachers to develop their language assessment literacy. Based on data analysis of students’ writing, the author affirmed that both the assessment standards and teachers’ ratings should be revisited, as they did not meet the expectations of fair and valid writing assessment tasks. The study is important in signposting the relevance of standardizing international benchmarks for the assessment of writing. In Chapter Three,
Kvasova and Shovkovyi tackled some stakeholders’ perceptions of the reliability
of classroom-based summative assessments in Ukraine. The authors stress the fact
that teachers most often lack important assessment literacy skills needed to be operational in their academic contexts. This lack of assessment expertise, as perceived by university managers, teachers, and students, might lead to harmful effects and therefore fails to meet international assessment standards. Implications of the study called for biasing for a better quality of constructed tests whose major purpose is to guarantee assessment reliability.
Part Four of the book, Language Assessment Literacy: Interfaces between Teaching and Assessment, includes studies from Bahrain, Spain, and Japan. In Chapter
One of this part, Al Jahromi highlights the fact that the assessment of speaking
is still posing some major problems to students, practitioners, and assessors.
Based on empirical data on the Bahraini context, the author critically reviews
the assessment of speaking and calls for a reconsideration of the assessment of
this skill. In Chapter Two, Shackleton addressed the positive ‘washback’ effect of a listening proficiency test in the Spanish context. The author used a B2 listening exam, a think-aloud protocol, and a retrospective interview to examine the test’s construct validity, that is, whether the test measures what it is supposed to measure. Based on the analysis of the planning and prediction strategies as well as the other research instruments, the author maintained that the listening construct was fuzzy and that there are serious threats to its construct validity. The author calls for the use of authentic listening input to raise teachers’ awareness in order to target the assessment of
listening construct validity. The last chapter of this part deals with under-
standing the assessment of testing abilities in a socially interactive environment.
To do so, the author maintains that this understanding necessitates language assessment literacy on the part of test designers. The author also criticizes the role of standardized assessment in making our students creative in their socially interactive contexts. Developing LAL is important, and there needs to be much more work on training students and teachers so that they can contribute to useful test development.
Conclusion
The main aim of assessment is to determine whether or not learning has occurred. It is believed that the main objective of assessment is to find out how far the learning experiences are actually generating the anticipated outcomes.
Assessment primarily refers to the systematic way of collecting information with the aim of making judgments or decisions about people. Shepard (2000)
pointed out that this viewpoint on learning is based on the notion that learning
is a procedure of gathering information and knowledge in separate pieces.
The main points in traditional assessment are ensuring reliability and validity in assessment instruments. These aspects are consistent with the goals of quantifying and measuring learning, and collecting information. The fact that these aspects are the concerns in assessing learners establishes the final grade as the product of traditional assessment (Brown, 2004). As pointed out by Alderson (2000), once learning is characterized as a number, it is of utmost significance that the number be as reliable and valid as possible; otherwise it has no meaning.
The philosophy of traditional assessment allows assessment instruments to rank students against each other, which is termed NRT. However, NRT can be inaccurate or unreliable in some cases, and it is consequently regarded as less valid than CRT. The shift towards CRT is indicative of the necessity to develop more precise representations of students’ learning than can be provided through NRT.
References
Alderson, J. C. (2000). Assessing reading. Cambridge University Press.
Alderson, J. C. (1999, May). Testing is too important to be left to testers [Plenary address]. The
Third Annual Conference on Current Trends in English Language Testing. United
Arab Emirates University.
Alderson, J. C. (1996). The testing of reading. In C. Nuttall (Ed.), Teaching reading skills in a foreign language (pp. 212–228). Heinemann.
Alderson, J. C., & Banerjee, J. (2001). Language testing and assessment (Part 1). Language
Teaching, 34(4), 213–236.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation.
Cambridge University Press.
Alderson, J. C., & Wall, D. (1993a). Does washback exist? Applied Linguistics, 14(2), 115–129.
Alderson, J. C., & Wall, D. (1993b). Examining washback: The Sri Lankan impact
study. Language Testing, 10(1), 41–69.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. Pearson Education.
Cheng, L., Sun, Y., & Ma, J. (2015). Review of washback research literature within
Kane’s argument-based validation framework. Language Teaching, 48(4), 436–470.
Davies, A. (2014). 50 years of language assessment. In A. J. Kunnan. (Ed.), The compa-
nion to language assessment: Abilities, contexts and learners (pp. 3–21). Wiley Blackwell.
Davies, A. (2013). Native speakers and native users: Loss and gain. Cambridge University Press.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
Davies, A. (2003). Three heresies of language testing research. Language Testing, 20(4),
355–368.
Dunn, L. M., & Dunn, L. M. (1997). Peabody picture vocabulary test—III. American
Guidance Service.
Dunn, L., Morgan, C., O’Reilly, M., & Parry, S. (2004). The student assessment handbook.
Routledge Falmer.
Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing.
Educational Researcher, 18(9), 27–32.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource
book. Routledge.
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Hill, K., & McNamara, T. (2012). Developing a comprehensive, empirically based research framework for classroom-based assessment. Language Testing, 29(3), 395–420.
McNamara, T. (2001). Language assessment as social practice: Challenges for research. Language Testing, 18(4), 333–349.
McNamara, T. (2000). Language testing. Oxford University Press.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Blackwell.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13(3),
241–256.
Nawab, A. (2012). Is it the way to teach language the way we teach language? English
language teaching in rural Pakistan. Academic Research International, 2(2), 696–705.
Poehner, M. (2008). Dynamic assessment: A Vygotskian approach to understanding and pro-
moting L2 development. Springer Science + Business Media.
Popham, W. J. (2006). All about accountability: A dose of assessment literacy. Educational Leadership, 63(6), 84–85.
Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory
Into Practice, 48(1), 4–11.
Rea-Dickins, P. (2004). Understanding teachers as agents of assessment. Language Test-
ing, 21(3), 249–258.
Shepard, L. (2000). The role of assessment in a learning culture. Educational Researcher,
29(7), 4–14.
Shohamy, E. (2001). The power of tests: A critical perspective on the uses of language tests. Longman.
Taylor, L. (2009). Developing assessment literacy. Annual Review of Applied Linguistics,
29, 21–36. doi:10.1017/S0267190509090035.
Torrance, H., & Pryor, J. (2002). Investigating formative assessment: Teaching, learning, and
assessment in the classroom. Open University Press.
Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers: Find-
ings of a European study. Language Assessment Quarterly, 11(4), 374–402.
Wall, D. (2013). Washback in language assessment. In C. A. Chapelle (Ed.), The ency-
clopedia of applied linguistics. Blackwell Publishing Ltd.
Wall, D., & Alderson, J. C. (1996). Examining washback: The Sri Lankan impact study.
In A. Cumming & R. Berwick (Eds.), Validation in language testing (pp. 194–221).
Multilingual Matters.
Xu, Y., & Brown, G. T. L. (2016). Teacher assessment literacy in practice: A reconceptualization. Teaching and Teacher Education, 58, 149–162. doi:10.1016/j.tate.2016.05.010.
Chapter 2
Inbar-Lourie (2008) argued that assessment practices and assessment literacy have been
developed in the context of two conflicting conceptions of assessment. On the
one hand, there is the traditional, cognitivist approach, which reflects the
principles of positivism. This conception is materialised in high-stakes, standar-
dized tests and requires practitioners to have knowledge and skills which are
largely psychometric. For Inbar-Lourie, this conception, and the respective
practices, form the ‘testing culture’ (2008, p. 387). On the other hand, there is
a socio-cultural conception which adopts an interpretative, constructivist
approach to knowledge and assessment. According to the latter conception,
knowledge and assessment are not value-free (see also Taylor 2013, p. 411).
They are socially constructed under the influence of more or less dominant
epistemological assumptions, educational preconceptions, and social, political,
and cultural beliefs (2008, p. 387). This socially oriented conception, which
Inbar-Lourie calls ‘assessment culture’, requires stakeholders to be aware of the
contextual considerations and the social consequences of assessment and focus
on practices that promote learning (i.e., Assessment for Learning).
Insisting on the social aspect of assessment, and broadening the account of LAL
to include all possible stakeholders, Taylor (2009) suggested an extended and
more contextualized conceptualization of the term. As she points out, ‘training
for assessment literacy entails an appropriate balance of technical know-how,
practical skills, theoretical knowledge, and understanding of principles, but all
firmly contextualized within a sound understanding of the role and function of
assessment within education and society’ (Taylor 2009, p. 27). The contribution
of Taylor’s provisional framework is that it includes considerations of context
but, most importantly, it evokes the key notion of balance. According to Taylor
(2009, p. 27), the context and the role of the stakeholder in the assessment pro-
cess will determine the balance of knowledge in specific areas and the levels of
LAL that should be achieved. Therefore, although LAL should be developed for
all stakeholders, each stakeholder should acquire the amount of knowledge that
fits their role.
Fulcher (2012) addressed the contextualization of skills, knowledge, and
principles in his investigation on teachers’ levels of LAL. Drawing on previous
accounts, but also on his empirical findings on teachers’ perceived needs, Ful-
cher (2012, p. 125) provides the following definition of assessment literacy:
The advantage of this type of conceptualization is that it can reflect the fact that
LAL cannot be acquired as a block of knowledge all at once. The continuum
approach seems to describe more accurately the development of LAL. Moreover,
Pill and Harding’s model captures the fact that stakeholders in assessment have
different needs which in turn define the levels of assessment literacy that they
should reach. For instance, policy makers could fulfil their role by reaching the
‘functional level’ in Pill and Harding’s model, whilst teachers would most likely
need to acquire ‘multidimensional’ or, at least, ‘conceptual and procedural’ literacy
in order to engage in effective assessment practices.
Although innovative in design and insightful in distinguishing assessment
literacy levels according to stakeholders’ needs, Pill and Harding’s model
presents some critical problems. As the authors seem to acknowledge (2013,
p. 383), the exact meaning and the content of the proposed levels are rather
vague (see also Harding & Kremmel, 2016). Also, historical and social dimensions of assessment appear peripheral in Pill and Harding’s model, unlike in Brindley (2001a, 2001b), Inbar-Lourie (2008), and Scarino (2013), who address contextual considerations and awareness of social consequences as fundamental for the development of LAL.
An attempt to bridge component- and levels-based conceptualizations of LAL has been presented in Taylor (2013). Taylor acknowledges that LAL
involves various stakeholders, not only teachers. She also takes into account
that the levels and areas of literacy vary with stakeholders’ roles and needs.
However, instead of matching areas of knowledge with levels of LAL (see Pill
& Harding, 2013), Taylor hypothesizes eight core dimensions of knowledge,
skills, and principles, and five degrees of literacy. These eight dimensions of
LAL are: knowledge of theory, technical skills, principles and concepts, language pedagogy, sociocultural values, local practices, personal beliefs/attitudes, and scores and decision making.
For the definition of the levels of literacy, Taylor follows Pill and Harding’s
(2013) model and assumes the possibilities of illiteracy, nominal literacy, functional literacy, conceptual and procedural literacy, and multidimensional
literacy. According to Taylor’s model, stakeholders are expected to acquire a
specific level of literacy in each key dimension depending on their context and
needs (Taylor 2013, pp. 409–410). The proposed conceptualization allows us
to capture the fact that abilities in a knowledge area can be more or less
developed depending on each stakeholder’s specific needs. Of course, there
might be objections or modifications with respect to the content of the key
dimensions or the exact levels of literacy that are necessary for each group of stakeholders.
(through workshops, discussion groups and web resources), and the adoption of
a more holistic, contextual, and ethically grounded approach to the interpreta-
tion of language proficiency test scores.
Yan, Fan, and Zhang (2017) drew on data from semi-structured interviews
in order to provide LAL profiles for language teachers, language testers, and
graduate students in language studies programs in China. As their findings
suggest, assessment practices and training needs in China are highly con-
textualized and shaped by experiential factors which are different for each
stakeholder group.
Kim, Chapman, Wilmes, Cranley, and Boals (2017) illustrated a case of collaboration between educators and test developers for the creation of formative
language assessment tools in the US educational context. Their findings
revealed the benefits of dynamic collaboration with stakeholders – particularly
with educators and parents – in the development of valid language assessment.
The effects of dynamic collaboration were also presented by Harsch, Seyferth,
and Brandt (2017). In particular, Harsch et al. (2017) presented insights from the first eighteen months of a long-term project in which teachers, coordinators, and researchers developed their assessment literacy together. The aim of
the project was to investigate how the aforementioned stakeholder groups
bring their abilities, skills, and knowledge together, and how they learn with
and from each other.
A study that examined both teachers’ and language assessment specialists’ LAL
development is that of Baker and Riches (2017). The study was carried out over a
series of workshops on language assessment in 2013, where Haitian teachers
offered feedback to assessment specialists about draft examinations. The outcome
of these workshops was a revision of national English examinations which were
then presented to the Haitian Ministry of Education and Professional Training
(MENFP). Interestingly, the study found that teachers’ and specialists’ expertise complement each other and that there are still challenges to be addressed in collaborative decision making and consensus building among these stakeholders.
Finally, Kim, Chapman, and Wilmes (2017) studied various resources created to enhance parents’ and educators’ assessment literacy and, more specifically, the ability to interpret and use score reports.
that Iranian EFL teachers performed poorly in assessment practices and had
major misunderstandings about assessment, but felt well prepared for teaching
and assessing (see also Badia, 2015).
At the European level, the Hasselgreen et al. (2004) questionnaire was replicated by Vogt and Tsagari (2014) in their study on the assessment literacy of EFL teachers in seven European countries. They recruited 853 participants via questionnaires and added a qualitative component to their research by conducting follow-up interviews with 63 of them. Findings from Vogt and Tsagari’s study confirmed teachers’ lack of assessment literacy and training (see also Tsagari & Vogt, 2017), particularly in less traditional areas of assessment, as well as a need for development in the areas of reliability, validity, and statistics, and in the ability to critically evaluate the tests they used. Additionally, Vogt and Tsagari (2014) point out that teachers, in their effort
to meet the requirements of their role, tend to resort to compensation strategies,
such as learning on the job (by observing colleagues and mentors) or testing as they
were tested (pp. 390–391). With the addition of a follow-up interview section,
Hasselgreen et al.’s (2004) questionnaire was also replicated by Kvasova and
Kavytska (2014) who conducted research on the LAL levels of Ukrainian EFL
teachers. Interestingly enough, Kvasova and Kavytska observed that Ukrainian
teachers also use the compensation strategies reported in Vogt and Tsagari (2014)
(i.e. learning on the job and assessing as they themselves were assessed).
In his investigation of assessment conceptions of Tunisian English language
teachers, Hidri (2016) found evidence of wrong and conflicting conceptions
about assessment. More recently, Berry, Munro, and Sheehan have presented
the results of a project which aimed to investigate the training needs, practices,
and beliefs of English language teachers in a wide variety of countries (see also
Berry & Sheehan, 2017; Berry, Sheehan, & Munro, 2017a; Sheehan & Munro,
2017). Drawing from semi-structured interviews, classroom observations, and
teachers’ written feedback on a LAL workshop, the researchers largely con-
firmed previous studies on teachers’ LAL levels and beliefs. More specifically, as
their data showed, English language teachers expressed a lack of knowledge in
assessment literacy, as well as a need for training in practical elements of
assessment and clear criteria in assessment. Supporting previous investigations
on the issue, Berry et al. (2017a) pointed out that teachers in their sample had a
testing-oriented conception of assessment. Interestingly enough, the majority of
the participants in the project were not confident about assessment.
Findings from a qualitative part of a large-scale study on EFL teachers’
assessment literacy levels, training, and needs were presented in Tsagari and
Vogt (2017). Using data from semi-structured interviews with primary and
secondary state school EFL teachers in Greece, Cyprus, and Germany, Tsagari
and Vogt investigated teachers’ perceptions of their own professional prepara-
tion, as well as teachers’ perceived training needs. Findings showed that EFL
teachers in the aforementioned educational contexts have low levels of LAL.
The majority of them said that they had not learnt anything (or learnt very
little) about language testing and assessment during their pre-service training.
Teachers also held fuzzy concepts about assessment; they tended to revert to traditional assessment procedures, and their feedback procedures reflected a
deficit-oriented approach. Supporting findings from similar studies (see Vogt &
Tsagari, 2014; Kvasova & Kavytska, 2014), participants in Tsagari and Vogt’s
(2017) study followed the strategy of learning on the job, relying on mentor
colleagues and published materials. A very important finding in Tsagari and
Vogt’s research is that teachers were not able to clearly formulate their training
and professional needs.
The role of contextual factors in developing LAL, and in exploring LAL levels, was examined in Xu and Brown’s (2017) study, which used an adapted
version of the Teacher Assessment Literacy Questionnaire to explore the LAL
levels of university English language teachers in China. They also explored the
possible interaction between LAL levels and demographic characteristics, such
as age, gender, professional title, qualification, and others. Drawing on data
collected from 891 participants, Xu and Brown’s study revealed that teachers in
Chinese universities have a very basic level of LAL while demographic factors do
not have a significant impact on teachers’ assessment literacy. The only factor that
seemed to influence LAL levels was the institution in which teachers worked.
However, the authors stress that these findings should not be considered clear evidence of a lack of interplay between contextual factors and LAL, as this might be due to the methodological design employed; for example, a questionnaire originally intended for a U.S. context 30 years ago may not have been appropriate for capturing the particular contextual parameters of Chinese universities.
LAL levels in South America, more specifically in the Colombian
educational context, were also the focus of Hernández Ocampo (2017),
Restrepo and Jaramillo (2017), and Giraldo’s (2018) works. Villa Larenas (2017)
also explored LAL levels of Chilean EFL teacher trainers. More recently, Berry, Sheehan, and Munro (2017b) collected data from interviews and classroom
observations in the course of a study aiming at exploring UK teachers’ attitudes
towards assessment as well as teachers’ perceived training needs.
Focusing on the Canadian educational context, Valeo and Barkaoui’s (2017)
research explored how English as a Second Language (ESL) teachers con-
ceptualize and conduct assessment in the ESL classroom and how teachers’
conceptions influence their decisions in designing and using writing assessment
tasks (see Valeo & Barkaoui, 2017; Barkaoui & Valeo, 2017). Valeo and Barkaoui’s
findings suggest that teachers hold varying conceptualizations about how to design
and select writing tasks. Using an expansion of Fulcher’s (2012) questionnaire,
Kremmel et al. (2017) presented a case about how teacher involvement in high-
stakes test development can contribute to the development of their LAL.
Focusing on the Turkish educational context, Mede and Atay (2017) exam-
ined the LAL levels of English-language university teachers in Turkey. Data
were collected from 350 participants who completed an adapted version of
Vogt and Tsagari’s (2014) questionnaire and from follow-up group interviews
with 34 participants. As Mede and Atay’s research showed, English language
Methodological considerations
Methodological designs
The majority of research on LAL draws on quantitative and qualitative analytical methods, with an evident recent increase in the use of the latter. Mixed methods are also used very frequently by authors who attempt to combine the validity of quantitative data analysis with the illuminating and clarifying force of qualitative analytical tools (see Jeong, 2013).
The most popular instruments of quantitative approaches in LAL research are
questionnaires and surveys. In some works, authors designed, developed, and
administered original questionnaires (e.g. Brown & Bailey 2008; Hasselgreen et
al. 2004; O’Loughlin 2013). In other works, highly esteemed questionnaires
were replicated and adapted to the context and needs of particular research
projects (e.g. Jin, 2010; Kiomrs et al. 2011; Vogt & Tsagari 2014; Kvasova &
Kavytska 2014). The questionnaires used in the literature commonly consist of
closed-response items (e.g. Hasselgreen et al. 2004; Jin, 2010; Fulcher 2012;
O’Loughlin 2013), although combinations of both open- and closed-response
questions are not rare (e.g. Brown & Bailey 2008; Mazandarani & Troudi,
2017). Apart from cases in which questionnaires are the sole source of data
gathering and analysis (such as Hasselgreen et al. 2004; Fulcher 2012),
scholars often recruit subsets of questionnaire respondents for follow-up –
usually semi-structured – interviews (O’Loughlin, 2013; Jeong, 2013; Vogt
& Tsagari, 2014). In the latter case, the qualitative analysis of the informa-
tion gathered by interviews is meant to elaborate and clarify the quantitative
data offered by the questionnaire. More rarely, both quantitative and qualitative data are collected through questionnaires that include sections for elaborating on responses and for other written comments (see, for instance, Malone, 2013).
In general, scholars have made extensive use of interviews but often as a
supplement to other types of data. While in most cases these interviews were
conducted on an individual basis (see, for instance, O’Loughlin, 2013), group
interviews were also used in some methodological designs (e.g. Malone, 2013).
Compared to individual interviews, group interviews are considered to max-
imize interactions between participants. However, they always entail the risk of
failing to collect the interviewees’ actual beliefs, since participants in group
interviews might influence one another or conform their discourse to the
group’s (see Malone, 2013, pp. 334–335). These limitations might explain the
choice of some scholars to use private conversations instead of interviews as a
means to collect qualitative evidence for their research (e.g., Arkoudis &
O’Loughlin, 2004).
In the recent literature on LAL, there are few studies that drew exclusively on data gathered from interviews; Deneen and Brown (2016) is one of them. In most cases, interviews are used in addition to other data collection tools (e.g. Gu, 2014; Tsagari & Vogt, 2017).
Empirical investigations on practitioners and stakeholders have offered
important contributions to the field, but they do not monopolize the relevant
research. Significant findings and insights have also been presented by literature and textbook reviews (Davies, 2008; Allen & Negueruela-Azarola, 2010).
There are also other studies, such as position papers (Boud, 2000; Carless, 2007;
Popham, 2009; Stiggins, 2006, 2012; Scarino, 2017, among many others), and
assessment literacy course surveys and overviews (Brown & Bailey, 2008; Jin,
2010; Lam, 2015). Papers discussing processes of language assessment in practice
(see, for instance, Rea-Dickins, 2001, 2006; Gu, 2014) and the implementation
of state-wide assessment reforms (e.g. Davison, 2013; Hamp-Lyons, 2016) have
also played a significant role.
Case studies constitute the vast majority of research (see, among others,
Pill & Harding, 2013; Kvasova & Kavytska, 2014; Hidri, 2016; Gu, 2014).
Large-scale investigations on different countries are also common (Hassel-
green et al., 2004; Brown & Bailey, 2008; Vogt & Tsagari, 2014), while
comparative studies are significantly fewer (e.g., Davison, 2004; Cheng,
Rogers & Hu, 2004; East, 2015). In general, research in LAL has been
Participants
The vast majority of empirical investigations on LAL elicited data from infor-
mants who were either foreign/second language teachers or testing and assess-
ment instructors. However, at times the participants’ professional identity is not
very clearly identified – especially in large-scale investigations. Thus, many
empirical studies drew on data from participants who were teaching foreign
languages at both secondary and university level (Fulcher, 2012; Vogt & Tsa-
gari, 2014; Mazandarani & Troudi, 2017). Similarly, some investigations
recruited assessment and testing instructors as participants, who were also lan-
guages instructors (e.g. Hasselgreen et al., 2004). In his work, East (2015)
clearly distinguishes participant groups (i.e. Australian EFL teachers teaching in
secondary schools only, grouped by subject language), but this is rather an
exception compared to the overall tendency. It is very common for researchers
to use the whole classroom context in order to collect information about
actual, classroom-based assessment practices. The vast majority of these works
focus on the teacher’s role in the assessment process (e.g. Rea-Dickins, 2001,
2006; Gu, 2014), while the students’ role gains some attention only in investi-
gations that focus on teacher–student interaction (see Leung & Mohan, 2004).
Remarkably, research on the assessment literacy of other stakeholders is limited.
An exception to this is Pill and Harding (2013) who used the transcripts of
thirteen hearings of the Australian House Standing Committee on Health in
order to investigate the misconceptions of policy makers about language test-
ing. O’Loughlin’s work (2013) also explored the assessment literacy levels of
university staff in two Australian universities.
Participants in LAL research were largely self-selected volunteers. In most cases, they were contacted or recruited through professional lists, mailing lists, and social networks (e.g. Fulcher, 2012; Jeong, 2013; Kvasova & Kavytska, 2014). Cases of classroom- or course-observation analyses can be considered an exception to this tendency. In such cases, it is reasonable to assume that the participation of a classroom (or any other educational unit) in scientific investigations entails some degree of collaborative spirit at some level of the educational administration.
Since a large amount of research has been conducted on the basis of
web-collected information (e.g. Kremmel & Harding, 2017, 2019), the
Conclusions
Based on the findings and the points of agreement and divergence among the claims and research outcomes presented above, the literature on LAL does not reflect an entirely optimistic view. Authors have repeatedly observed a gap between the theoretical standards of LAL and actual language assessment practices. Other scholars have explicitly expressed doubts about whether the field of LAL has really evolved in recent research. Moreover, essential components of the LAL framework need further research, and the promotion of LAL is still under investigation.
There is no doubt that research on LAL will improve our theoretical designs, and new frameworks are likely to appear in the future. These future models will probably address the defects and problems of previous conceptualizations. However, as the overview of conceptualizations shows, some crucial aspects of LAL should be prioritized in future investigations because LAL components and practices are not definite or clearly articulated. As Inbar-Lourie (2016, 2017) observes, the field is characterised by the absence of the language trait from some of the definitions offered in the literature.
Therefore, further research should clarify the relation between assessment lit-
eracy and language assessment literacy (see Kremmel et al., 2017) as they are
commonly treated as freely interchangeable, which seems to be due to the
existing theoretical vagueness.
Furthermore, when referring to the language trait, it seems that the field does not have a clear definition of language (see also Kremmel et al., 2017). In her
studies, Scarino (2008, 2017) observes that the conception of the language
construct has changed through time. Language assessment has shifted from a
cognitivist approach to language to a more communicative approach and,
recently, to an intercultural approach. Of course, language can be approached
from different perspectives, and language teaching and assessment can focus on
different dimensions of language use, especially within multilingual contexts,
which have become a challenge for most educational contexts today (Schissel,
Leung, & Chalhoub-Deville, 2019). Future research should provide a clear
definition of language in a more holistic framework that incorporates useful
insights from all approaches to language.
Scholars also need to arrive at a clear and generally accepted definition of LAL. A major trend in the field suggests that LAL is made up of certain com-
ponents. However, both the number and the classification of these components
are debatable. Taylor (2013), for instance, hypothesizes that LAL consists of eight components, including principles and concepts, sociocultural values, local practices, and personal beliefs. An obvious problem with such a classification is that it is
not entirely clear where the line between the different components should be
drawn. On a theoretical level, it seems reasonable to distinguish between, for
instance, socio-cultural values and personal beliefs. Nevertheless, when it comes to
examinations of actual attitudes and practices, the distinction between the socio-
cultural pattern and personal behaviour does not seem that straightforward and
requires clear, supportive evidence. The point is that LAL conceptualizations
should not focus entirely on addressing theoretical requirements but should also
provide some framework that can incorporate real assessment performances.
Another challenge in the conceptualizations of LAL concerns the role of
context. More and more, authors acknowledge that contextual factors have a
significant effect on the development of LAL and the implementation of
assessment practices (e.g. Hill 2017a, 2017b; Tsagari, 2017). In the literature,
authors claim that language assessment can be affected by parameters, such as
class size and administrative requirements (e.g. Cheng et al., 2004). Although it
is undeniable that assessment practices as well as LAL development are affected
by contextual considerations, there is still a need for a clear definition of these
considerations.
In addition to the need for a more systematic and general study of context,
the literature also reveals a need for a more intensive investigation of con-
textual factors that until now have been widely overlooked. It is not until
very recently that scholars have started to investigate parameters such as the
demographics of language teachers (e.g. experienced vs novice teachers; see
Hildén & Fröjdendahl, 2018). Similarly, authors have concentrated on studying LAL in the context of teaching English, overlooking assessment practices and needs in the context of teaching other languages. The educational level factor has also been somewhat neglected in studies. Authors tend to collect data and
suggest theoretical models without distinguishing among educational levels. This seems to imply that teaching, learning, and assessment can be treated in a uniform way whether they take place at primary, secondary, or tertiary educational levels. It
is obvious, though, that each educational level has different aims and needs and
suggests a different context. Thus, future research should investigate these con-
texts separately and suggest ways to promote LAL according to the context and
needs of each educational level. On this view, the field of LAL should also
investigate the ways in which LAL could be better communicated to all levels
involved.
Another aspect of context that needs careful consideration is the professional
context of non-practitioner stakeholders, such as policy makers or administra-
tion officers. Lack of LAL might be a sign of limited professionalism for language teachers, but this is not the case for administration officers or policy makers. As a result, general conceptualizations of dimensions such as professional ethics, decision making, and attitudes are provided for non-practitioners on the basis of the practitioner’s context, as if policy makers, for instance, had the same professional ethics as teachers. While the findings of relevant
research revealed misconceptions of LAL among practitioners as well as limita-
tions in the implementation of assessment practices, the findings are usually
explained on the basis of practitioners’ training and professional development.
Thus, the practitioner’s professional perspective should not be the sole basis for promoting LAL to these groups, and non-practitioners should not be expected to adapt to practitioners’ needs and intentions.
The aforementioned considerations of suggested models and conceptualiza-
tions should be equally addressed in future efforts to promote LAL. For
instance, the importance of contextual factors should concern not only research
projects but also the training provided. Training in LAL should not be designed
and delivered according to some general conception of assessment literacy.
Instead, it should be formed with respect to the contextual parameters and the needs of the target stakeholder group. Thus, teachers of primary education
should receive different training from university teachers. Similarly, training
programmes should be designed according to the educational context of each
country. A training programme designed for English language teachers in
China cannot be transferred and applied as is in the context of French language
teaching in Greece.
In addition to carefully designed training programmes, LAL should be
promoted through other means, such as web resources (e.g. the TALE
project, http://taleproject.eu), online tutorials, seminars, and workshops.
Again, these materials should reflect the contextual considerations that apply
to each stakeholder group and each educational environment. Ideally,
practitioners and other test users should be given the chance to practice and
experience testing and assessment processes in structured educational
opportunities. Research shows that language teachers tend to adopt assess-
ment practices through experience and by observing others (see Vogt &
Tsagari, 2014). Similarly, O’Loughlin (2013) suggests that the LAL of university administrative staff could be raised if the latter could actually take the tests they use. O’Loughlin’s proposal is worth examining in practice and could even be generalized to other stakeholders, too.
Moreover, there is a lot yet to be learnt about the protagonists of assessment – students and teachers – and how they enact assessment policy mandates in their daily practices. Research should shed light on students’ perspectives on
assessment practices (see Rea-Dickins 2001; Malone 2016, 2017; Tsagari,
2013). Also, if assessment is meant to be for learning, then LAL should pro-
vide some account of what constitutes evidence of language learning (see
Rea-Dickins, 2001).
References
Allen, H. W., & Negueruela-Azarola, E. (2010). The professional development of
future professors of foreign languages: Looking back, looking forward. The Modern
Language Journal, 94(3), 377–395.
Antoniou, P., & James, M. (2014). Exploring formative assessment in primary school
classrooms: Developing a framework for actions and strategies. Educational Assessment,
Evaluation and Accountability, 26(2), 153–176.
Arkoudis, S., & O’Loughlin, K. (2004). Tensions between validity and outcomes: Tea-
cher assessment of written work of recently arrived immigrant ESL students. Language
Testing, 21(3), 284–304.
Badia, H. (2015). English language teachers’ ideology of ELT assessment literacy. Inter-
national Journal of Education & Literacy Studies, 3(4), 42–48.
Baker, B. A., & Riches, C. (2017). The development of EFL examinations in Haiti: Colla-
boration and language assessment literacy development. Language Testing, 35(4), 557–581.
Barkaoui, K., & Valeo, A. (2017). Designing L2 writing assessment tasks for the ESL class-
room: Teachers’ conceptions and practices [Conference paper]. The 39th Language Testing
Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Berry, V., & Sheehan, S. (2017). Exploring teachers’ language assessment literacy: A
social constructivist approach to understanding effective practice. In Learning and
assessment: Making the connections: Proceedings of the ALTE 6th International Conference.
http://events.cambridgeenglish.org/alte2017-test/perch/resources/alte-2017-proceedings-final.pdf.
Berry, V., Sheehan, S., & Munro, S. (2017a). What do teachers really want to know about assessment? [Conference paper]. The 51st Annual International IATEFL Conference, Glasgow, United Kingdom.
Berry, V., Sheehan, S., & Munro, S. (2017b). Mind the gap: Bringing teachers into the language literacy debate [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Boud, D. (2000). Sustainable assessment: Rethinking assessment for the learning society.
Studies in Continuing Education, 22(2), 151–167.
Brindley, G. (2001a). Language assessment and professional development. In C. Elder,
A. Brown, K. Hill, N. Iwashita & T. Lumley (Eds.), Experimenting with uncertainty:
Essays in honour of Alan Davies (pp. 126–136). Cambridge University Press.
Brindley, G. (2001b). Outcomes-based assessment in practice: Some examples and
emerging insights. Language Testing, 18(4), 393–407.
Brown, J. D., & Bailey, K. M. (2008). Language testing courses: What are they in 2007?
Language Testing, 25(3), 349–383.
Carless, D. (2007). Learning-oriented assessment: Conceptual basis and practical implications. Innovations in Education and Teaching International, 44(1), 57–66.
Cheng, L., Rogers, T., & Hu, H. (2004). ESL/EFL instructors’ classroom assessment
practices: Purposes, methods, and procedures. Language Testing, 21(3), 360–389.
Cooke, S., Barnett, C., & Rossi, O. (2017). An evidence-based approach to generating the
language assessment literacy profiles of diverse stakeholder groups [Conference paper]. The 39th
Language Testing Research Colloquium, Universidad de los Andes, Bogotá,
Colombia.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
Davison, C. (2004). The contradictory culture of teacher-based assessment: ESL teacher
assessment practices in Australian and Hong Kong secondary schools. Language Test-
ing, 21(3), 305–334.
Davison, C. (2013). Innovation in assessment: Common misconceptions and problems.
In K. Hyland & L. L. C. Wong (Eds.), Innovation and change in English language education (pp. 263–275). Routledge.
Deneen, C. C., & Brown, G. T. L. (2016). The impact of conceptions of assessment on
assessment literacy in a teacher education program. Cogent Education, 3(1),
doi:10.1080/2331186X.2016.1225380.
East, M. (2015). Coming to terms with innovative high-stakes assessment practice:
Teachers’ viewpoints on assessment reform. Language Testing, 32(1), 101–120.
Engelsen, K. S., & Smith, K. (2014). Assessment literacy. In C. Wyatt-Smith, V. Klenowski
& P. Colbert (Eds.), Designing assessment for quality learning (pp. 91–107). Springer
Netherlands.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers.
Profile: Issues in Teachers’ Professional Development, 20(1), 179–195.
Gu, P. Y. (2014). The unbearable lightness of the curriculum: What drives the assess-
ment practices of a teacher of English as Foreign Language in a Chinese secondary
school? Assessment in Education: Principles, Policy & Practice, 21(3), 286–305.
Hamp-Lyons, L. (2016). Implementing a learning-oriented approach within English lan-
guage assessment in Hong Kong schools: Practices, issues and complexities. In G. Yu &
Y. Jin (Eds.), Assessing Chinese learners of English (pp. 17–37). Palgrave Macmillan.
Harding, L., & Kremmel, B. (2016). Teacher assessment literacy and professional
development. In D. Tsagari & J. Banerjee (Eds.), Handbook of second language assess-
ment. Handbooks of Applied Linguistics (pp. 413–428). Mouton De Gruyter.
Harsch, C., Seyferth, S., & Brandt, A. (2017). Developing assessment literacy in a dynamic
collaborative project: What teachers, assessment coordinators, and assessment researchers can learn
from and with each other [Conference paper]. The 39th Language Testing Research Col-
loquium, Universidad de los Andes, Bogotá, Colombia.
Hasselgreen, A., Carlsen, C., & Helness, H. (2004). European survey of language testing and
assessment needs: Report: part one – general findings. European Association for Language
Testing and Assessment. http://www.ealta.eu.org/documents/resources/survey-rep
ort-pt1.pdf.
Hernández Ocampo, S. P. (2017). How literate in language assessment should English teachers
be? [Conference paper]. The 39th Language Testing Research Colloquium, Universidad
de los Andes, Bogotá, Colombia.
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to
secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Hildén, R., & Fröjdendahl, B. (2018). The dawn of assessment literacy – exploring the
conceptions of Finnish student teachers in foreign languages. Apples – Journal of
Applied Language Studies, 12(1), 1–24.
Hill, K. (2017a). Language teacher assessment literacy – scoping the territory. Papers in
Language Testing and Assessment, 6(1), iv–vii.
Hill, K. (2017b). Understanding classroom-based assessment practices: A precondition
for teacher assessment literacy. Papers in Language Testing and Assessment, 6(1), 1–17.
Hill, K., & McNamara, T. (2012). Developing a comprehensive, empirically based
research framework for classroom-based assessment. Language Testing, 29(3), 395–420.
Inbar-Lourie, O. (2008). Constructing a language assessment knowledge base: A focus
on language assessment courses. Language Testing, 25(3), 385–402.
Inbar-Lourie, O. (2016). Language assessment literacy. In E. Shohamy, I. Or & S. May (Eds.), Language testing and assessment (pp. 257–270). Springer International Publishing.
Inbar-Lourie, O. (2017). Language assessment literacies and the language testing communities: A mid-life identity crisis? [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Jeong, H. (2013). Defining assessment literacy: Is it different for language testers and
non-language testers? Language Testing, 30(3), 345–362.
Jin, Y. (2010). The place of language testing and assessment in the professional pre-
paration of foreign language teachers in China. Language Testing, 27(4), 555–584.
Kim, A. A., Chapman, M., & Wilmes, C. (2017). Developing materials to enhance the
assessment literacy of parents and educators of K-12 English language learners [Conference
paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes,
Bogotá, Colombia.
Kim, A. A., Chapman, M., Wilmes, C., Cranley, M. E., & Boals, T. (2017). Validation
research of preschool language assessment for dual language learners: Collaboration between
educators and test developers [Conference paper]. The 39th Language Testing Research
Colloquium, Universidad de los Andes, Bogotá, Colombia.
Kiomrs, R., Abdolmehdi, R., & Naser, R. (2011). On the interaction of test washback and teacher assessment literacy: The case of Iranian EFL secondary school teachers. English Language Teaching, 4(1), 156–160.
Kremmel, B., & Harding, L. (2017). Towards a comprehensive, empirical model of language
assessment literacy across different contexts [Conference paper]. The 39th Language Testing
Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Kremmel, B., & Harding, L. (2019). Towards a comprehensive, empirical model of
language assessment literacy across stakeholder groups: Developing the language
assessment literacy survey. Language Assessment Quarterly, 17(1), 100–120.
Kremmel, B., Eberharter, K., & Harding, L. (2017). Putting the ‘language’ into language
assessment literacy [Conference paper]. The 39th Language Testing Research Collo-
quium, Universidad de los Andes, Bogotá, Colombia.
Kvasova, O., & Kavytska, T. (2014). The assessment competence of university language
teachers: A Ukrainian perspective. Language Learning in Higher Education: Journal of the
European Confederation of Language Centres in Higher Education (CercleS), 4(1), 159–177.
Lam, R. (2015). Language assessment training in Hong Kong: Implications for language
assessment literacy. Language Testing, 32(2), 169–197.
Leung, C., & Mohan, B. (2004). Teacher formative assessment and talk in classroom
contexts: Assessment as discourse and assessment of discourse. Language Testing, 21(3),
335–359.
Malone, M. E. (2013). The essentials of assessment literacy: Contrasts between testers
and users. Language Testing, 30(3), 329–344.
Malone, M. E. (2016). Training in language assessment. In E. Shohamy, I. Or & S. May (Eds.), Language testing and assessment (pp. 225–239). Springer International Publishing.
Malone, M. E. (2017). Including student perspectives in language assessment literacy [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Mazandarani, O., & Troudi, S. (2017). Teacher evaluation: What counts as an effective teacher? In S. Hidri & C. Coombe (Eds.), Evaluation in foreign language education in the Middle East and North Africa (pp. 3–28). Springer International Publishing.
Mede, E., & Atay, D. (2017). English language teachers' assessment literacy: The Turkish context. Dil Dergisi, 168(1), 43–60.
O’Loughlin, K. (2013). Developing the assessment literacy of university proficiency test
users. Language Testing, 30(3), 363–380.
Pill, J., & Harding, L. (2013). Defining the language assessment literacy gap: Evidence
from a parliamentary inquiry. Language Testing, 30(3), 381–402.
Plake, B. S., & Impara, J. C. (1993). Teacher assessment literacy questionnaire. University of Nebraska-Lincoln.
Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory
into Practice, 48(1), 4–11.
Rea-Dickins, P. (2001). Mirror, mirror on the wall: Identifying processes of classroom assessment. Language Testing, 18(4), 429–462.
Rea-Dickins, P. (2006). Currents and eddies in the discourse of assessment: A learning-focused interpretation. International Journal of Applied Linguistics, 16(2), 163–188.
Restrepo, E., & Jaramillo, D. (2017). Preservice teachers’ language assessment literacy devel-
opment [Conference paper]. The 39th Language Testing Research Colloquium, Uni-
versidad de los Andes, Bogotá, Colombia.
Scarino, A. (2008). The role of assessment in policy-making for languages education in Australian schools: A struggle for legitimacy and diversity. Current Issues in Language Planning, 9(3), 344–362.
Scarino, A. (2013). Language assessment literacy as self-awareness: Understanding the role of interpretation in assessment and teacher learning. Language Testing, 30(3), 309–327.
Scarino, A. (2017). Developing assessment literacy of teachers of languages: A conceptual and interpretive challenge. Papers in Language Testing and Assessment, 6(1), 18–40.
Schissel, J. L., Leung, C., & Chalhoub-Deville, M. (2019). The construct of multilingualism in language testing. Language Assessment Quarterly, 16(4–5), 373–378.
Sheehan, S., & Munro, S. (2017). Assessment: attitudes, practices and needs: Project report.
British Council. https://www.teachingenglish.org.uk/sites/teacheng/files/pub_
G239_ELTRA_Sheenan%20and%20Munro_FINAL_web%20v2.pdf
Stiggins, R. (1991). Assessment literacy. Phi Delta Kappan, 72(7), 534–539.
Stiggins, R. (2006). Assessment for learning: A key to motivation and achievement.
Edge: The Latest Information for the Education Practitioner, 2(2), 1–19.
Stiggins, R. (2012). Classroom assessment competence: The foundation of good teaching. http://
images.pearsonassessments.com/images/NES_Publications/2012_04Stiggins.pdf
Taylor, L. (2009). Developing assessment literacy. Annual Review of Applied Linguistics, 29, 21–36.
Taylor, L. (2013). Communicating the theory, practice and principles of language testing to test stakeholders: Some reflections. Language Testing, 30(3), 403–412.
Tsagari, D. (2013). EFL students’ perceptions of assessment in higher education. In D.
Tsagari, S. Papadima-Sophocleous & S. Ioannou-Georgiou (Eds.), International
experiences in language testing and assessment (pp. 117–143). Peter Lang.
Tsagari, D. (2017). The importance of contextualizing language assessment literacy [Conference
paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes,
Bogotá, Colombia.
Tsagari, D., & Vogt, K. (2017). Assessment literacy of foreign language teachers around
Europe: Research, challenges and future prospects. Papers in Language Testing and
Assessment, 6(1), 41–63.
Valeo, A., & Barkaoui, K. (2017). How teachers’ conceptions mediate their L2 writing assessment
practices: Case studies of ESL teachers across three contexts [Conference paper]. The 39th Lan-
guage Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Villa Larenas, S. (2017). Language assessment literacy of EFL teacher trainers [Conference
paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes,
Bogotá, Colombia.
Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers: Find-
ings of a European study. Language Assessment Quarterly, 11(4), 374–402.
Xu, Y., & Brown, G. T. L. (2016). Teacher assessment literacy in practice: A recon-
ceptualization. Teaching and Teacher Education, 58, 149–162.
Xu, Y., & Brown, G. T. L. (2017). University English teacher assessment literacy: A
survey-test report from China. Papers in Language Testing and Assessment, 6(1), 133–158.
Yan, X., Fan, J. J., & Zhang, C. (2017). Understanding language assessment literacy profiles of different stakeholder groups in China: The importance of contextual and experiential factors [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Yan, X., Zhang, C., & Fan, J. J. (2018). 'Assessment knowledge is important, but…': How contextual and experiential factors mediate assessment practice and training needs of language teachers. System, 74, 158–168. doi:10.1016/j.system.2018.03.003
Yastıbaş, A. E., & Takkaç, M. (2018). Understanding language assessment literacy: Devel-
oping language assessment. Journal of Language and Linguistic Studies, 14(1), 178–193.
Chapter 3

Traditional and alternative assessment

Lee McCallum
Introduction
The field of language testing (also termed language assessment in this chapter and
other work [e.g. O’Sullivan, 2011]) involves the process of designing language
tests, testing students, and using the resulting data for evaluation and decision-making purposes (Davies, Brown, Elder & Hill, 1999). Language testing enjoys a rich, complex, and often misinterpreted history, with tests simply defined as instruments that elicit certain behaviour from candidates, behaviour that is then used to make inferences about a candidate's language ability (Carroll, cited in Bachman, 1990). These inferences are often reflected in a numerical score, which is set against a benchmark level to determine entry into higher education, training, or employment opportunities, and to govern immigration, often to an English-speaking country (Shohamy, 1998, 2001a). Such inferences are facilitated by using standardized tests, where test administration, content, format, language, and scoring procedures are identical for all test takers, allowing scores across test populations to be easily compared (Popham, cited in Menken, 2008).
The history of language testing has been mapped according to different time
periods and ‘waves of scholarship’ including Spolsky (1976) and Davies’ (1978) three
stages: pre-scientific, psychometric–structural, and psycholinguistic–sociolinguistic as
well as Morrow’s (1979) time period classification of the same eras: the ‘Garden of
Eden’, the ‘Vale of Tears’, and the ‘Promised Land’. Shohamy’s (1996) distinct five-
stage categorization is partly guided by test task typologies: discrete-point, inte-
grative, communicative, performance testing, and alternative assessment, which span
more than a century of testing practices. These waves are steeped in economic,
social, and political influences that steer the direction of testing in mainstream edu-
cation. Weir, Vidakovic, and Galaczi (2013) summarize how tests have long served as gatekeeping tools to prevent mass immigration, as in the United States in the post-war years, while Spolsky (2008) highlights how the Chinese first introduced formal selection testing for elite government positions, a practice that later spread to education in Europe, with France, Italy, and the UK using tests to decide entry into higher education.
Given these uses, it is important to recognize that the last two waves of scholarship, the psychometric–structural and the psycholinguistic–sociolinguistic, play a key role in the understanding, promotion, and use of tests, and in the desire to change them (Fulcher, 2000). The field of psychometrics is viewed as the cornerstone of 'tradi-
tional’ testing with its focus on objectively measuring mental traits such as language
ability, whereas the increasingly influential psycholinguistic–sociolinguistic wave
champions the need for fairer communicative testing that is more socially aware,
ethical, and grounded in ‘re-humanizing’ the testing process (Fulcher, 2000).
This chapter acknowledges the vast history of testing, yet does not strive to
simply remap it. Instead, it follows other theoretically motivated work such as
Alderson and Banerjee (2001) by presenting an overview of the landscape of tra-
ditional testing under two broad principled sections that cover pertinent issues in
testing. The two principled sections – the reliance on statistically robust psycho-
metric scoring practices to meet the aim of selecting the highest scoring students for
entry into higher education, and beliefs that ‘Standard English’ is the testing model
to be followed – help outline traditional testing’s key tenets and shape the chapter
in a logical manner. These sections will also refer to task types and how they facil-
itate testing goals. The chapter will also examine the same two principled sections
through the lens of Critical Language Testing (CLT) to illuminate how, by situat-
ing itself in critical theory and critical applied linguistics, CLT offers alternative
views that are more concerned with prioritizing test takers than the scientific mus-
ings that traditional testing offers. CLT recognizes the power that tests wield over
test takers and aims to promote more interpretive, open scoring procedures which
call for the inclusion of local varieties of English in language testing.
In examining these tenets, the chapter focuses on providing theoretical and
empirical research evidence from the narrow context of English for Academic
Purposes (EAP), which involves the teaching and testing of academic English that
is needed for tertiary level study. It is hoped that such an analysis can contribute
to the wider literature on language assessment literacy and help illuminate task
type options to teachers and test designers who need to include a range of task
types in their assessments to capture the range of skills they need to test.
The reliance on psychometric scoring practices

Under the traditional paradigm, test scores are easily quantifiable and form the basis of key decision-making processes (Spolsky, 1976). Psychometrics, with its reliance on statistics, has played a pivotal role in language assessment since the late 1950s, coinciding with and being influenced by structural linguistics. This influence continues today, with standardized tests operating at all levels of education in different countries, including China (see Jenkins (2014) for an overview of China's 'Gaokao' high school exit exam, which determines entry to study opportunities), and in the UK, US, and Australia, where proficiency exams such as IELTS (International English Language Testing System) and TOEFL (Test of English as a Foreign Language) are required to gain entry into higher education (Weir et al., 2013).
Fulcher (2010, 2014) places this historical reliance on psychometrics within a wider view of testing as a natural science, with its roots in hard, generalizable, objective positivism and realism, whereby language ability was held to be measurable and isolable from the person producing it. Objective tests were thus supported for their purity in measuring a single construct well, for achieving high levels of validity and reliability, and for being capable of being objectively scored and administered to large populations (Spolsky, 1994). Moss, Pullin, Gee, and Haertel (2005) further outline the underlying goal of psychometrics, and thus traditional testing: to develop interpretations that are
generalizable across individuals and contexts, and to understand the limits of
those generalizations. In seeking out these generalizations, interpretations
characterize groups of individuals who score the same in the trait being
tested. This stance also highlights how test scores are interpreted, with
increasing test scores symbolizing proof of educational gains in knowledge
(Moss et al., 2005). However, this interpretation of scores remains dubious
because it disregards how learners are taught or prepared to answer ques-
tions and perform tasks on the knowledge appearing in the test (Miller &
Legg, 1993). This condenses curricula and means that while learners ‘appear’
to gain knowledge, this is somewhat artificially gained from a narrow
knowledge base that is decided by the test, which has in turn been decided
by those tasked with making selection decisions (Shohamy, 1996). This base consists largely of receptive knowledge because tasks are designed to be uniform, easy to assess, and to require a single answer through formats such as gap-fill, multiple-choice questions (MCQs), and true–false or matching exercises (Linn, 2000, cited in Moss et al., 2005). Figure 3.1 below shows a typical multiple-choice task whereby students are expected to choose a single correct answer.
The task in Figure 3.1 closely matches the tenets of the traditional paradigm in language testing: it offers a limited number of possible answers and strengthens the focus on a particular kind of knowledge by asking students to choose only one correct answer. The task achieves strong reliability in grading because answers are prescribed in the shape of an answer key. These types of task occur frequently in large-scale proficiency examinations, which serve large numbers of students and institutions, and are often marketed as tests that determine a candidate's proficiency and suitability to undertake academic study or skilled employment.

Questions 10–12
Choose the appropriate letters A, B, C or D.
Write your answers in boxes 10–12 on your answer sheet.
10. Research completed in 1982 found that in the United States soil erosion …
A farm incomes
B use of fertiliser
C over-stocking
D farm diversification
Figure 3.1: Academic Reading Multiple Choice Task. Task taken from: https://www.ielts.org/-/media/pdfs/academic_reading_sample_task_multiple_choice.ashx?la=en
A further paradigmatic tenet of this reliance on psychometrics stems from the belief that scores adhere to a normal distribution, where the majority of scores pool at the mean and other scores deviate from the mean to create a bell-shaped curve (Douglas, 2010). This normal distribution pattern is used to determine access to resources such as higher education, with scores benchmarked against a cut-off score that determines pass or fail decisions and, ultimately, the benefits available to the test taker (Spolsky, 1995). Shohamy (2001a) explains that assessment stakeholders perceive these scores as objective, legitimate, and a mark of achievement; however, several scholars have appreciated that scores can have serious consequences for test takers, with Cattell (cited in Spolsky, 1995) recognizing the serious decisions made on the basis of these scores and the testing community's responsibility to ensure that scores are reliable. This stance was also taken by Edgeworth (1888), whose view of ensuring reliability lay in rater reliability, with the concept of 'reliability' chiefly concerned with the measurement consistency of scoring practices (Carr, 2011).
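As a minimal formal sketch of these two tenets (the notation here is mine, not the chapter's): if observed scores X are assumed to follow a normal distribution and a cut-off score c is imposed, the expected pass rate is

X \sim \mathcal{N}(\mu, \sigma^{2}), \qquad \Pr(\text{pass}) = \Pr(X \geq c) = 1 - \Phi\!\left(\frac{c - \mu}{\sigma}\right),

where \Phi is the standard normal distribution function. Likewise, the concern with scoring consistency can be written in classical test theory terms as

X = T + E, \qquad \text{reliability} = \frac{\sigma_{T}^{2}}{\sigma_{T}^{2} + \sigma_{E}^{2}},

where T is the unobserved 'true' score and E is measurement error; rater reliability, Edgeworth's particular concern, treats disagreement between markers as one source of E.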
These concerns, whether socially guided or purely statistically investigated,
highlight issues that were also recognized and investigated by Thorndike (1904)
in the early 20th century, and in Cambridge and Oxford University and
UCLES (University of Cambridge Local Examinations Syndicate) circles at
much later dates (Weir et al., 2013). Thorndike (1904, cited in Spolsky, 1995) also recognized that fairness to test takers meant scoring questions to reflect their level of difficulty; under weighted scoring, candidates answering more challenging questions therefore received more marks than those answering simpler questions.
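One simple way to operationalize such weighted scoring (a hypothetical illustration rather than a scheme described in the chapter) is to score a test of n items as

S = \sum_{i=1}^{n} w_{i} x_{i},

where x_{i} \in \{0, 1\} records whether item i was answered correctly and the weight w_{i} increases with item difficulty, for example by setting w_{i} proportional to the inverse of the proportion of candidates answering item i correctly.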
In keeping with a focus on the test and its target inferences, traditional test-
ing also supports a narrow conceptualization of validity, with Messick’s (1989)
content and construct validity receiving extensive attention from testers. Con-
tent validity addresses how representative the test is as a sample of a course’s
syllabus, whereas construct validity is an evaluation of how well the test’s scores
reflect what the test claims to be measuring (Davies et al., 1999).
In EAP, content and construct validity have traditionally been governed and driven by a needs analysis of learners: as Fulcher (1999) indicates, 'content validity' is taken to mean representing the course students were taught, while 'construct validity' ensures that the test examines the skills it claims to test, as well as the skills the course aimed to develop. It is of utmost importance that the testing community, including instructors, realize that reliability and validity are not absolute qualities: the two operate on a continuum, much like the discrete-point–integrative/communicative distinction, and neither will ever be an absolute property. It is equally important to realize that the political, social, economic, and cultural terrains under which tests operate contribute to balancing this continuum (McNamara, 2001). This narrow, traditional view of validity does not consider the social consequences of the test. In combination with reliability, a narrow range of single-answer discrete test items, and a narrow scoring scale, traditional testing thus views language proficiency as a single unitary construct that can be isolated from human, social, and test administration factors. In pursuing a single measurable construct, traditional testing supports validity and reliability practices geared towards ensuring that the test document allows testers to make the types of inferences they aim to make. This approach, Shohamy (2004) argues, is still embedded in a measurement lens because it further distinguishes between those with lower and higher levels of predetermined knowledge. These practices, including the weighting of items, still rely on the design and selection of items that elicit a single correct answer, signalling that, as a unidimensional construct, language proficiency can be isolated and measured as a single trait with the right tools (Bachman, 2000).
This task tests candidates' knowledge of word class, with candidates required to change the word given in capitals to fit the passage.

For questions 17–24, read the text below. Use the word given in capitals at the end of some of the lines to form a word that fits in the gap in the same line. There is an example at the beginning (0).
Write your answers IN CAPITAL LETTERS on the separate answer sheet.
Example: 0 M E M O R A B L E
_____________________________________________________________________
National Bike Week was celebrated last week in a (0) … … …. way with a Family Fun Day in Larkside Park. (MEMORY)
The event (17) … … …. to be highly successful with over five hundred people attending. (PROOF)
Larkside Cycling Club brought along a (18) … … …. of different bikes to (VARY)
demonstrate the (19) … … …. that family members of all ages can get from group cycling. (ENJOY)
Basic cycling (20) … … …. was taught using conventional bikes. (SAFE)
There were also some rather (21) … … …. bikes on display. (USUAL)
One-wheelers, five-wheelers and even one which could carry up to six (22) … … …. were used for fun. (RIDE)
The club also gave information on how cycling can help to reduce (23) … … …. damage. (ENVIRONMENT)
They also provided (24) … … …. as to how people could substitute the bike for the car for daily journeys. (SUGGEST)
The overall message was that cycling is great family fun and an excellent alternative to driving. By the end of the day over a hundred people had signed up for membership.
Figure 3.2: First Certificate in English: Use of English Task. Adapted from: http://www.cambridgeenglish.org/exams/first/preparation/
In this respect, Fulcher notes how a scale such as the TOEFL or Common
European Framework of Reference (CEFR) can be misused because teachers
come to view the scale as a resource to prescriptively judge learners as well as to
influence curriculum development that reflects what the scale sees as signalling
a higher proficiency grade. The proficiency scales also help shape knowledge
by indicating, for example, linguistic features at each proficiency level, organization patterns, expected discourse markers, and the development of ideas, meaning the writer changes or molds their response to match these criteria (Hawkins & Filipovic, 2012). In an international context, this means learners'
writing is forced to change style from the rhetoric and discourse style of their
L1 to the discourse of the L2, and learners are often prepared for these changes
in the form of IELTS exam preparation classes or freshman composition classes
at university (Kachru, 2006).
behaviour that the test elicits to gain access. It is these beliefs that drive the call for change in the testing world and for a fairer system that equalizes and better distributes the currently stratified sharing of resources and access to benefits (Lynch, 2001). It is important to realize that in forming a response to
traditional testing, CLT strives for fair and equal testing opportunities under a
critical theory framework, yet it does not advocate eradicating traditional test-
ing. A fundamental consideration is that traditional test approaches have an
appropriate use; however, CLT’s objective is to highlight that neglect of other
approaches is undemocratic and thus calls for testing to be dialogic in nature
where all parties involved in testing have a voice (Shohamy, 2001a). It is also
important to clarify that CLT recognizes that through dialogue a harmonized
medium that balances interests can be achieved, and change can take place
(Trede & Higgs, 2010). This section of the chapter further explores how the
tenets of traditional testing are viewed under this framework.
Shohamy (2001b) equally outlines how tests originally aimed to provide access to services for all, regardless of entitlement. However, Shohamy (2001b) argues that these tests have failed to shake off their overarching selection purpose, meaning classroom teaching is forced, consciously or unconsciously, by the school's management, teachers, and would-be students, to centre on ensuring students pass the test and receive the associated pass-grade benefits. In this sense, Menken (2008) highlights how tests become the centralized language policy that dictates, from the top down, what content is taught, how and by whom it can be taught, and in what language it is best taught. This is also indicated in the work of Hamp-Lyons (1998) at the international level, with the TOEFL test dictating curriculum in schools, universities, and academies.
Shohamy (2001b) explains how CLT seeks to change the top-down practice
that has been created by traditional testing. This change places test takers at the
heart of the testing process, and seeks to give them an active role in that process,
and for power to be redistributed more fairly to reflect this new balance. Sho-
hamy (2001b) discusses how CLT invites stakeholders – including teachers and
test takers – to debate and confront the roles tests play in shaping instruction,
access to education, and the creation of ‘new’ knowledge. These views are also
stated by Darling-Hammond (1994) who sees a need for testing to move from a
sorting tool to a developmental aid that supports learners and has greater appre-
ciation for individuals’ unique knowledge that is often disregarded in traditional
testing in favour of dominant knowledge that those in power have deemed
important and therefore testable (Shohamy, 2001b). Shohamy (1998) highlights
how critical perspectives view the testing process as a non-neutral practice that is
ideologically laden with the values of those in power, while Messick (1989) and
Alderson and Banerjee (2001) similarly note that tests contain values that are
psychologically, socially, economically, and politically guided, with testing deci-
sions reflecting all of these values through the physical test document. Noam,
cited in Shohamy (1998), also clarifies how these factors merge to shape learners’
beliefs about knowledge, learning, and success, with learners believing that suc-
cess equals mastering test knowledge.
The stance of critical theory is further influenced by constructivism whereby
there is a need to peel away the surface and uncover power. Constructivists
believe that those in power decide what knowledge is valuable and whether it
will be tested. These issues of the powerful deciding knowledge are explained by
Foucault, cited in Benesch (2001), as being ever present. In testing, the power
imbalance between stakeholders such as teachers, test designers, and test users
such as institutions and test takers, is a ‘self-sustaining’ system where test designers
have total non-negotiable control over the knowledge input (Shohamy, 2001a).
A fundamental concept in understanding the power relations that exist in testing
is Bourdieu’s (1991) symbolic power, which specifies that power relations con-
tinue to exist and are maintained because the party granting the power believes
that the power exists; it is willing to give the other party power, and to allow the
other party to exercise its dominance.
It is important, however, that these tasks are robustly designed and developed to ensure fairness, illustrate face validity, and guard against grades that are too subjective to be justified.
A sample alternative task is presented below in Figure 3.3:
Overview of task
In groups, learners design a brochure for the Hong Kong Tourism Board describing 4
attractions in Hong Kong which would appeal to young people of their own age.
Task guidelines for learners
(a) Task fulfilment: would your selected sites appeal to young people?
(b) Accuracy of language and information provided: Is the brochure written in good
English? Is the information provided accurate?
(c) Attractiveness of final written submission
Figure 3.3: Sample collaborative writing task (Adapted from the Curriculum and Development
Institute (2005) and Douglas (2010)).
The sample task in Figure 3.3 represents a possible task suitable for a pre-university EAP course that focuses on a specific genre; it also raises awareness of audience and integrates interpersonal and affective skills such as communication and critical thinking. The task also requires prolonged engagement with these skills and offers the opportunity to respond and react to teacher feedback.
Brown and Hudson (1998) suggest that these tasks can become fairer and allow
learners a louder voice by negotiating assessment criteria and giving learners a
say in the important elements of the task. In this respect, other collaborative
tasks that can form the basis for assessment may include jigsaw reading and
writing tasks (e.g. Esnawy, 2016) as well as tasks that allow students to compare
their experiences of individual, pair, and group work (e.g. Bhowmik, Hillman,
& Roy, 2018). Unlike the previous two tasks, this task allows negotiation of
meaning and allows students to produce freer examples of written language, as
opposed to the restricted output in Figure 3.2, and the focus on recognition of
the correct answer in Figure 3.1.
Their socially constructed nature means they are labelled with connotations of
powerful native norms seen as ‘legitimate’, and non-native norms seen as ‘ille-
gitimate’, or in the words of Quirk (1990), ‘quackery’. Hamid and Baldauf
(2013) note that non-native norms are also seen as 'deficit forms' or, in some cases, 'interlanguage', and are not recognized as legitimate under traditional Second Language Acquisition (SLA) theories. However, Groves (2010) refutes suggestions that these norms are 'interlanguage' by reminding us that the interlanguage concept arose from application to individual learners, not whole social communities, and, alongside Kirkpatrick and Deterding (2011), he
points out that non-native norms, such as the practice of placing the topic at
the front of the sentence, are far-reaching and could be spread and shared across
more than one geographical area. Kirkpatrick and Deterding (2011) and Kim
(2006) cement this valid point by arguing that since more non-native speakers
shape the language than native speakers, their use of the language should be
considered in testing language use and ability.
Conclusion
This chapter has outlined the key theoretical tenets of traditional and critical language testing and provided examples of EAP-relevant assessment tasks. Within this broad discussion, a number of historical and contemporary assessment terms and trajectories were set out to engage with the key understandings of language assessment we need as practitioners. The chapter presented and discussed tasks that typify these different understandings, and it seeks to encourage these discussions to continue at both global and local levels. At the global level, there is a need to examine differences across international tests to identify common and divergent task features, and how these relate back to the skills we perceive as fundamental to study in higher education. At the local level, there is also a need to examine the tasks that play a role in shaping local assessment practices and how these tasks align with EAP curriculum goals (e.g. Rauf & McCallum, in press).
References
Alderson, J. C., & Banerjee, J. (2001). Language testing and assessment (Part 1). Language
Teaching, 34, 213–236.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford University
Press.
Bachman, L. F. (2000). Modern language testing at the turn of the century: Assuring
that what we count counts. Language Testing, 17(1), 1–42.
Benesch, S. (2001). Critical English for academic purposes: Theory, politics and practice.
Lawrence Erlbaum Associates.
Bhowmik, S. K., Hillman, B., & Roy, S. (2018). Peer collaborative writing in the EAP
classroom: Insights from a Canadian postsecondary context. TESOL Journal,
doi:10.1002/tesj.393.
Bourdieu, P. (1991). Language and symbolic power. Polity.
Bridgeman, B., & Carlson, S. (1983). A survey of academic writing tasks required of
graduate and undergraduate foreign students. TOEFL Research Report 15. Educational
Testing Service.
Brown, J. D., & Hudson, T. D. (1998). The alternatives in language assessment. TESOL
Quarterly, 32, 653–675.
Cambridge University. First certificate in English: Use of English paper: Part 3 task. https://www.gettinenglish.com/wp-content/uploads/2014/07/cambridge-english-first-handbook-2015.pdf
Canagarajah, A. S. (2006). Changing communicative needs, revised assessment objectives: Testing English as an international language. Language Assessment Quarterly, 3(3), 229–242.
Canagarajah, A. S. (2016). TESOL as a professional community: A half-century of
pedagogy, research and theory. TESOL Quarterly, 50(1), 7–41.
Carr, N. T. (2011). Designing and analysing language tests. Oxford University Press.
Carroll, J. B. (1968). The psychology of language testing. In A. Davies (Ed.), Language
testing symposium: A psycholinguistic approach (pp. 46–69). Oxford University Press.
Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373–381.
Chomsky, N. (1965). Aspects of the theory of syntax. MIT Press.
Curriculum and Development Institute (2005). Task-based assessment for English language
learning at secondary level. Education and Manpower Bureau. https://cd1.edb.hkedcity.
net/cd/eng/TBA_Eng_Sec/pdf/part2_Task1.pdf
Darling-Hammond, L. (1994). Performance-based assessment and educational equity.
Harvard Educational Review, 64, 5–30.
Davies, A. (1978). Language testing: Survey article, Part 1. Language Teaching and Linguistics Abstracts, 11(3), 145–159.
Davies, A. (2003). Three heresies of language testing research. Language Testing, 20(4), 355–368.
Davies, A. (2013). Native speakers and native users: Loss and gain. Cambridge University Press.
Davies, A. (2014). 50 years of language assessment. In A. J. Kunnan (Ed.), The companion to language assessment: Abilities, contexts and learners (pp. 3–21). Wiley Blackwell.
Davies, A., Brown, A., Elder, C., & Hill, K. (1999). Dictionary of language testing. Cambridge University Press.
Douglas, D. (2010). Understanding language testing. Hodder Education.
Edgeworth, F. Y. (1888). The statistics of examinations. Journal of the Royal Statistical
Society, LI, 599–635.
Esnawy, S. (2016). EFL/EAP reading and research essay writing using jigsaw. Procedia –
Social and Behavioral Sciences, 232, 98–101.
Foucault, M. (1980). Power/knowledge: Selected interviews and other writings: 1972–1977.
Pantheon.
Freeborn, D. (2006). From Old English to Standard English: A course book in language var-
iation across time (3rd ed). Palgrave Macmillan.
Freire, P. (1996). Pedagogy of the oppressed. Penguin.
Fulcher, G. (1999). Assessment in English for Academic Purposes: Putting content
validity in its place. Applied Linguistics, 20(2), 221–236.
Fulcher, G. (2000). The communicative legacy in language testing. System, 28, 483–497.
Fulcher, G. (2010). Practical language testing. Hodder Education/Routledge.
Fulcher, G. (2014). Philosophy and language testing. In A. J. Kunnan (Ed.), The companion to language assessment: Evaluation, methodology, and interdisciplinary themes (pp. 1434–1451). Wiley Blackwell.
Groves, J. (2010). Error or feature? The issue of interlanguage and deviations in non-
native varieties of English. HKBU Papers in Applied Language Studies, 14, 108–129.
Hamid, O. M. (2014). World Englishes in international proficiency tests. World Eng-
lishes, 33(2), 263–277.
Hamid, O. M., & Baldauf, R. B. (2013). Second language errors and features of world
Englishes. World Englishes, 32(4), 476–494.
Hamp-Lyons, L. (1998). Ethical test preparation practice: The case of the TOEFL.
TESOL Quarterly, 32, 329–337.
Hawkins, J. A., & Filipovic, L. (2012). Criterial features in L2 English: Specifying the refer-
ence levels of the common European framework. Cambridge University Press.
Hickey, R. (Ed.). (2015). Standards of English: Codified varieties around the world. Cambridge University Press.
Huhta, A. (2007). Diagnostic and formative assessment. In B. Spolsky & F. M. Hult (Eds.), The handbook of educational linguistics (pp. 469–482). Wiley-Blackwell.
International English Language Testing System (IELTS). (2017). IELTS Academic reading
sample task. https://www.ielts.org/-/media/pdfs/academic_reading_sample_task_m
ultiple_choice.ashx?la=en
International Language Testing Association (ILTA). (2000). ILTA Code of Ethics. http://
www.iltaonline.com/page/CodeofEthics
Jenkins, J. (2006). Current perspectives on teaching world Englishes and English as a lingua franca. TESOL Quarterly, 40(1), 157–181.
Jenkins, J. (2014). English as a lingua franca in the international university: The politics of aca-
demic English language policy. Routledge.
Kachru, B. B. (Ed.). (1982). The other tongue: English across cultures. University of Illinois Press.
Kachru, B. B. (1986). The alchemy of English: The spread, functions and models of non-native Englishes. Pergamon Press.
Kachru, Y. (2006). Culture and argumentative writing in World Englishes. In K. Bolton &
B. B. Kachru (Eds), World Englishes: Critical concepts in linguistics. (Vol. V) (pp. 19–39).
Routledge.
Kim, H. J. (2006). World Englishes in language testing: A call for research. English
Today, 22(4), 32–39.
Kirkpatrick, A. (2006). Which model of English: native-speaker, nativised or lingua
franca? In R. Rubdy & M. Saraceni (Eds.), English in the world: Global rules, global roles
(pp. 71–83). Continuum.
Kirkpatrick, A., & Deterding, D. (2011). World Englishes. In J. Simpson (Ed.), The
Routledge handbook of applied linguistics (pp. 373–388). Routledge.
Lado, R. (1961). Language testing. McGraw-Hill.
Leung, C. (2007). Dynamic assessment: Assessment for and as teaching? Language
Assessment Quarterly, 4(3), 257–278.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–16.
Lynch, B. K. (2001). Rethinking assessment from a critical perspective. Language Testing,
18(4), 351–372.
Lynch, B., & Shaw, P. (2005). Portfolios, power and ethics. TESOL Quarterly, 39(2),
263–297.
Mauranen, A., Llantada, C. P., & Swales, J. M. (2010). Academic Englishes: A standardized knowledge? In A. Kirkpatrick (Ed.), The Routledge handbook of world Englishes (pp. 634–653). Routledge.
McNamara, T. (2001). Language assessment as social practice: Challenges for research.
Language Testing, 18(4), 333–349.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Language
Learning Monograph Series. Blackwell Publishing.
Menken, K. (2008). High-stakes tests as de facto language education policies. In E. Shohamy & N. H. Hornberger (Eds.), Encyclopedia of language and education (Vol. 7, pp. 401–413). Springer.
Messick, S. (1989). Validity. In R. L. Linn. (Ed.), Educational measurement (pp. 13–103).
Macmillan.
Miller, D. M., & Legg, S. M. (1993). Alternative assessment in a high-stakes environ-
ment. Educational Measurement: Issues and Practice, 12(2), 9–15.
Morrow, K. (1979). Communicative language testing: Revolution or evolution? In C. K. Brumfit & K. Johnson (Eds.), The communicative approach to language teaching (pp. 143–159). Oxford University Press.
Moss, P. A., Pullin, D., Gee, J. P., & Haertel, E. H. (2005). The idea of testing: Psychometric and sociocultural perspectives. Measurement: Interdisciplinary Research and Perspectives, 3, 63–83.
Noam, G. (1996). Assessment at a crossroads: Conversation. Harvard Educational Review,
66, 631–657.
O’Sullivan, B. (2011). Language testing. In J. Simpson (Ed.), The Routledge handbook of
applied linguistics (pp. 259–274). Routledge.
Omoniyi, T. (2010). Writing in English(es). In A. Kirkpatrick (Ed.), The Routledge
handbook of world Englishes (pp. 471–490). Routledge.
Pennycook, A. (1994). The cultural politics of English as an international language.
Routledge.
Popham, W. J. (1999). Why standardized test scores don't measure educational quality. Educational Leadership, 56(6), 8–15.
Quirk, R. (1990). Language varieties and standard language. English Today, 21, 3–10.
Raddaoui, R., & Troudi, S. (2013). Three elements of critical pedagogy in ELT: An
overview. In P. Davidson, M. Al-Hamly, C. Coombe, S. Troudi, & C. Gunn (Eds.),
Achieving Excellence Through Life Skills Education: Proceedings of the 18th TESOL Arabia
Conference, (pp. 73–82). TESOL Arabia Publications.
Rauf, M., & McCallum, L. (in press). Language assessment literacy: Task analysis in
Saudi universities. In L. McCallum & C. Coombe (Eds.), The assessment of L2 written
English across the MENA Region: A synthesis of practice. Palgrave Macmillan.
Seargeant, P. (2012). Exploring world Englishes: Language in a global context. Routledge.
Shohamy, E. (1998). Critical language testing and beyond. Studies in Educational Evaluation, 24(4), 331–345.
Shohamy, E. (2001b). Democratic assessment as an alternative. Language Testing, 18(4), 373–391.
Shohamy, E. (2004). Assessment in multicultural societies: Applying democratic principles and practices to language testing. In B. Norton & K. Toohey (Eds.), Critical pedagogies and language learning (pp. 72–93). Cambridge University Press.
Chapter 4

Ontogenetic and phylogenetic perspectives

Mojtaba Mohammadi and Reza Vahdani Sanavi
Introduction
With the tenets of a sociocultural perspective widely recognized in the field of
language education, scholars have attempted to expand these tenets into different
aspects of learning and teaching. Assessment was not excluded: 'assessment of learning' was critically debated and the concept of 'assessment for learning' was introduced. Out of the need to document learners' outcomes, and to establish standards and benchmarks to measure their knowledge of language learning, came the concept of language assessment literacy (LAL). In its short lifespan, beginning in 1991 in general education and in 2001 in language education, the definition of LAL has been expanded and reconceptualized, and a number of different models have been proposed. In the light of the prominence LAL has recently gained, and given the depth and breadth of the concept, this chapter traces its genesis, examining the development of 'literacy' into 'literacies', and of 'assessment literacy' into 'language assessment literacy'. It also explores theoretical frameworks and proposed models. In conclusion, the concept of LAL is problematized and future directions for the field are suggested.
to any assessment knowing what they are assessing, why they are doing so,
how best to assess the achievement of interest, how to generate sound
samples of performance, what can go wrong, and how to prevent those
problems before they occur.
(p. 240)
The need for assessment to move from the periphery to centre stage was keenly
felt. The reason might be that testing has turned out to be ‘a big business’
(Spolsky, 2008, p. 297), both commercially and non-commercially, and 'the
societal role that language tests play, the power that they hold, and their central
functions in education, politics and society’ (Shohamy & Or, 2013, p. x). After
Stiggins, other scholars started to conceptualize assessment literacy, such as
Falsgraf (2006), who defined it as ‘… the ability to understand, analyze and
apply information on student performance to improve instruction’ (p. 6).
Inbar-Lourie (2013) viewed it as ‘the knowledge base required for performing
assessment tasks’ (p. 2924).
The concept of ‘language assessment literacy’ entered the field of language
assessment at the beginning of the 21st century when Brindley (2001) stated
that, unlike in general education, language teaching programs lack a sizable
bulk of research in ‘teacher’s assessment practices, levels, and training, and
professional development needs’ (p. 126). Without mentioning the stake-
holders, some scholars defined language assessment literacy. For Inbar-Lourie
(2008a), it is defined as ‘having the capacity to ask and answer critical questions
about the purpose for assessment, about the fitness of the tool being used,
about testing conditions, and about what is going to happen on the basis of
the test results’ (p. 389). In O’Loughlin’s (2013) view, it encompasses ‘the
acquisition of a range of skills related to test production, test-score inter-
pretation and use, and test evaluation in conjunction with the development of
a critical understanding about the roles and functions of assessment within
society’ (p. 363). Vogt and Tsagari (2014) saw LAL as ‘the ability to design,
develop and critically evaluate tests and other assessment procedures, as well
as the ability to monitor, evaluate, grade and score assessments on the basis of
theoretical knowledge’ (p. 377). Another definition, from quite a general
perspective in terms of stakeholders, is Pill and Harding’s (2013), which states
that LAL ‘may be understood as indicating a repertoire of competences that
enable an individual to understand, evaluate and, in some cases, create lan-
guage tests and analyse test data’ (p. 382).
Malone (2013) placed the responsibility on teachers' shoulders and remarked
that LAL ‘refers to language instructors’ familiarity with testing definitions and
the application of this knowledge to classroom practices in general and specifi-
cally to issues related to assessing language’ (p. 329). Fulcher (2012) offers a
definition of LAL with a wider scope, which embraces different assessment
competencies. He argued that LAL ultimately involves the ability to:
understand why practices have arisen as they have, and to evaluate the role
and impact of testing on society, institutions, and individuals.
(p. 125)
The above conceptualizations of LAL have one thing in common: they pinpoint micro- and/or macro-level components of language assessment practice. The micro-level components concern classroom assessment, such as designing, developing, grading, and analysing tests, together with the related theoretical issues. The macro-level components concern assessment practices viewed critically: the purposes of assessment, its societal roles and consequences, and the placing of assessment within wider historical, social, political, and philosophical frameworks.
LAL competencies
In line with the growing attention to language assessment literacy, there were
attempts to demystify the concept and elaborate on what exactly is meant by
attaining this kind of literacy. The earliest attempt to describe ‘assessment literacy’,
though not yet so named, in education was the list of seven standards proposed by the
American Federation of Teachers, the National Council on Measurement in
Education, and the National Education Association in 1990, which are summar-
ized by Stabler-Havener (2018, p. 3). It includes skills such as choosing appropriate assessment methods; developing assessment methods; administering, scoring, and interpreting assessment results; using results in educational decision-making; developing valid grading procedures; communicating assessment results; and recognizing unethical or otherwise inappropriate assessment methods and uses of assessment information.
The other major proposed competencies of LAL are presented in four dif-
ferent models. The first one is Brindley’s (2001) professional development
program model which introduced five competencies for a language teacher. He
criticized the standards presented by the American Federation of Teachers, the
National Council on Measurement in Education, and the National Education
Association as not being ‘flexible enough to allow teachers to acquire familiarity
with those aspects of assessment that are relevant to their needs’ and not having
perspective, he expanded its model to two more tiers of ‘principles’ and ‘con-
text’. The concept of ‘principles’ includes the issues that guide teachers to have
the best possible practice, i.e., getting acquainted with principles, concepts,
processes of assessment, and ethics and codes of practice. ‘Context’ looks at a
wider landscape of LAL by placing practice and principles within a historical,
political, social, and philosophical framework for the stakeholders to be able to
figure out the origin and justifications of these practices and principles, and the impact(s) adopting any one of them may have on society, organizations, and individuals. The contribution of the model seems to be twofold. One is that its hierarchical nature presents a sequence for course providers and trainers, which can be very helpful in preparing coursebooks or planning sessions. The other, which can also be taken as a consequence of this hierarchy of materials and issues, is that the levels are not essential for all stakeholders: depending on the level at which they work, a given tier may be mandatory or optional. Another point to mention is the necessity for teachers to put the theories into practice.
More recent attempts to introduce a componential framework of language assessment literacy are Pill and Harding (2013) and Taylor (2013), which take a different view of the concept by accounting for stakeholders in LAL. Setting aside Brindley (2001) and Fulcher (2012), who, in one way or another, consider a variety of stakeholders with respect to levels of literacy development, the studies by Pill and Harding and by Taylor highlight the agencies that can benefit differently, and to various degrees, from LAL competencies. Pill and Harding adapted the idea from Bybee (1997); it was later expanded by Kaiser and Willander (2005) in the fields of scientific and mathematical literacy in education. Unlike the previous models, which are mostly modular, Pill and Harding proposed a continuum of stages for LAL, ranging from illiteracy, which is complete ignorance of assessment concepts and methods, to multidimensional literacy, which includes knowledge about the philosophical, historical, and social background of assessment. Kaiser and Willander (2005) summarized these five stages as illiteracy, nominal literacy, functional literacy, procedural and conceptual literacy, and multidimensional literacy.
We think that the major demerit here is the lack of a clear definition of
the levels each stakeholder is expected to attain, as Harding and Kremmel
(2016) also note. Taylor’s (2013) spider web model provides the solution.
In her LAL profile model there are eight competencies, all but one of which appear in the previously mentioned models: knowledge of theory, technical skills, principles and concepts, language pedagogy, sociocultural values, local practices, personal beliefs/attitudes, and scores and decision making. This is the first model that has paid
attention to the personal beliefs/attitudes of the stakeholders in language
assessment. In tandem with Scarino (2013) and Giraldo (2018), we believe
that some particularities of language assessment practice come from the
bottom, from teacher–assessors’ interpretations, judgements, and decisions
in assessment, which result from their developing capabilities. Taylor's model maps these eight components onto a five-stage continuum adapted from Pill and Harding (2013) (0 = illiteracy to 4 = multidimensional literacy), and proposes which components, and how much of each, should be achieved by each stakeholder group: test writers, classroom teachers, university administrators, and professional language testers.
Figure 4.1 Differential AL/LAL profiles for four constituencies. (a) Profile for test wri-
ters. (b) Profile for classroom teachers. (c) Profile for university adminis-
trators. (d) Profile for professional language testers.
Adapted from Taylor (2013), p. 410
According to the model, a test writer, for example, is expected to have more
developed literacy in knowledge of theory, technical skills, and principles and
concepts, and less in personal beliefs/attitudes, local practices, and language
pedagogy than a classroom teacher. In spite of the differentiation of the stake-
holders, and the related levels of mastery in each dimension of LAL competency
here, Taylor (2013) has failed to define these dimensions; hence, anyone can
have his or her own definition of them (Harding & Kremmel, 2016).
In a recent study, Yan, Zhang, and Fan (2018) investigated how contextual
and experiential factors mediate teachers’ LAL development. They proposed a
two-layered mediation model for language teachers' LAL competencies that
includes contextual factors (such as educational and assessment policies, assess-
ment practice for different stakeholders, and the resources and constraints of the
local instructional context), and experiential factors (such as assessment develop-
ment, i.e., item writing skills, and development of the assessment intuition, i.e.,
item analysis and score use). They also found that ‘the impact of contextual
factors is mediated through experiential factors. That is, while the assessment
context creates opportunities and motivation for assessment practice, it is the
accumulation of assessment experiences that foster and strengthen assessment
knowledge and skills’ (p. 166). As in previous studies (e.g., Giraldo, 2018; Scarino, 2013; Taylor, 2013), the enhancement of language teachers’ LAL begins
with their own assessment experiences, interpretations, and self-awareness.
LAL conceptualization
The literature on assessment literacy, both in general and in language educa-
tion, has put forward a number of definitions for key concepts. In language
education, from the earliest model up to the most recent one, few have
included macro-level assessment (i.e. consideration of the social, cultural, and
political context) when evaluating competencies. The majority of the com-
petency evaluations are for micro-level assessment (i.e. classroom assessment).
There is a need to give fairly balanced weight to mid-level organizations and agencies: institutions, associations, and communities. Along with a
need for a more comprehensive theoretical definition, LAL could also benefit
from being better defined operationally. Several studies have designed a
questionnaire and used interviews (e.g., Crusan, Plakans, & Gebril, 2016;
Fulcher, 2012; Hasselgreen, Carlsen, & Helness, 2004; Vogt & Tsagari, 2014);
LAL stakeholders
Taylor’s (2013) model paid attention to a number of stakeholders in language
assessment; however, other studies were either limited to looking at teachers or
remained silent regarding the existence of any stakeholders. Any re-conceptualiza-
tions need to be comprehensive in scope in order to cover all stakeholders including:
teachers, test writers, university administrators, and professional language testers.
Other stakeholders can also play a crucial role in the development, analysis, and
interpretation of a test. If parents, for example, are assessment-literate and are familiar
with basic testing principles and practices, teachers and school administrators can count on their support in school and classroom activities.
LAL resources
To enhance LAL among teachers and other stakeholders, teacher education programs require adequate resources. Davies (2008),
in his analysis of the assessment textbooks published over 46 years, revealed that
language testing experts have separated themselves from the field of educational testing, which has resulted in ‘students [being] over-protected from exposure to
empirical encounters with real language learners’ (p. 341). In the resources
meant to equip novice teachers with understanding and practical skills in the
classroom environment, there seems to be a need to have a trade-off between
covering theoretical concepts and practical, classroom-based assessment issues.
Fulcher (2012) also investigated the needs of language teachers and concluded that they require certain resources to fulfil their role as
language testers:
A text that is not light on theory, but explains concepts clearly, especially
where statistics are introduced.
A practical ‘how-to’ guidance, although not prescriptive in nature.
A balance between classroom and large-scale testing, with illustrations and
practical examples drawn from a range of sources and countries.
Activities that can be reasonably undertaken given the constraints and
resources that teachers normally face. (p. 124)
Textbooks are not the only available resources to help LAL stakeholders
enhance their awareness of and adherence to language assessment literacy norms.
As Malone (2013) noted, in recent years textbooks can be supplemented in assorted ways.
Conclusion
In the early 21st century, teaching, learning, and assessment are no longer three stranded islands in the ocean of education in general, or of language education in particular. They are parts of a trilogy, with a series of
theories and practices in one large compendium, with various characters and
roles, and different scripts, but carefully stage-managed to deliver one voice and a single message: teaching is for the sake of learning, and assessing is for the
sake of learning. Assessment is no longer an ignored field in this compendium.
As Coombe, Troudi, and Al-Hamly (2012) estimated, assessment makes up 30 to 50 percent of language teachers’ daily activities. Hence, assessment
literacy deserves more attention. In its short lifespan, LAL has undergone a
series of (re)conceptualizations with a number of models having been proposed,
yet it continues to mature. There remain several areas for future growth.
Only when language assessment literacy has explored these growth areas, can
we think of the concept as mature and developed, and contributing to an
increased quality of language education.
References
Bailey, K. M., & Brown, J. D. (1996). Language testing courses: What are they? In A.
Cumming & R. Berwick (Eds.), Validation in language testing (pp. 236–256). Multi-
lingual Matters.
Brindley, G. (2001). Language assessment and professional development. In C. Elder, A. Brown, K. Hill, N. Iwashita, T. Lumley, T. McNamara & K. O’Loughlin (Eds.), Experimenting with uncertainty: Essays in honor of Alan Davies (pp. 126–136). Cambridge University Press.
Brown, J. D., & Bailey, K. M. (2008). Language testing courses: What are they in 2007? Language Testing, 25(3), 349–383.
Bybee, R. W. (1997). Achieving scientific literacy: From purposes to practices. Heinemann.
Cambridge University (2018). Certificate in teaching English to speakers of other languages
(CELTA): Syllabus and assessment guidelines. https://www.cambridgeenglish.org/Ima
ges/21816-celta-syllbus.pdf
Cambridge University (2019). Diploma in teaching English to speakers of other languages
(DELTA): Syllabus specifications. https://www.cambridgeenglish.org/Images/
22096-delta-syllabus.pdf
Coombe, C., Troudi, S., & Al-Hamly, M. (2012). Foreign and second language tea-
cher assessment literacy: Issues, challenges, and recommendations. In C. Coombe,
P. Davidson, B. O’Sullivan & S. Stoynoff (Eds.), The Cambridge guide to second lan-
guage assessment (pp. 20–29). Cambridge University Press.
Crusan, D., Plakans, L., & Gebril, A. (2016). Writing assessment literacy: Surveying second
language teachers’ knowledge, beliefs, and practices. Assessing Writing, 28, 43–56.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
Falsgraf, C. (2006). Why a national assessment summit? New visions in action. National
Assessment Summit. Summit conducted in Alexandria, Va. https://files.eric.ed.gov/
fulltext/ED527580.pdf
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers.
Profile: Issues in Teachers’ Professional Development, 20(1), 179–195.
Goody, J., & Watt, J. (1963). The consequences of literacy. Comparative Studies in Society
and History, 5(3), 304–345.
Harding, L., & Kremmel, B. (2016). Language assessment literacy and professional
development. In D. Tsagari & J. Banerjee (Eds.), Handbook of second language assessment
(pp. 413–428). Mouton de Gruyter.
Hasselgreen, A. (2008). Literacy in classroom assessment (CA): What does this involve? Paper presented at the 5th Annual Conference of the European Association for Language Testing and Assessment, Athens, Greece. http://www.eaulta.eu.org/con
ferences/2008/docs/sunday/panel/Literacy%20in%20classroom%20assessment.pdf
Hasselgreen, A., Carlsen, C., & Helness, H. (2004). European survey of language testing and
assessment needs: Report: Part 1 general findings. European Association for Language
Testing and Assessment. http://www.ealta.eu.org/documents/resources/survey-rep
ort-pt1.pdf
Hillerich, R. L. (1976). Toward an assessable definition of literacy. The English Journal,
65(2), 50–55.
Hyland, K., & Hamp-Lyons, L. (2002). EAP: Issues and directions. Journal of English for
Academic Purposes, 1(1), 1–12.
Inbar-Lourie, O. (2008a). Constructing an assessment knowledge base: A focus on
language assessment courses. Language Testing, 25(3), 385–402.
Inbar-Lourie, O. (2013). Language assessment literacy. In C. Chapelle (Ed.), The ency-
clopedia of applied linguistics (pp. 1–9). Blackwell Publishing Ltd.
Kaiser, G., & Willander, T. (2005). Development of mathematical literacy: Results of an
empirical study. Teaching Mathematics and its Applications, 24(2–3), 48–60.
Koh, K., & DePass, C. (2019). Developing teachers’ assessment literacy: Multiple per-
spectives in action. In K. Koh, C. DePass, & S. Steel (Eds.), Developing teachers’
assessment literacy: A tapestry of ideas and inquiries (pp. 1–6). Brill Sense.
Kumaravadivelu, B. (2006). Understanding language teaching: From method to postmethod.
Lawrence Erlbaum Associates.
Lam, R. (2014). Language assessment training in Hong Kong: Implications for language
assessment literacy. Language Testing, 32(2), 169–197.
Malone, M. (2013). The essentials of assessment literacy: Contrasts between testers and
users. Language Testing, 30(3), 329–344.
Mendoza, A. A. L., & Arandia, R. B. (2009). Language testing in Colombia: A call for more
teacher education and teacher training in language assessment. Profile, 11(2), 55–70.
O’Loughlin, K. (2013). Developing the assessment literacy of university proficiency test
users. Language Testing, 30(3), 363–380.
Pill, J., & Harding, L. (2013). Defining the language assessment literacy gap: Evidence
from a parliamentary enquiry. Language Testing, 30(3), 381–402.
Scarino, A. (2013). Language assessment literacy as self-awareness: Understanding the
role of interpretation in assessment and in teacher learning. Language Testing, 30(3),
309–327.
Shohamy, E., & Or, I. G. (2013). Introduction to volume 7. In E. Shohamy, I. G. Or, &
S. May (Eds.), Encyclopedia of language and education (3rd ed., Vol. 7) (pp. ix–xviii).
Springer Science and Business Media.
Spolsky, B. (2008). Language testing at 25: Maturity and responsibility? Language Testing,
25(3), 297–305.
Stabler-Havener, M. L. (2018). Defining, conceptualizing, problematizing, and assessing
language teacher assessment literacy. Teachers College, Columbia University Working
Papers in Applied Linguistics & TESOL, 18(1), 1–22.
Stiggins, R. J. (1991). Assessment literacy. Phi Delta Kappan, 72(7), 534–539.
Stiggins, R. J. (1995). Assessment literacy for the 21st century. Phi Delta Kappan, 77(3), 238–245.
Taylor, L. (2013). Communicating the theory, practice and principles of language testing
to test stakeholders: Some reflections. Language Testing, 30(3), 403–412.
Trinity College London (2016). Certificate in teaching English to speakers of other languages
(CertTESOL): Syllabus. https://www.trinitycollege.com/resource/?id=5407
Trinity College London (2017). Licentiate diploma in teaching English to speakers of other
languages (LTCL Diploma TESOL): Validation requirements, syllabus and bibliography for
validated and prospective course providers: Syllabus. https://www.trinitycollege.com/
resource/?id=1776
UNESCO (2004). The plurality of literacy and its implications for policies and programs: Edu-
cation sector position paper. UNESCO. http://unesdoc.unesco.org/images/0013/
001362/136246e.pdf
Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers:
Findings of a European study. Language Assessment Quarterly, 11(4), 374–402.
Yan, J. (2010). The place of language testing and assessment in the professional preparation of foreign language teachers in China. Language Testing, 27(4), 555–584.
Yan, X., Zhang, C., & Fan, J. J. (2018). Assessment knowledge is important, but …:
How contextual and experiential factors mediate assessment practice and training
needs of language teachers. System, 74, 158–168.
Part 2
Introduction
Providing effective feedback and expansive feedforward is a complex process that requires teachers to give meaningful advice on students’ work to take their learning forward. Feedback, as described by Brown, Bull, and Pen-
dlebury (2013), is the best-tested principle and practice in the classroom. It can
be meaningful and relevant when it is directed towards encouraging students to
improve their own learning. In addition, feedback becomes effective when it is
focused on students and provides them the opportunity to act in response.
Closing the gap between where the learner is and where the learner is going is what gives feedback its power (Hillocks, 1986; Hattie, 2009). As Brown and
Knight (1994) put it, good feedback is achieved when students are suitably
guided towards developing deep understanding to affect learning.
Feedback has been introduced and used intensively in English as a Foreign
Language (EFL) and English as a Second Language (ESL) classrooms as a
method of encouraging students to collaborate with others, contribute to
learner autonomy, and develop a sense of ownership (Berg, 1999; Carson &
Nelson, 1996; Tsui & Ng, 2000; Paulus, 1999).
Some cultures are particularly sensitive to receiving feedback and feedforward; therefore, knowledge of a specific classroom culture can be helpful.
Hattie and Timperley (2007) asserted that learning context is very important in
providing feedback. Noels (2001) added that feedback which acknowledges students’ needs and preferences can help students appreciate its positive effect and can lessen negative affective reactions to their work. Furthermore, Hyland (2003) advised that teachers should imple-
ment feedback by explaining to students that they are part of the process, and
responsible for their own development. Peacock’s study (as cited in Al-wossabi, 2019) found that when teachers employ different types and techniques of feedback matched to students’ preferences and styles, it supports and advances their learning. This point is of great importance because feedback not
only helps students and teachers achieve effective learning and teaching, it also
My teaching context
The Sultanate of Oman and its Ministry of Education have high hopes that young Omanis, equipped with a good education, can cope with the demands of the times. To support all Omani students’ learning, the Education Council of
Oman (2017) put emphasis on its Education Vision 2040. Education authorities
mandated that development of the education system and curricula should pro-
vide students with positive reinforcement and support in classrooms across dif-
ferent levels. Highlighting the support needed in the classroom, one important
move is to monitor students’ progress and lead them to the next phase of learn-
ing. This is where feedback and feedforward mechanisms play their role in the
development of students’ learning. However, the lack of communication between teacher and student and among students posed a threat to achieving effective teaching and learning. Interestingly, Al-Issa (2016), an Omani scholar, stressed that the teacher’s role is important in raising
students’ awareness about taking individual responsibility and in advancing their
own learning. When feedback is timely and purposefully given, it will help students advance their learning.
In 2016, I initially conducted an exploratory study that addressed the
importance of the integration and implementation of feedback and feedforward
in the context of Omani students’ learning experiences. Student writing is one aspect of teaching that has influenced my desire to go the extra mile.
When I taught one of these writing classes, I noticed that there were no explicit men-
tions of feedback sessions on students’ writing in the course profile and port-
folios. Although feedback can be conducted indirectly, it is important that
students and teachers share and understand the feedback mechanism to improve
teaching and advance learning. According to Black and Wiliam (1998), all
activities undertaken by teachers and students in assessing themselves and their
work can help develop teaching and learning and can increase engagement in
the classroom. At the Faculty of Language Studies, it is essential to adopt and implement a clear feedback mechanism that can foster good learning and
teaching. I realized that providing feedback and effective feedforward within a
learning and teaching context is essential to the development of students’
potential. In the case of EFL instruction, particularly in the writing classroom, my students made both global and developmental errors, and I believe that if these errors are used effectively as a basis for feedback, they can help students improve their work.
As an EFL teacher, I always engage my students in the teaching and learning
process in order to offer them ownership of their written work. It is true that
there are contradictions between teacher and student perceptions in relation to
the different forms of feedback used in the classroom. This led me to suggest an assessment protocol posing these questions in context: ‘Where am I going?’ ‘How am I going?’ and ‘Where to next?’ (Hattie & Timperley, 2007). With
these questions in mind, it is my desire and teaching goal to guide students and
motivate them to explore opportunities for learning. In my EFL classroom, my
students’ voices are heard, and I always remind them that their mistakes in
writing are part of their development and can help them develop awareness in
becoming self-directed learners.
‘We need to ensure that all writing teachers receive training on how to
implement the suggested feedback mechanism’.
‘Student-level representatives should be invited to increase students’
awareness on feedback structure’.
‘Teachers must be given the chance to also explore other feedback types
that will work in EFL context’.
‘The feedback model should also be initially implemented in Level 1
writing course to determine its effectiveness and limitations’.
‘As members of the faculty, we must ensure that assessment literacy be
the focus of our seminar–workshops so we can initiate a timely feedback
mechanism, thus supporting assessment literacy across disciplines’.
The model below (see Figure 5.1) was developed and modified based on Hattie
and Timperley’s (2007) three major feedback questions. The design of this
feedback instrument integrated the results of focus group discussions I con-
ducted with my colleagues who are involved in the development of the writing
course. Program leaders, course coordinators, and student level representatives
met and discussed the suggestions and feedback from the students.
In the feed up stage, learning outcomes, class activities, assessments, and
marking criteria were explained explicitly to students. This helped them
understand the nature of the course and teachers’ expectations of their output.
The importance of learning outcomes was highlighted in the beginning of the
course, so that students can prepare for tests and other in-class requirements and
activities. Rubrics and other scoring guides were also discussed and made clear
for the students during the first week of the class. In fact, students were
involved in the development of the rubrics used in evaluating their work.
During the feedback stage, information was given to students regarding their
own progress and achievement. Here, students were asked to submit written work each week, depending on the class activity. This was followed by revision of their work, integrating the feedback and comments from their teachers and peers. Revision involved two steps: focused teacher feedback and suggestions drawing on corrective feedback strategies. To understand
students’ learning experiences, and to determine what they can do and cannot do
in a particular task, students were asked to map their own learning using ‘can do’ statements. These statements are rooted in the course learning outcomes and indicate students’ level of learning in the classroom. The results of
this feedback mechanism helped teachers improve teaching strategies and adjust
teaching tasks within students’ grasp. To add meaningful context to this mechan-
ism, reflective sessions were also conducted where students got involved in focused
group discussions, online discussions, and teacher consultations.
Lastly, feedforward is a discussion phase between teacher and students on
what they have achieved and how they can improve their own learning in the
future. In addition, we discussed how they could move on to the next level of knowledge so as to do better in the next phase of learning.
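Viewed as a protocol, the mechanism pairs each stage with one of Hattie and Timperley’s (2007) guiding questions. The sketch below (Python, a schematic summary only; the activity lists condense this chapter’s description rather than reproduce a formal specification) makes the mapping explicit.

```python
# Schematic summary of the three-stage feedback mechanism described above,
# keyed by Hattie and Timperley's (2007) questions. Activities paraphrase
# the chapter's account and are illustrative, not an official protocol.

feedback_model = {
    "feed up": {
        "question": "Where am I going?",
        "activities": ["explain learning outcomes and class activities",
                       "discuss rubrics and marking criteria",
                       "involve students in rubric development"],
    },
    "feedback": {
        "question": "How am I going?",
        "activities": ["weekly written submissions",
                       "teacher and peer comments, then revision",
                       "self-mapping with 'can do' statements"],
    },
    "feedforward": {
        "question": "Where to next?",
        "activities": ["discuss what has been achieved",
                       "plan the next phase of learning"],
    },
}

for stage, info in feedback_model.items():
    print(f"{stage}: {info['question']} ({len(info['activities'])} activities)")
```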
Aside from Moodle, I also explored other practical ways to transform feed-
back using technology. I created a WhatsApp group in my tutorial class with weekly chat sessions that I called ‘Help & Receive Help’ (HRH). Stu-
dents in a group could send a message asking for help with improving and
correcting spelling mistakes, the use of correct words, and sentence con-
struction. My students were given a theme or topic every week, so that they
were guided in their participation. Words and sentence types were sent in the
WhatsApp group, then students offered help through sharing or providing
correct spelling, grammar, or sentence construction. There were some chal-
lenges with this, but I believe the practicality of this technology was an
opportunity that helped students improve their writing. Thanks to technological advancement, the implementation of feedback is now widely supported online, where teachers and students share thoughts and ideas through electronic mail, bulletin board systems, and online discussion boards (Braine, 2001; Chen, 2016; Ware, 2004).
Here is some unedited student feedback regarding the use of technology in the classroom (i.e. Moodle, an online discussion board, and WhatsApp).
References
Al-Issa, A. S. (2016). Meeting students’ expectations in an Arab ICLHE/EMI context:
Implications for ELT education policy and practice. International Journal of Applied
Linguistics and English Literature, 6(1), 209–226.
Al-wossabi, S. A. N. (2019). Corrective feedback in the Saudi EFL writing context: A
new perspective. Theory and Practice in Language Studies, 9(3), 325–331.
Asari, Y. (2019). EFL teachers’ L1 backgrounds, beliefs, and the characteristics of their
corrective feedback. Journal of Asia TEFL, 16(1), 250.
Beaumont, C., O’Doherty, M., & Shannon, L. (2011). Reconceptualising assessment feed-
back: a key to improving student learning? Studies in Higher Education, 36(6), 671–687.
Berg, E. C., (1999). The effects of peer trained response on ESL students’ revision types
and writing quality. Journal of Second Language Writing, 8, 215–241.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Edu-
cation: Principles, Policy & Practice, 5(1), 7–74.
Borich, G. D. (2010). Effective teaching methods (8th ed.). Pearson Education Inc.
Braine, G. (2001). A study of English as a foreign language (EFL) writers on a local-area network (LAN) and in traditional classes. Computers and Composition, 18, 275–292.
Brown, G. A., Bull, J., & Pendlebury, M. (2013). Assessing student learning in higher education. Routledge.
Brown, J. D. (2019). Assessment feedback. The Journal of Asia TEFL. 16(1), 334–344.
Brown, S. & Knight, P. (1994). Assessing learners in higher education. Kogan Page.
Budimlic, D. (2012). Written feedback in English: Teachers’ practices and cognition
[Unpublished master’s thesis, Norges teknisk-naturvitenskapelige universitet,
Fakultet for samfunnsvitenskap og teknologiledelse, Program for lærerutdanning].
Carson, J. G., & Nelson, G. L. (1996) Chinese students’ perception of ESL peer
response group interaction. Journal of Second Language Writing, 5(1),1–19.
Chen, T. (2016). Technology-supported peer feedback in ESL/EFL writing classes: A
research synthesis. Computer Assisted Language Learning, 29(2), 365–397.
Education Council of Oman (2017). Philosophy of education in the Sultanate of Oman.
Ellis, R. (2005). Principles of instructed language learning. System, 33(2), 209–224.
Ellis, R. (2009). Corrective feedback and teacher development. L2 Journal, 1(1), 1–18.
Evans, N. W., Hartshorn, J., & Allen Tuioti, E. (2010). Written corrective feedback: Practitioners’ perspectives. International Journal of English Studies, 10(2), 47–77.
Ferris, D. (1999). The case for grammar correction in L2 writing classes: A response to
Truscott (1996). Journal of Second Language Writing, 8, 1–10.
Hattie, J. (2009). The black box of tertiary assessment: An impending revolution. In L.
H. Meyer, S. Davidson, H. Anderson, R. Fletcher, P. M. Johnston, & M. Rees
(Eds.), Tertiary assessment & higher education student outcomes: Policy, practice & research
(pp. 259–275). Ako Aotearoa.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational
Research, 77(1), 81–112.
Hillocks Jr, G. (1986). Research on written composition: New directions for teaching. National
Council of Teachers of English.
Hyland, F. (2003). Focusing on form: Student engagement with teacher feedback.
System, 31, 217–230.
Irons, A. (2007). Enhancing learning through formative assessment and feedback. Routledge.
Krashen, S. (1982). Principles and practice in second language acquisition. Pergamon.
Lantolf, J., & Thorne, S. L. (2007). Sociocultural theory and second language learning. In
B. van Patten & J. Williams (Eds.), Theories in second language acquisition (pp. 201–224).
Lee, I. (2014). Revisiting teacher feedback in EFL writing from sociocultural perspec-
tives. TESOL Quarterly, 48(1), 201–213.
Lightbown, P. M., & Spada, N. (1999). Instruction, first language influence, and
developmental readiness in second language acquisition. The Modern Language Journal,
83, 1–22.
Lyster, R. (2011). Content-based second language teaching. In E. Hinkel, Handbook of
research in second language teaching and learning (Vol. 2). (pp. 611–630) Routledge.
Lyster, R., Lightbown, P. M., & Spada, N. (1999). A response to Truscott’s ‘What’s
wrong with oral grammar correction’. The Canadian Modern Language Review, 55,
457–467.
McCord, M. B. (2012). Exploring effective feedback techniques in the ESL classroom. Language Arts Journal of Michigan, 27(2), 11.
Noels, K. (2001). Learning Spanish as a second language: Learners’ orientations and perceptions of their teachers’ communication style. Language Learning, 51, 107–144.
Paulus, T. (1999). The effect of peer and teacher feedback on student writing. Journal of
Second Language Writing, 8, 265–289.
Peacock, M. (2001) Match or mismatch? Learning styles and teaching styles in EFL.
International Journal of Applied Linguistics, 11, 1–20.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.
Savignon, S. J. (2005). Communicative language teaching: Strategies and goals. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 635–651). Lawrence Erlbaum Associates.
Swaffar, J., Romano, S., & Arens, K. (1998). Language learning online: Theory and practice in the ESL and L2 computer classroom. Labyrinth Publications.
Torrance, H., & Pryor, J. (2002). Investigating formative assessment: Teaching, learning and assessment in the classroom. Open University Press.
Tsui, A. B., & Ng, M. (2000). Do secondary L2 writers benefit from peer comments? Journal of Second Language Writing, 9(2), 147–170.
Ware, P. D. (2004). Confidence and competition online: ESL student perspectives on web-based discussions in the classroom. Computers and Composition, 21(4), 451–468.
Warschauer, M. (2002). Networking into academic discourse. Journal of English for Academic Purposes, 1(1), 45–58.
Chapter 6
Introduction
In searching for efficient methods of developing student teachers’ language assessment literacy (LAL), we have concluded that checklists can be a good option. Thus, this chapter is dedicated to exploring the potential of
checklists to be exploited as a pedagogical tool in pre-service foreign lan-
guage (FL) teacher preparation. This research has been done as part of a
long-term project aimed at designing and further reshaping a course on
assessment for master’s degree students studying English and French as their
second language. The afore-mentioned course is taught iteratively, but the
current study presents the results obtained after teaching it in 2018–2019 to
experimental groups at three universities in Kharkiv, Ukraine. The objec-
tive of the course is to develop pre-service teachers’ competency in
selecting, adapting, designing, and administering tasks for assessing bache-
lor’s degree students’ reading, listening, speaking, and writing skills, and
assessing their oral and written performance respectively. Owing to their
advantages, checklists have been a helpful instrument for teaching. The
chapter focuses on their functions, types, structure, and implementation,
aligned with the specific aim mentioned above. Some sample checklists are
presented.
This study commenced with a review of checklists available in the area of
language assessment. A qualitative study revealed that most checklists appear
to be evaluative and designed specifically for experienced test developers, in
that they abound with metalanguage and require profound knowledge of
procedure. This was followed by the adaptation stage at which we tried to
enhance the clarity of rubrics in items by simplifying them and fine-tuning
them to the specific focus of the study. This necessitated a more profound
understanding of the nature of checklists, namely the principles and
requirements of their development, structure, and types. For this we
addressed not only checklists and theory on their design relevant to lan-
guage assessment, but also other areas, with the intention of exploring their
potential for teaching purposes by applying an interdisciplinary approach.
Theoretical background
At present there are few areas of life where checklists cannot be applied. They are extensively used in the household as shopping lists, in aviation for ensuring pre-flight safety, in medicine for diagnostic purposes, in the IT industry for software product quality assurance, etc. The scope of their application is so
varied that enumerating all cases could be endless.
First introduced by Osborn (1953) as a simple tool in the form of a series of
comprehensive questions to encourage creative thinking in approaching complex
design tasks, checklists were meant to be used individually or in groups. The
questions relating to the point of focus should be answered one at a time to help
explore all possible ways to handle a problem.
In the Multilingual glossary of language testing terms (ALTE, 1998) we find the
following definition: ‘A checklist is a list of questions or points to be
answered or covered. Often used in language testing as a tool of observation
or analysis’ (p. 137). As can be seen, this definition is fairly laconic and cannot
serve as a guideline for designing checklists. In order to have a better understanding of the phenomenon, let us turn to some other definitions from online dictionaries returned by a Google search.
‘A checklist is a list of all the things that you need to do, information that you
want to find out, or things that you need to take somewhere, which you make in
order to ensure that you do not forget anything; a list of things, names, etc. to be
checked off or referred to for verifying, comparing, ordering, etc.’ (Collins).
‘A checklist is a comprehensive list of important or relevant actions, or steps to
be taken in a specific order’ (WebFinanceInc).
‘A checklist is a list of items required, things to be done, or points to be
considered, used as a reminder’ (Lexico).
‘A checklist is a type of informational job aid used to reduce failure by com-
pensating for potential limits of human memory and attention. It helps to
ensure consistency and completeness in carrying out a task’ (Educalingo).
It is common practice to use checklists for peer-, self-, and rater
assessment of speaking or writing. As Green (2013) remarks, they help focus on
important aspects of oral performance whilst being straightforward to use,
although he claims that using checklists deprives the process of its real-life
communicative value by concentrating only on the presence/absence of certain
elements. He also finds it possible to develop rating scales out of checklists.
Khalifa & Salamoura (2011) and O’Sullivan, Weir, and Saville (2002) pointed
out the potential of checklists to facilitate a comparison between different
speaking test formats. Cambridge self-evaluation checklists of different profi-
ciency levels are designed to help learners proofread and edit their pieces of
writing (Cambridge Assessment). Wu (2014) offers cognitive processing
checklists to test takers to self-report on their reading.
Checklists can also be used for providing immediate diagnostic feedback to
testees (Green, 2013).
In FL teaching a checklist is an instrument that helps FL practitioners evaluate
language teaching materials. Quantitative and qualitative teacher-made software
checklists are used to evaluate English language learning websites as additional
tools (Fuentes & Risueno Martinez, 2018, p. 27), and checklists are used to
evaluate strengths and weaknesses in an English language textbook (AbdelWa-
hab, 2013).
For assessing pupils’ reading and writing in a native language, Fiderer (1999)
designed a series of checklists. Though an assessment tool, these checklists could
be perfect guidelines for teaching purposes.
In classroom instruction checklists are considered pedagogical tools with vast
functionality: observation of students in the learning process, evaluation of instruction, assessment, and memory aids/mnemonic prompts (Dudden
Rowlands, 2007; Strickland & Strickland, 2000).
Observation checklists are lists of things to look at when observing either
individuals or a group of learners doing some activity at or after class (Strickland
& Strickland, 2000). They are employed in a search for remedies for teaching
some aspects or teaching specific groups of learners (Alberta Education, 2008;
Dudden Rowlands, 2007). They help to quickly gather information about how
well learners perform, about their strengths or weaknesses, and about their
learning styles. Normally they are written in a yes/no format and contain spe-
cific criteria. They can include spaces below or in the far-right column for brief
comments, which provide additional information not captured in checklists
(Alberta Education, 2008).
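To make the described format concrete, here is a minimal sketch (Python, an illustration only) of a yes/no observation checklist with the optional comments column; the criteria and learner name are invented examples rather than items from the studies cited above.

```python
# A minimal yes/no observation checklist with a comments column, as
# described above. Criteria and the learner's name are invented examples.

from dataclasses import dataclass, field

@dataclass
class ChecklistItem:
    criterion: str
    checked: bool = False   # the yes/no judgement
    comment: str = ""       # brief note in the far-right column, if any

@dataclass
class ObservationChecklist:
    learner: str
    items: list = field(default_factory=list)

    def unmet(self) -> list:
        """Criteria the observer did not check off."""
        return [i.criterion for i in self.items if not i.checked]

checklist = ObservationChecklist(
    learner="Student A",
    items=[
        ChecklistItem("Takes turns appropriately in pair work", checked=True),
        ChecklistItem("Uses target vocabulary in speech", checked=False,
                      comment="Relied on L1 paraphrase"),
    ],
)
print(checklist.unmet())   # -> ['Uses target vocabulary in speech']
```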
Teachers rely on evaluation checklists when assessing outcomes to collect data about the efficacy of their teaching methods and about the accuracy, appropriacy, and completeness of tasks done by their students. They help teachers clarify what is indi-
cative of a successful performance (Fuentes & Risueno Martinez, 2018, p. 27).
Checklists meant as an achievement assessment tool should cover a specific area
taught in accordance with the curriculum.
Entry/exit checklists (Wilson, 2013) relate more to the stage of their application: they are used either at the beginning or at the end of the research or development process to evaluate a product’s degree of readiness for submission.
Research checklists (Wilson, 2013) have a very specific area of application.
They are meant for researchers to use for the evaluation and/or review of their
own or others’ research.
Gawande (2009) differentiated two types of checklists that depend on how
experienced the respondent is: a) DO-CONFIRM checklists, which are relied on
by experienced people after doing something to make sure all the necessary steps have been taken; b) READ-DO checklists, which are used by the less experienced or inexperienced, who do what an item says before proceeding to the next item.
All the types mentioned above are meant to be mnemonic devices (for example, Dudden Rowlands, 2007; Scriven, 2000; Wilson, 2013), but that is not their only common feature. We found these classifications rather confusing due to their ambivalent nature: in fact, all the procedures described above require both observation and evaluation based on standards/criteria that are arranged in a specific order, though following that order is not crucial.
As we have seen, checklists have a number of advantages, but they also have a number of limitations. Summing up the theoretical findings on checklists, and based on our own experience, we recognize the following advantages:
Checklists:
Are fairly easy to develop and use (also in AbdelWahab, 2013; Scriven, 2000).
Can be detailed, though short and meaningful.
Allow for exercising a high degree of control by the teacher.
Help the teacher to evaluate, not only a particular student, but also other
students, as well as the results of students’ interaction.
Can be aligned with particular tasks (also in Dudden Rowlands, 2007;
Wilson, 2013).
Allow students to self-monitor their own progress (also in Dudden
Rowlands, 2007).
Among their limitations:
Designing checklists can be time-consuming if done for the first time and not based on a justified theoretical framework.
Using printed versions of checklists requires extra material resources.
There is a necessity for double checking as respondents can tick the boxes
without reading the items (also in Scriven, 2000).
In any case, checklists should be used only as an additional tool for teaching
and assessing, supplementing other time-proven and well-reputed tools. They
need to be validated to be effective (for example, AbdelWahab, 2013;
Gawande, 2009; Scriven, 2000).
Gawande (2009) insists on keeping them short in order to fit the limit of
working memory, with simply worded items, and without unfamiliar language.
Judging by these general characteristics, checklists can undoubtedly contribute
to achieving our overall goal of developing student teachers’ language assessment
literacy. However, their efficient implementation in the language classroom required determining, in depth, the characteristics specific to the context of classroom learning. This issue seems neglected by researchers, despite the evident value of the tool under consideration: there are very few well-grounded theoretical works, and limited practical evidence, dedicated to the use of checklists by university teachers and students. For that reason, we revisited studies from varied disciplines.
On the basis of the generalized characteristics of checklists presented above,
we compiled those specific to our teaching context (see Characteristics of checklists in Appendix A). These characteristics may serve as requirements for designing checklists to develop pre-service LAL.
Research problem
One of the objectives of the study was to determine the degree of importance of each aspect of classroom assessment in order to develop a checklist covering it. Experimental teaching was used in the checklists’ development process, testing the practical significance of each checklist in order to decide which points should be included in the final checklists and to validate them in further experiments. First, we are going to illustrate the preparatory steps that should be
taken when adjusting to a particular local teaching context. In fact, to start with,
we need to describe the context. As mentioned above, the objective of the
course is to develop master’s degree students’ language assessment literacy (com-
petency). We consider LAL to be part of their assessment literacy, an integrated
competency, which is the cumulative result of learning different disciplines. By
LAL we mean master’s degree students’ ability to perform assessment and evalua-
tion of bachelor’s degree students’ communicative competency, which includes:
planning classroom assessments, preparing assessment materials (selecting or adapt-
ing tasks from respected sources meeting the standards, designing tasks or their
complexes), administering assessments, rating oral and written performance,
marking and scoring papers, reporting the results to students and other stake-
holders, and providing feedback on the obtained results. The course is not meant
to instruct students on test design, calibrating items, validating tasks, or anything
involving repetitive piloting or the mathematical statistics typical of large-scale,
high-stakes tests. Moreover, students are supposed to learn about traditional and
alternative methods of classroom assessment.
include items concerned with elements that were key for our students. In some
cases, we failed to find any checklists at all, such as for text adaptation or the self-evaluation of a formal letter (see Self-evaluation: Formal letter checklist in Appendix
B for the specifics of the concrete task).
Rationale
Working with ready-made checklists, we finally decided to assign students to respond only to those items that applied to our specific situation, and to add issues
that we considered important, but that had not been addressed in existing checklists
for item writers, test designers, interlocutors, raters, and markers. For example, the
questions related to audio text recordings or the equipment used to play them were
eliminated from the Listening task checklist as the students were not engaged in audio
and video recording production. They responded to items addressing the use of the
internet, but not all of those related to equipment, as they did not have a choice of
equipment, apart from their mobile phones. The question ‘Is the language of the
rubric/item/option accurate?’ needed specifying as the students did not understand
what they were expected to concentrate on, and just ticked the box when evalu-
ating their own and the other students’ tasks. We found that the grammar mistakes the students most typically made were subject–verb agreement errors, omission of the indefinite article where needed, and incorrect word order in questions.
This is why we introduced three subcategories for this item:
The language of the item is grammatically accurate when:
a) The verb is used in the third person singular (with -s) in the Present
Simple.
b) There is an auxiliary verb after the question word.
c) There is a/an before singular countable nouns.
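To show how such subcategories work as separate yes/no rows, here is a small sketch (Python; the row wording and the sample item are our paraphrase for illustration, not the chapter’s actual checklist).

```python
# Illustrative yes/no rows for the grammar-accuracy subcategories above.
# The sample item and tick pattern are invented for demonstration.

grammar_accuracy_rows = [
    "(a) Verb takes third person singular -s in the Present Simple",
    "(b) An auxiliary verb follows the question word",
    "(c) a/an precedes singular countable nouns",
]

def review_item(item_text: str, answers: list) -> bool:
    """An item passes this section only if every row is ticked."""
    print(f"Reviewing item: {item_text}")
    for row, ok in zip(grammar_accuracy_rows, answers):
        print(f"[{'x' if ok else ' '}] {row}")
    return all(answers)

# 'Where she goes?' fails row (b): no auxiliary after the question word.
review_item("Where she goes?", [True, False, True])
```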
Checklists should cover all necessary steps that have to be carried out when selecting, adapting, and designing a task, but at the same time should be practical and realistic for our students. Bearing this in mind, we thoroughly considered
items to be deleted in order to minimize the amount of reading by students (there
was a risk they would overlook some points when bored, tired, or confused), and
to maximize their success with using checklists. Another option was to break
down some longer checklists into separate ones. For instance, we did not offer the use of the Speaking production assessment checklist at the initial stage of learning, and
students filled in Speaking production assessment checklists (for grammar/range of voca-
bulary/coherence/cohesion/pronunciation/fluency) one at a time while studying the
same oral performance sample. Our rationale was that it is still a challenge for
students to assess all necessary aspects simultaneously in one oral performance.
Therefore, they were trained to assess one aspect at a time using a corresponding
checklist. The list of all the checklists designed in line with the course is presented
in Appendix C.
Method
Quantitative and qualitative methods were used for collecting and analyzing
the data related to the efficacy of using checklists in the learning process. This
chapter discusses the results based on the application of the qualitative method.
The applied research methods include systematization of the fundamentals of checklist design and implementation; critical analysis of the checklists and of the products the students made with their help; simulation of the development of quizzes, tasks, and scales; and the piloting and application processes at the universities. The practical methods comprise qualitative analysis of the items in the checklists, namely their inclusiveness and clarity.
The data presented in this study were collected through direct observation and through qualitative analysis of the tasks produced by students using the tailored checklists. The sample of participants included more than 150 students at three universities in Kharkiv, Ukraine. The participants were those students who performed the roles of item writers, raters, and markers, and those who contributed as assessees.
The sample using the elaborated checklists consisted of students averaging 21 years of age, with practically no teaching experience.
We also decided to add the option ‘Imperatives’ (Find and circle…; Check off
the presence of the KEY; Ask a question to encourage the student to speak), as our
checklists are meant to be guidelines and this format fits this context.
In some checklists there is either a space or an extra column for comments,
explanations, and/or references. For all task evaluation checklists to be filled
in by the student’s groupmates, we provided a space for their comments in
case they didn’t check something off or found it inappropriately done. More
complex checklists are normally divided into sections to classify phenomena
under analysis. We found this practice useful as it was a way to attract stu-
dents’ attention to particular things and keep them concentrated on these
subcategories (e.g., Item: Stem, options).
Implications
Having experimented with checklists and obtained more than satisfactory
results, we are now in a position to claim that this pedagogical tool is suitable
for use in developing pre-service LAL due to its qualities of being straight-
forward, gradable, and flexible. We have determined the practical value of
checklists, though we have not yet explored their full potential. The pre-
sented general characteristics, and the description of the structure, functions,
and ways of using checklists may serve as a foundation for further researches
in this area. Despite the limited scope of this study, our findings can provide
broader implications for university teachers and teacher trainers. These indi-
viduals can further develop and experiment with checklists reflecting their
particular instructional context and the intended outcomes. Checklists are a
practical tool for teachers to track their students’ competency. As we see it,
developing LAL with the help of a checklist is an integral part of the assess-
ment literacy development of preservice teachers. This is an opportunity for
them to develop their professional competency, and to create partnerships
with teachers and younger students. Moreover, students can learn to design
checklists themselves and will thus acquire this tool as their learning strategy.
The results specified in checklists will facilitate their reflection on learning. It
is crucial for future teachers to get real, pre-service teaching practice while
they still have an opportunity to make mistakes, to learn, and to get remedial
help on the spot, which will prevent future problems.
Conclusion
The present chapter was dedicated to the study of the theoretical grounds and
practical application of checklists for developing student teachers’ language
assessment literacy. Theoretical findings and ready-made checklists were revis-
ited with the view of adapting them to the targeted learning context and our
specific goal. We outlined the general characteristics of checklists and fine-
tuned them. This chapter reports the findings of the experimental teaching of
students doing their master’s degrees. The results obtained make us believe that
the use of standardized checklists helps make the learning process more effi-
cient, accurate, and reliable. Checklists provide a practical guide for tasks and
rating scales design, and they indicate what constitutes each process based on
target learners’ characteristics. These standards can be further refined and
applied for other particular tasks. The chapter presents a sample of customized
checklists to enable university teachers and other experts to evaluate them. The
use of checklists is expected to make the teaching and evaluation process more
efficient, accurate, and reliable. Future studies will focus on the reliability and
validity of the elaborated checklists and present statistical evidence of their
efficiency.
Structure
Checklists have a minimum of two and a maximum of six columns. Some have spaces for comments. The number of columns depends on the number of people using the checklist. Thus, checklists with two columns are to be used by
individual students only or by the teacher. Checklists with three columns are to be
completed by the student and by the teacher evaluating their product or the
appropriacy/accuracy of the observation, with the teacher’s comments below the
table. Checklists with four columns are to be filled in by the student and three other
groupmates, evaluating the product of the student with the student’s comments
included below the table. Checklists with five columns are to be used by the stu-
dent and three groupmates, and checklists with six columns are to be completed by
the student, three groupmates, and the teacher. The patterns vary depending on the
function of the checklist. If meant to guide, then they are completed by only the
student themselves. If used for evaluation and quality assurance, then they are filled
in by other people and are followed by their commentaries.
Content
The range and number of items should be well-grounded.
Items in checklists should be based on learning objectives and reflect stages
of task development. They should be standards specific for the context,
Functions
From a student’s perspective:
To guide the processes of task or input selection, task or input adaptation, item
writing, rating scales design and assessment of oral and written performance.
To remind the student of the range of elements to include and order of steps to take
to demonstrate the completeness of the task.
To evaluate the elaborated products (peer and self-evaluation).
To prevent errors in elaborated tasks and written performance.
To encourage reflection on the results (both successes and failures).
From a teacher’s perspective:
To guide students through the selection, adaptation, and design processes, and to check compliance with those processes.
To control the accuracy and completeness of steps taken by students.
To standardize students’ behaviour.
To remind students of the range and order of things to do.
To identify whether key steps have been taken.
To identify the presence or absence of conceptual skills.
To prevent errors.
To evaluate the results.
To know if students need assistance or further instruction.
To simplify complex tasks.
Types
Operational checklists are used to standardize assessment procedures.
Observational checklists are used by students to assess oral and written
performance.
Evaluative checklists are used to evaluate elaborated products for quality assurance.
Application
Checklists are used after the lectures are given, when students have been
familiarized with the terminology and basics of assessment procedures.
They are used throughout the course at the planning, development, and administration stages.
References
AbdelWahab, M. M. (2013). Developing an English language textbook evaluative
checklist. Journal of Research & Method in Education (IOSR-JRME), 1(3), 55–70.
Alberta Education. (2008). Assessment in mathematics: Assessment strategies and tools: Obser-
vation checklist. Alberta Education. http://www.learnalberta.ca/content/mewa/html/a
ssessment/observation.html
Association of Language Testers in Europe (ALTE). (1998). Studies in language testing: Multilingual glossary of language testing terms (Vol. 6). Cambridge University Press.
Association of Language Testers in Europe (ALTE). (2001). Resources – free guides and reference materials. Association of Language Testers in Europe. https://www.alte.org/Materials
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and devel-
oping useful language tests. Oxford University Press.
Cambridge Assessment. (n.d.). Checklist to improve your writing – Level C1. Cambridge
University. https://www.cambridgeenglish.org/Images/286979-improve-your-
english-checklist-c1.pdf
Collins. (2020). Checklist. Collins English dictionary. Collins. https://www.collinsdictiona
ry.com/dictionary/english/checklist
Council of Europe. (2011). Manual for language test development and examining. Council of
Europe. https://rm.coe.int/manual-for-language-test-development-and-examining-
for-use-with-the-ce/1680667a2b
Dudden Rowlands, K. (2007). Check it out! Using checklists to support student learning. The English Journal, 96(6), 61–66.
Educalingo. (2020). Checklist. Educalingo: The dictionary for curious people. Educalingo.
https://educalingo.com/en/dic-en/checklist
Fiderer, A. (1999). 40 rubrics & checklists: To assess reading and writing. Scholastic Inc.
Fuentes, E. M., & Risueno Martinez, J. J. (2018). Design of a checklist for evaluating
language learning websites. Porta Linguarum, 30, 23–41.
Fulcher, G. (2015). Re-examining language testing: A philosophical and social inquiry.
Routledge.
Gawande, A. (2009). The checklist manifesto: How to get things right. Metropolitan Books.
Green, A. (2013). Exploring language assessment and testing: Language in action. Routledge.
Khalifa, H., & Salamoura, A. (2011). Criterion-related validity. In L. Taylor (Ed.), Stu-
dies in language testing: Examining speaking: Research and practice in assessing second lan-
guage speaking. (Vol. 30) (pp. 259–292). Cambridge University Press.
Lexico. (2020). Checklist. Lexico: The English dictionary. Lexico. https://en.oxforddictiona
ries.com/definition/checklist
Mukundan, J., & Nimehchisalem, V. (2012). Evaluative criteria of an English language
textbook evaluation checklist. Journal of Language Teaching and Research, 3(6), 1128–1134.
O’Sullivan, B., Weir, C. J., & Saville, N. (2002). Using observation checklists to validate
speaking-test tasks. Language Testing, 19(1), 33–56.
Osborn, A. (1953). Applied imagination: Principles and procedures of creative problem solving.
Charles Scribner’s Sons.
Scriven, M. (2000). The logic and methodology of checklists. https://web.archive.org/web/20100331200521/http://www.wmich.edu/evalctr/checklists/papers/logic%26methodology_dec07.pdf
Scriven, M. (2007). Key evaluation checklist. https://wmich.edu/sites/default/files/attachments/u350/2014/key%20evaluation%20checklist.pdf
Strickland, K., & Strickland, J. (2000). Making assessment elementary. Heinemann.
Tsagari, D., Vogt, K., Froehlich, V., Csépes, I., Fekete, A., Green, A., Hamp-Lyons, L.,
Sifakis, N., & Kordia, S. (2018). Handbook of assessment for language teachers. The Eur-
opean Commission.
WebFinanceInc. (2020). Checklist. In Business dictionary. WebFinanceInc. http://www.
businessdictionary.com/definition/checklist.html
Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Palgrave
Macmillan.
Wilson, C. (2013). Credible checklists and quality questionnaires: A user-centered design method.
Elsevier.
Wu, R. Y. (2014). Studies in language testing: Validating second language reading
examinations: Establishing the validity of the GEPT through alignment with the Common
European Framework of Reference. (Vol. 41). Cambridge University Press.
Chapter 7
Introduction
The use of the English language in the Arab world and the United Arab Emirates
(UAE) has grown. This growth has had an impact on academic achievements
and studies. Hence, one of the requirements to be enrolled at a university in the
UAE is to have English language skills that meet specific standards. Students at all
levels must develop their English as they progress through their school years until
they reach the university level. Given the rapid changes in education, testing
language skills has become an important means of measuring students' language ability.
Several tests have been used as tools for exiting high school and proceeding to higher
education, or for exiting into a desired major at university. Research has been
conducted on the effectiveness of these tests as a tool of language measurement
not only within the UAE, but also worldwide (Freimuth, 2014; Gitsaki et al.,
2014; Raven, 2011).
The International English Language Testing System (IELTS) test measures
students’ skills in reading, writing, listening, and speaking. This test may impact
students’ language usage and their performance of university or school tasks
(Ata, 2015). This test has been used as an entrance tool for higher education
institutions in the UAE, as well as for exiting foundation studies into specific
majors of interest (Freimuth, 2014). Despite all the data on the effectiveness of
the IELTS test, this effectiveness still differs from one culture to another and from
one student to another. This is because various conditions affect the imple-
mentation of the test, the test takers, and the band classification result. The
amount of test preparation by students is another factor that affects test results
(Gitsaki et al., 2014). Students at a particular level or from similar cultural
backgrounds might have common errors when taking the test, and these can be
revealed either during preparation courses or by the band result (Ata, 2015;
Hughes, 2003). The IELTS test score has influenced the study plans of those
joining colleges and universities within EFL communities. As research has
stated, there is a correlation between test scores, students’ coursework, and how
Figure 7.1 A chart showing the correlation between the theories in the theoretical framework (chart labels: Social-Constructivism; Theory of Language)
Method
Findings
As the students progressed through the levels in their education, it seems they
further understood the narrative history genre, which was represented through
the increased number of topic words to narrate events. The students used several
words to narrate their experiences to the reader and to represent the events’
sequence and time, and to reflect upon their own teaching performance. For
instance, student A, in her level 1 reflection, used few topic words to indicate
the time, such as 'in the third week'. Meanwhile, she increased her use of
sequential narrating words when writing her level 7 reflection, such as ‘then’,
‘firstly’, ‘after twenty minutes’, and ‘in addition’. Students A, B, and C also
demonstrated an understanding of the purpose of writing reflective journals,
which included their opinions on their teaching, and which identified issues
along with providing recommendations (see Table 7.2). I assume that the stu-
dents developed an understanding of adding sequential topic terms to the narra-
tive as they proceeded to learn further aspects that improve the genre context.
In addition, in most of the reflections, students used an informal tone with some
formality on occasion. They employed the topic vocabulary and expressions to affect
the tone of the text. Some of these informal expressions were ‘I was delighted’, ‘I felt
happy’, ‘playing hangman game’, ‘relation with the student’ etc. On the other hand,
the students used words to increase the formality of the context such as: ‘rule viola-
tions’, ‘instructor’, ‘my mentor’, ‘special need student’, ‘school facilities’, ‘meetings’
etc. There was also frequent use of first person pronouns, which raised the level of
informality, as the writers were narrating the history of their practicum experiences.
Grammatical cohesion
With reference to the literature about the importance of the grammatical
cohesion in reflections in particular, and the narrative history genre in general,
followed by ‘suddenly’ to indicate a sudden change in the event; she also used
'after that' to follow the sequence of what she narrated. Similarly, she introduced a
contrast into the event using 'however' to indicate that the event changed into
the opposite. In contrast, student A tended to use coordinating linking words like
‘and’ and ‘but’ more frequently in her reflection, and she used other conjunc-
tions less frequently than other students. In spite of this, her reflection pieces still
had a basic type of cohesion, since the conjunctions used were either sub-
ordinating or coordinating. This development in the use of con-
junctions showed that the students gained more language input, which led to
further understanding of the uses and functions of conjunctions within the genre.
The existence of different types of references in the focus group’s reflections
showed the achievement of further cohesion. First, the analysis of the texts
revealed that the students used personal pronouns to refer to other participants
in reflections and to avoid repetition. This was clear in their use of ana-
phoric referencing with the third person pronouns 'he', 'she', 'they' or 'it' (see
Table 7.3). To elaborate, student A in her level 4 reflection used the third
person pronoun ‘she’ to refer to her MST (Mentor School Teacher) in the
previous sentence. Student C used the pronoun ‘it’ to refer to the activity the
student wanted to choose. Similarly, student B used ‘it’ to refer to the noun
‘language’ in her level 6 reflection. Second, since students A, B, and C were
narrating their own experience, they used the first person pronoun ‘I’ to refer
to themselves as the main character in their own reflections (see Table 7.3).
Third, the students used possessive pronouns for the purpose of possessive
reference, such as the third person possessive ‘their’. For example, student A
used 'their' in reflection 7 to refer to the class that she would be teaching. Also,
student B used ‘their’ to refer to the students mentioned in reflection 4, which
was also used in student C’s reflection 2 for the same purpose (see Table 7.3).
Furthermore, the focus group used the object pronoun ‘her’ to refer to a sin-
gular character in the story, as in student C’s reflection 3, where it was used as a
reference to a girl in the class. The plural form ‘them’ was used more often to
refer to characters in the narrative history genre, and thus referred to the ‘stu-
dents' in almost all the reflections. Additionally, student A used the reflexive
pronoun 'themselves' to refer to the 'literacy class' in reflection 7 as well as
using 'myself' to refer to herself in the same reflection. While student C used
'themselves' to refer to 'one by one student' and 'their friends', student B didn't
make use of reflexive pronouns in her texts. The same applies to the use of
the possessive ('s, s') to relate things to the nouns in the writing, which was
mostly found in student A's reflections.
In addition, the students mostly used the determiners 'this', 'that', 'those',
and ‘these’ for referencing purposes, which in most occasions preceded the
noun (Figure 7.3). These determiners were mostly used by all students to refer
to nouns such as ‘class’, ‘lesson’, ‘strategy’, ‘student’, and ‘teaching practice’ etc.
Only student C used ‘those’ when she quoted a sentence from a book where it
was used to refer to ‘teachers who did not have high quality relationships’.
‘That’ was also mentioned in student C’s reflection 1 to point to ‘warning for
the second time’, and it was also used by student B in reflection 5 to refer to
the sentence ‘eyes on me’. Student A used ‘that’ to point to ‘a brilliant lesson
plan’ and ‘preparing a great lesson’ in reflection 3. This shows that the students
understood that the reflective genre is related to their personal experience in
practicum and employed the determiners to refer to the participants in the text.
Another feature of grammatical cohesion was the use of ‘article reference’.
Students used the definite article ‘the’ at several places in the text to refer back
to something that was introduced previously. Figure 7.4 shows that the stu-
dents used ‘the’ mostly when talking about ‘class’, ‘time’, ‘students’, ‘classroom’,
'teacher', etc., all of which had been introduced previously in their reflections and
could therefore take the definite article 'the' since the reader was familiar with
the concept. This demonstrates the students' ability to relate ideas and shows
links that developed as they progressed.
Lexical cohesion
Regarding the lexical cohesion aspects found in the reflections, several devices
were noticed. The first key device was 'word repetition', used to hold the
reader's attention throughout the text, which was also mentioned as part
of key topic vocabulary use. Repeating vocabulary as in Table 7.2 helped to tie
together the ideas in the reflections, such as the repetition of the phrases and
vocabulary like ‘students’ learning’, and ‘teacher’ (see Figure 7.5). Student A
repeated ‘students’ learning’ which was the focus of the teacher during that lesson.
Similarly, student B, in her eighth reflection, discussed the importance of the
‘questioning technique’ in the classroom. Thus, she repeated words such as
‘questions’, ‘questioning techniques’, ‘answers’ and ‘guessing’, to keep the
attention on the same topic being discussed.
A similar example can be found in what student C discussed in her first
reflection – that she will observe her Mentor Teacher and consider her tasks
and responsibilities. Therefore, 'role and responsibilities' was repeated twice to
emphasize the concept.

Figure 7.4 Excerpts showing the use of 'the' for referencing purposes

Figure 7.5 Excerpt from student A's reflection 6 showing the repetition of lexical terms

Also, 'teachers' and 'students' were repeated frequently in the
reflection to lead the narrative (see Figure 7.6). Student A repeated words such
as ‘misbehaviour’, ‘violently’, ‘lesson plan’, and ‘attention’ as well, to emphasize
the importance of a lesson plan in controlling behaviour. This shows that stu-
dent A had a previous ability to tie the ideas together and link points through
‘word repetition’, while the other two improved this as they progressed.
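As an illustrative aside only (this sketch is not the chapter's analytic procedure), a repetition tally of the kind described above can be expressed in a few lines of Python; the sample reflection text and stopword list below are invented:

    # Minimal sketch of tallying lexical repetition in a reflection.
    # The sample text and stopword list are invented for illustration.
    from collections import Counter

    reflection = (
        "The teacher focused on students' learning. During the lesson the "
        "teacher checked students' learning again and praised the students."
    )

    # Normalize: lower-case each word and strip surrounding punctuation.
    words = [w.strip(".,'\"").lower() for w in reflection.split()]
    stopwords = {"the", "on", "and", "during", "again"}
    counts = Counter(w for w in words if w and w not in stopwords)

    # Words occurring more than once are the repetition cues discussed above.
    for word, n in counts.most_common():
        if n > 1:
            print(f"{word}: {n}")  # students: 3, teacher: 2, learning: 2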
Linked to this point, the students used another feature of lexical cohesion:
the synonym. Although repetition exceeded the number of synonyms in the
texts, some synonyms still appeared in each student's reflections. Student C
used 'students' rather than saying 'boys
and girls' (see Table 7.6). She also used synonyms of 'teacher' in the same
reflection, like 'tutor' and 'instructor'. Another use of synonyms was in stu-
dent A's reflection 1 when she wrote 'happy – delighted' in the same text, and
she used synonyms in reflection level 3, such as ‘misbehaviour problems –
misbehaviour actions’, ‘confident – self-assured’ and ‘kids – students – children’
which showed a variation in using different terms that meant the same (see
Table 7.4). Student B used synonyms in reflection 5 such as ‘involve – engage’
and 'kids – students' (see Table 7.5). However, she made an error at the end of
reflection 5 when using the synonyms ‘in my opinion’ and ‘I think’ as one
whole phrase at the beginning of the sentence ‘so in my opinion I think my
…’. Considering these data and IELTS preparation course materials, the use of
synonyms shows the students’ attempts to use a range of vocabulary which
could be due to increased language knowledge.
Students appeared to use antonyms to show contrasting ideas under
similar topics (see Table 7.4, Table 7.5, & Table 7.6). For instance, student A,
in reflection 3, mentioned the following antonyms: ‘kids – adults’ and ‘nega-
tively – effectively’ to indicate classroom management strategies and their
effectiveness. As shown in Table 7.4, student A used ‘higher ability students –
lower ability students’ to show the differences between the levels, ‘together –
one student’ to demonstrate the differences in interaction patterns in class,
‘damaged – fixed’ and ‘the circuit was broken – the circuit must be connected’
to show how she explained and helped the students in the exploration to dis-
cover differences. There was more effective use of antonyms in student A’s
reflection 8, which added a different type of link between the ideas discussed
when she taught a science lesson. Conversely, both students B and C used
fewer antonyms than student A. Student B used antonyms in the first and last
reflections such as ‘confident – low self-esteem’, ‘correct – wrong’ and ‘critical
answer – guessing', and student C used three antonyms in reflection 5, the
highest number among her reflections, but in the last two reflections she did not
use any.
This analysis also shows the use of collocations as another device to increase
the text cohesion and make the written piece more predictable. All the students
tended to use similar collocations related to the topic vocabulary. These are
demonstrated above in Table 7.4, Table 7.5, and Table 7.6. They were ‘lan-
guage skills’, ‘teaching strategies’, ‘paying attention’, ‘class time’, ‘checking
understanding', and a 'teacher assistant'. The use of collocations in
the reflections appeared to increase depending on the topic discussed and as the
students progressed. To elaborate, the increase occurred when the students
reflected on their teaching experiences rather than when they were observed
for a task to be completed as part of the practicum requirement. For example,
student A in reflections 4, 1, 7, and 8 used more collocations as she talked
about her own experience in teaching and working in schools, while in
reflections 2 and 3 she narrated observation experiences and was instructed to
write about topics such as ‘observing the school mentor’ and ‘the importance of
Table 7.4 Examples of synonyms, antonyms and collocations in student A's reflections

Student A, Reflection 1
  Synonyms: student – kids; delighted – happy
  Antonyms: –
  Collocations: built a good relationship; played a game; special need students; inappropriate behaviour; every week; teaching practice; in my opinion
Student A, Reflection 2
  Collocations: lesson preparation
Discussion
To answer the stated research questions, the qualitative data will now be dis-
cussed. For this purpose, students' reflections from different education levels,
IELTS reports, preparation course materials, and student interviews are
examined. Two related propositions are considered: that students with prior
preparation for the IELTS test would perform better in writing reflections, and
that differences in their IELTS bands would correlate with their achievement in
writing reflections, as judged when grading the students' work.
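As a further illustrative aside, the band–score relationship named in the second proposition could, with a larger sample, be quantified with a simple correlation coefficient. The figures below are invented placeholders rather than study data, and the sketch assumes Python 3.10+ for statistics.correlation:

    # Hypothetical sketch: do IELTS bands track graded reflection scores?
    # The numbers are invented placeholders, not data from this study.
    from statistics import correlation  # available in Python 3.10+

    ielts_bands = [6.5, 5.5, 5.5, 6.0, 7.0]    # illustrative overall bands
    reflection_scores = [85, 70, 72, 78, 88]   # illustrative reflection marks

    print(f"Pearson r = {correlation(ielts_bands, reflection_scores):.2f}")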
The data revealed different results related to the correlation of IELTS bands
and students' writing of reflections. The first result is that the IELTS
band cannot accurately reflect every student's actual language ability (Ata,
2015; Freimuth, 2014). For example, some students were recognized to be at a
higher level and already performed strongly in writing reflections, as in the case
of student A. The reason is that student A developed an understanding of
writing the genre of reflections categorized as a narrative history genre (Dere-
wianka, 2000; McGuire et al., 2009). In addition, the students developed a
sense of writing the narrative history genre, which may be due to the devel-
opment of writing skills and the increase in language level as they progressed
(Harmer, 2001). One reason for this is the efficient use of topic voca-
bulary and genre register, which added clarity to the content being discussed
(Quirke et al., 2009). This cannot guarantee the students’ successful under-
standing of the genre, but rather it can represent the understanding of writing
reflections due to course requirements, or being trained to write such texts
(Lightbown & Spada, 2013; Wingate & Tribble, 2012). Therefore, the stu-
dents with high language ability can be expected to have a wide vocabulary,
which can be employed to support the text register and enhance the clarity of
the genre (Bawarshi & Reiff, 2010). This also leads to representing the social
context and the purpose of writing reflective journals through the presenta-
tion of terms that identify the narration, as well as providing critical opinions
to mirror the teaching experiences and recommend further improvements in
learning and teaching (Bawarshi & Reiff, 2010; Lukin et al., 2011). More-
over, based on the interview answers and the previous analysis, students A and
B already had very good language skills, which didn’t relate to the IELTS test
band. This was related to to having strong foundational knowledge of English
from instruction in schools, or when entering university. Some students with
good language skills considered the requirement to take the IELTS test to be
part of completing their studies (El Massah & Fadly, 2017; Freimuth, 2014)
rather than as a way to improve their English, which shows a weakness in the
IELTS test as a testing tool.
In contrast, student B’s language ability was lower than the received IELTS
band when she joined the education program. Hence, that low language ability
was shown in her writing of reflective journals through less coherence, and less
grammatical and lexical cohesion. Perhaps the reason for having an IELTS band
that is higher than the student’s actual language skills could be due to the stu-
dent being trained to answer the questions and practiced in how to deal with
IELTS test types (Hughes, 2003; Mahlberg, 2006; Panahi & Mohammaditabar,
2015). However, writing reflective journals proved that free writing strategies
can show students’ actual abilities in understanding and presenting writing
aspects. Writing reflective journals can also affect written genre comprehension
and the ability to present the purpose of the text; these abilities can also be the
result of having a wide range of grammatical resources and vocabulary
(Cameron, 2001; Leech et al., 2001).
The second result identified is that IELTS bands can still indicate a change in
some students’ language performance, even if that change is considered to be a
slight difference from the actual English language ability (Gitsaki et al., 2014;
Raven, 2011). The representation of the IELTS band can be shown either as
the student’s recent performance or a development in English language skills.
The findings showed that student C’s writing performance matched the IELTS
band 5.5 and student B's language development; however, it did not match
student A’s performance (Brown, 2007; Spencer, 2009). Perhaps this is because
when student C first joined the university’s education programme, her lan-
guage level fell within the range of an IELTS band, which was indicated by her
understanding not only of the IELTS test’s writing genres, but also of reflec-
tions (Asador et al., 2016; Wilson, 2016). However, with student B’s case, and
due to the development of language that occurred at the end of her education
years, the IELTS band described the language learning level, which matches the
analysis of the reflections done in the previous section.
Therefore, a further result can be arrived at: The change in student B’s level
was due to taking preparation courses to improve her language, as she
stated in the interviews (Hughes, 2003; Panahi & Mohammaditabar,
2015). This led to the student raising her understanding of writing reflections,
learning how to produce a cohesive, coherent text in the genre through using a text register,
developing her use of punctuation, and providing a line of connected thoughts
using the grammatical cohesion (Bawarshi & Reiff, 2010; Harmer, 2004; Lukin
et al., 2011). In reflective writing, the student was able to present opinions on
her own teaching, think critically about issues and identify solutions, as well as
judge the classroom experience (McGuire et al., 2009). Student C shared the
same opinion about taking the IELTS test and preparation courses, and the
evaluation of her reflections showed a similar match with
her second score.
Limitations
The study has limitations due to the size of the focus group, as only three stu-
dents were selected for the research. Therefore, the results cannot represent all
English language learners taking the IELTS test, and cannot represent all cases
of students writing reflections in higher education. Also, the research only
involved female students, and therefore cannot be generalized to male learners.
Another limitation is that, because of the time at which the research was
conducted, it was difficult to collect the research instruments from the students, as some
had already graduated and others had lost some documents. For example, stu-
dent A did not have the first IELTS band, and student C had not yet reached
level eight and therefore had not completed the eighth reflection. Finally, the number
of students taking the preparation courses was low, which is not enough to measure
the courses' effectiveness or to gauge other students' attitudes toward them.
Recommendations
Two main types of recommendations are offered to build on this study: one is
related to the improvement of writing reflective journals, and the other is
related to further research arising from this study. Based on the
findings, it is recommended that language learning should not be separated from
college assignments, especially when an assignment requires the use of language to write a
report, reflection, or essay. Another aspect is that EFL learners require further
consolidation of language even if they are at a college level, and more emphasis
Conclusion
This study was implemented to explore the correlation of the IELTS band and
preparation courses with students' writing of reflective journals in one of the UAE's
higher education institutions. The research found that the IELTS band cannot
be understood as a final judgment, as there are different factors affecting it.
Students’ level in English, previous language knowledge, study input, and
preparation courses can all affect the IELTS test as a tool for language testing
(Ata, 2015; Quirke & Zagallo, 2009). Yet, the study revealed that preparation
courses had an impact on reflective writing as they added further input and
consolidation of the language for two participants.
The students’ language and writing showed improvement as the students
progressed through the levels. They built further confidence towards writing
reflective journals, which was observed through their reflection samples (Bam-
berg, 2011; Hughes, 2003). Their ability to critically identify issues, anticipate
solutions, critically judge the experience, and provide opinions and recom-
mendations for further enhancement developed effectively (Hyland,
2007). That change was notable at the end of the study levels and after the
second IELTS band test, which could be due to their education studies or
preparation courses.
The students’ development in English writing and other skills was identified
through the students’ responses to questions in the interviews. The IELTS
reports and materials from the preparation courses provided a clearer idea of
the amount of change in the language. The students who took the preparation
courses were provided with opportunities to practice and improve their writing
and language skills while preparing for the IELTS test. Lessons in the prepara-
tion courses were designed to cover broader skills of writing and text organi-
zation than just those required for IELTS writing tests (Panahi &
Mohammaditabar, 2015; Slavin, 2014; Wilson, 2016). However, the student
who didn’t take the course also developed her language abilities, which can be
related to the input given through her college study. In spite of this, the IELTS
score was considered highly important simply because it is a requirement; as the
students stated in the interviews, there were other factors leading to the change
in their language and writing of reflections.
The amount of knowledge gained through education courses and the
assessment of reflections also helped to shape the students’ understanding of
reflective journals. The students learned to narrow their focus to narrating the
practicum experience and they identified strengths and weaknesses (Spencer,
2009). The findings also show that the course criteria for language assessment
may not match the ones for IELTS, which presents another factor to support
that IELTS requirements may differ from academic writing at university. To
conclude, students’ second language learning performance and ability cannot be
judged by only a language test such as the IELTS. The students face challenges
in learning language and understanding different genres, which they can over-
come through additional practice and input given by instructors.
References
Asador, M., Marandi, S., Vaezi, S., & Desmet, P. (2016). Podcasting in a virtual English for
academic purposes course: learner motivation. Interactive Learning Environments, 24(4),
875–896.
Ata, A. W. (2015). Knowledge, education, and attitudes of international students to
IELTS: a case of Australia. Journal of International Students, 5(4), 488–500.
Bamberg, M. (2011). Narrative discourse. In J. C. Meister, T. Kindt, & W. Schernus (Eds.), Narratology beyond literary criticism: Mediality, disciplinarity (pp. 213–237). Walter de Gruyter.
Bawarshi, A., & Reiff, M. (2010). Genre: An introduction to history, theory, research and
pedagogy. Parlor Press.
Berk, L. (2009). Child development. Pearson.
Brewster, J., Ellis, G., & Girard, D. (2002). The primary English teacher’s guide. Penguin
English.
Brown, P. (2007). Reflective teaching, reflective learning. MA thesis, University of Birmingham.
Buri, C. (2012). Determinants in the choice of comprehensible input in science classes.
Journal of International Education Research, 8(1), 1–18.
Burton, J. (2009). Reflective writing-getting to the heart of teaching and learning. In J.
Burton, P. Quirke, C. Reichmann, & J. Peyton (Eds.), Reflective writing: a way to life-
long teacher learning (pp. 1–11). TESL-EJ Publications.
Cameron, L. (2001). Teaching languages to young learners. Cambridge University Press.
Creswell, J. (2012). Research design: qualitative, quantitative, and mixed methods approaches.
Sage Publications.
Derewianka, B. (2000). Exploring how texts work. Primary English Teaching Association.
El Massah, S., & Fadly, D. (2017). Predictors of academic performance for finance stu-
dents: Women at higher education in the UAE. The International Journal of Educational
Management, 31(7), 854–864.
Emmitt, M., Pollock, J., & Komesaroff, R. (2003). Language and learning: An introduction
for teaching. Oxford University Press.
Freeman, D., & Freeman, Y. (2001). Between worlds: Access to second language acquisition.
Heinemann.
Freimuth, H. (2014). Cultural bias in university entrance examinations in the UAE. The
Emirates Occasional Papers, 85, 1–81.
Gitsaki, C., Robby, M. A., & Bourini, A. (2014). Preparing Emirati students to meet
the English language requirements for higher education: a pilot study. Education,
Business and Society: Contemporary Middle Eastern Issues, 7(3), 167–184.
Harmer, J. (2004). How to teach writing. Pearson.
Harmer, J. (2001). The practice of English language teaching. Longman.
Hyland, K. (2007). Genre pedagogy: Language, literacy and L2 writing instruction.
Journal of Second Language Writing, 16, 147–164.
Hughes, A. (2003). Testing for language teachers. Cambridge University Press.
Johnson, K. (1999). Understanding language teaching: Reasoning in action. Heinle ELT.
Johnston, N., Partridge, H., & Hughes, H. (2014). Understanding the information lit-
eracy experiences of EFL (English as a foreign language) students. Reference Services
Review, 43(4), 552–568.
Krekeler, C. (2013). Languages for specific academic purposes or languages for general
academic purposes? A critical reappraisal of a key issue for language provision in
higher education. Language Learning in Higher Education, 3(1), 43–60.
Kyriacou, C. (2007). Essential teaching skills. Nelson Thornes.
Leech, G., Cruickshank, B., & Ivanic, R. (2001). An A-Z of English grammar & usage.
Pearson.
Lightbown, P., & Spada, N. (2013). How languages are learned. Oxford University Press.
Lukin, A., Moore, A., Herke, M., Wegener, R., & Wu, C. (2011). Halliday’s model of
register revisited and explored. Linguistics and the Human Sciences, 4(2), 187–213.
Mahlberg, M. (2006). Lexical cohesion: corpus linguistic theory and its application in
English language teaching. International Journal of Corpus Linguistics, 11(3), 363–383.
McCarthy, M., & Carter, R. (1994). Language as discourse perspectives for language teaching.
Longman Group.
McGuire, L., Lay, K., & Peters, J. (2009). Pedagogy of reflective writing in professional
education. Journal of the Scholarship of Teaching and Learning, 9(1), 93–107.
Mills, J. (2014). Action research: a guide for the teacher researcher. Pearson.
Moore, T., & Morton, J. (2005). Dimensions of difference: A comparison of university
writing and IELTS writing. Journal of English for Academic Purposes, 4(1), 43–66.
Panahi, R., & Mohammaditabar, M. (2015). The strengths and weaknesses of Iranian
IELTS candidates in academic writing task 2. Theory and Practice in Language Studies, 5(5),
957–967.
Qin, W., & Uccelli, P. (2016). Same language, different functions: a cross-genre analysis
of Chinese EFL learners’ writing performance. Journal of Second Language Writing, 33,
2–17.
Quirke, P., & Zagallo, E. (2009). Moving towards truly reflective writing. In J. Burton,
P. Quirke, C. Reichmann, & J. Peyton (Eds.), Reflective writing: a way to lifelong teacher
learning (pp. 12–30). TESL-EJ Publications.
Raven, J. (2011). Emiratizing the education sector in the UAE: contextualization and
challenges. Education, Business and Society: Contemporary Middle Eastern Issues, 4(2),
134–141.
Chapter 8
Introduction
Language assessment is an important element of language education because it
feeds language teaching and can be used for different purposes. On the one hand,
assessment gives feedback about the effectiveness of instruction, the progress of
language learners in the target language, whether the expected learning outcomes
are achieved, what students learn or do not learn, and whether the syllabus, the
teaching method, and the materials are useful for the ongoing language learning
(Bachmann, 2005; Brennan, 2015; Hidri, 2018). On the other hand, assessment
helps teachers to make results-dependent decisions in order to improve their
instruction, to foster a better language learning context, as well as to evaluate
themselves in respect to their teaching (Rea-Dickins, 2004). Moreover, language
assessment can motivate both teachers and students – for teachers, they can find
out how effective their teaching is, and for students, they can detect their strengths
and weaknesses regarding their language development (Heaton, 2011). As a con-
sequence, they can be motivated to enhance their teaching/learning more.
Language teachers need to be competent in the target language they teach,
to know what and how to teach, and to know how to assess. Among these
three qualifications, language assessment can be considered an essential part of
professional competence because it is a teacher’s guide to planning their
teaching to support learning, and to regulating their pedagogical decisions.
Accordingly, it can be deduced that language teachers have two roles – a tea-
cher and an assessor (Scarino, 2013; Wach, 2012). In order to carry out efficient
and suitable assessment practices, language teachers need to be assessment lit-
erate. Language assessment literacy (LAL) means possessing the required
knowledge of assessment and the ability to perform assessment practices. If
teachers are assessment literate, then they can respond to the needs of their
educational context more effectively. Therefore, LAL is a fundamental aspect of
their teaching competence. However, assessing effectively and taking
advantage of assessment results appropriately is not as easy as it seems. Since
language teachers are not born with the required competence or ability related
to assessment (Jin, 2010), they need to be trained and equipped with the
Review of literature
Recently, there has been much more emphasis on language assessment due to
changing notions and approaches in language education. Earlier, the concepts of
teaching and assessment were considered separate, and assessment was performed
independently of teaching, especially in the form of testing (Viengsang, 2016).
But as a result of sociocultural theories of learning which have also influenced the
domain of language assessment, the testing notion has been transformed into the
concept of assessment (Berger, 2012; Hidri, 2016; Inbar-Lourie, 2017;
O’Loughlin, 2006), which incorporates not only measuring language proficiency
of learners as in testing, but also monitoring and improving their progress in the
target language (Csépes, 2014). Hence, the importance of language assessment to
motivate and reinforce language learning by tracking the development of learners
has been realized. That is, tests or assessment tools can be used to improve
learning apart from just measuring or testing language knowledge (Heaton,
2011). So, assessment for learning has become more important than assessment of
learning. In line with these considerations, researchers have focused more on
how teachers can be effective assessors according to their teaching contexts in
order to support language learning, what kind of characteristics they need to have
for this, and how they can better administer assessment. For all these issues,
researchers have recently explored what it means to be language assessment lit-
erate, and whether teachers have acquired LAL necessities and have been able to
practice them in their classes successfully.
Conceptual framework
The concept of LAL originated from the term assessment literacy (AL) introduced
by Stiggins (1995) in general education. Yet, the concept of AL remained rather
general. Thus, researchers became concerned with the specific competencies of
teaching subject areas that may require different kinds of assessment. For for-
eign language teaching/learning environments, they tried to describe what
LAL refers to and what it constitutes. Also, since teachers are the main stake-
holder in foreign language education (Giraldo, 2018), the definitions of LAL
are mostly based on the assessment literacy of classroom teachers, which is also
called ‘language teacher assessment literacy’. LAL refers to the competence of
knowing what, when, and how to use assessment to gather information about
language learners’ development as well as using the results to enhance the
quality of instruction (Jeong, 2013). Nonetheless, Fulcher (2012) presented a
comprehensive definition of LAL:
Considering this definition, LAL not only means the knowledge and skills
of assessment but also deals with reasoning, impact, frameworks, and con-
textual and ethical issues related to language assessment. Similarly, Giraldo
Aristizabal (2018) indicated language teachers need to be competent both in
classroom assessment and large-scale testing to help their students develop in
the target language. Hence, Fulcher’s definition of LAL can be considered quite
extensive in terms of its meaning. However, it should be noted that the term
foreign language assessment literacy (FLAL) is preferred in the current study in
order to emphasize foreign language learning context; that is, the English as a
foreign language (EFL) context.
As for the constituents of LAL, researchers have reported their own frameworks
corresponding to their own LAL perspectives. For instance, Davies (2008) asserted
LAL is made up of ‘knowledge’ (defining language measurement and framework),
‘skills’ (designing, administering, and analyzing) and ‘principles’ (using properly
and ethically). In the same vein, Inbar-Lourie (2008) demonstrated the compo-
nents of LAL as ‘what’ is to be measured as language construct, ‘how’ assessment is
carried out, and ‘why’ a certain assessment practice is conducted. Fulcher (2012)
also suggested a three-layered model of LAL in which there are ‘practices’ (con-
structing, applying, evaluating), ‘principles’ (fundamentals, ethics, concerns) and
‘contexts’ (origins, frameworks, impact). In addition, Hill (2017) focused on the
elements of classroom teacher language assessment literacy and argued that LAL
consists of ‘practice’ (knowledge and skills), ‘concepts’ (own understanding and
beliefs), and ‘context’ (impact of teaching environment). Likewise, Inbar-Lourie
(2013) and Stabler-Havener (2018) emphasized the importance of teaching con-
texts with respect to LAL components and they pointed out that knowledge and
skills of assessment should be employed suitable to a teacher’s own context. Fur-
thermore, Giraldo Aristizabal (2018) and Hidri (2016) highlighted that beliefs,
conceptions, and previous experiences about language assessment affect the way
LAL is constructed. In conclusion, the proposed constituents of LAL in the
literature have a lot in common such as knowledge, skills, practices, concepts,
contexts, and principles.
When it comes to the characteristics of language assessment literate teachers,
several researchers found similar dispositions (Gotch & French, 2014; Huang &
He, 2016; Khajideh & Amir, 2015; Rogier, 2014; Ukrayinska, 2018). For
to assess (Berry et al., 2017; Vogt & Tsagari, 2014). On the other hand, LAL
training given through teacher education programs was found to be fruitful.
For instance, Hilden and Frojdendahl (2018) demonstrated EFL teachers gained
certain assessment abilities and developed more learner-centred conceptions for
assessment by means of their teacher education training programs.
In spite of the benefits of training, especially with respect to knowledge base,
some other studies focused on the points missed by such training, because EFL
teachers came across difficulties in assessment despite their training, and as a
result, they found teacher education programs could not develop the LAL of
their future teachers as expected (Djoub, 2017; Fulcher, 2012; Gebril, 2017;
Klinger, 2016; Lam, 2014). For example, Turk (2018) indicated pre-service
training related to language assessment knowledge was good, but teachers were
not able to implement effective assessment procedures. Also, Hatipoglu (2015) and
Lam (2014) reported such training was mostly based on testing-oriented issues
while highlighting the importance of testing in language learning/teaching.
Considering the gaps in such training, most studies determined that
LAL training could not provide EFL teachers with practical assessment skills
(Hatipoglu, 2010; Lam, 2014; Sariyildiz, 2018; Sheehan & Munro, 2017; Yan
et al., 2018) and hence, EFL teachers could not perform appropriate assessment
strategies in their classes. In addition, Mede and Atay (2017) noted that EFL
teachers had difficulties in designing assessment and giving proper feedback due
to the lack of training in those points. Semiz and Odabas (2016) also revealed
training should include testing language skills apart from grammar and voca-
bulary. Likewise, Mede and Atay (2017), and Turk (2018) suggested EFL tea-
chers need training, especially in formative assessment, because the training they
had at university did not improve their assessment for learning strategies.
Moreover, Tsagari and Vogt (2017), and Vogt and Tsagari (2014) pointed out
that as EFL teachers criticized, LAL training did not emphasize the local con-
ditions with respect to language assessment, and therefore, they had difficulties
in responding to the assessment needs of their local contexts.
It can be seen that there have been notable attempts to investigate the concept of
LAL in the literature. Yet, few of them have investigated how LAL is viewed
from the eyes of EFL teachers, especially the beginner ones, and what LAL
training contains, and whether it is beneficial to teachers. This issue is also
present in Turkey. Therefore, there is a need to understand how LAL is per-
ceived and experienced by in-service EFL teachers, especially by novice ones,
because they have just started in their teaching profession and their language
assessment knowledge and skills can be regarded as up to date and fresh.
The current study is designed to investigate what Turkish EFL teachers know
about the concepts of ELTE (English Language Testing and Evaluation) and
FLAL (Foreign Language Assessment Literacy), whether their training is suffi-
cient for their recent assessment practices, and how they put what they learn
into practice. In this way, the present study is believed to provide some insight
into the perceptions and experiences of novice EFL teachers with respect to
their LAL, to reflect on the assessment training of novice EFL teachers, and to
contribute to the field of LAL research due to the limited numbers of similar
studies, especially with novice teachers.
For these purposes, the following research questions were addressed:
1 What do novice EFL teachers think about the concepts of ELTE and FLAL?
2 How do novice EFL teachers evaluate their LAL training taken at
university?
3 How does novice EFL teachers’ LAL training affect their practices of language
assessment in their classes?
Methodology
The present study was designed as a phenomenological study, which is one of
the qualitative research designs. The main goal of phenomenology is to
obtain a deeper understanding of the meanings of lived experiences by
describing people's multiple perceptions of them and finding the
common shared experiences of a concept or a phenomenon (Creswell, 2007;
Fraenkel et al., 2011; Patton, 2002). Since the purpose of this study is to
reveal the perceptions and experiences of novice EFL teachers about the LAL
concept, phenomenology was the preferred approach to investigate, describe,
and interpret the LAL commonalities among novice EFL teachers to get a
better understanding of LAL.
As for the participants, 22 novice EFL teachers working at state schools teaching
different language levels across Turkey took part in the study. They graduated
from the same university and the same program (ELT) within five years. Since this
study was conducted in the Spring Semester 2018, the graduates of 2013, 2014,
2015, 2016, and 2017 participated in the research. All of them had taken the
ELTE course, taught by the same course lecturers with the same syllabus and the course-
book mentioned above. They were novice in-service EFL teachers because their
teaching experience ranged from six months to a maximum of five years. There-
fore, the sampling of the participants was based on purposeful sampling, and all the
participants took part in the study voluntarily (see Table 8.1).
Nodes (name: files, references)
  FLAL-Eng (Alumni): 3, 367
    Shortcomings: 2, 12
    Suggestions: 2, 88
    Difficulties in using: 2, 27
    Definition of ELTE: 1, 44
    Importance of ELTE: 1, 27
    Perceptions of own FLAL: 1, 44

Figure 8.1 The view of the themes and subthemes in NVivo 11 Pro Program
was also calculated through Miles and Huberman's (1994) suggested procedure, and
the agreement between the raters was found to be high (97%).
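Miles and Huberman's (1994) procedure is, at its core, a simple proportion: the number of coding decisions the two raters agree on, divided by the total of agreements plus disagreements. A minimal sketch of that calculation, using invented theme labels rather than the study's actual codings, might look like this:

    # Miles and Huberman's (1994) inter-coder agreement:
    # reliability = agreements / (agreements + disagreements).
    # The coded segments below are invented for illustration.

    def miles_huberman_agreement(rater_a, rater_b):
        """Proportion of segments that both raters coded identically."""
        if len(rater_a) != len(rater_b):
            raise ValueError("Both raters must code the same segments.")
        agreements = sum(a == b for a, b in zip(rater_a, rater_b))
        return agreements / len(rater_a)

    rater_a = ["Shortcomings", "Suggestions", "Definition of ELTE", "Suggestions"]
    rater_b = ["Shortcomings", "Suggestions", "Definition of ELTE", "Difficulties in using"]
    print(f"Agreement: {miles_huberman_agreement(rater_a, rater_b):.0%}")  # 75%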
Finally, in accordance with the determined themes, the findings were inter-
preted and discussed narratively by giving some examples from participants' quotes.
Findings
The novice EFL teachers conveyed their thoughts and experiences about English
language assessment in terms of both their ELTE training at university and their
ELTE applications in their current classrooms. The analysis of their responses
yielded two themes: ‘FLAL-ELTE Concept’ and ‘Experiences about ELTE (past
& present)’ (see Figure 8.2). The sample commented more on experiences about
ELTE (past & present) (f=252) than the FLAL-ELTE concept (f=115).
The first theme is about how the FLAL-ELTE concept is perceived by
novice EFL teachers. For ‘FLAL-ELTE Concept’, novice EFL teachers first
defined what ELTE means. They mostly concentrated on the issue of language
proficiency (language skills and areas) and the aims of language learning. Most
of them indicated language assessment means measuring language proficiency
and determining the deficiencies in the target language. Still, only a few
reported that ELTE refers to the understanding of whether course aims are
achieved, the planning and evaluation of language learning progress, and
reporting the findings. For example, one noted that ‘assessment is to measure
the proficiency level of language learners as well as evaluating their successes
and progress in that foreign language’ (Low.T.10).
Apart from this, novice EFL teachers focused on the importance of testing
and evaluation in English language classes. Most of them underscored the fact
that ELTE is important because of its significant role in testing language profi-
ciency, determining and evaluating language progress, showing whether the
aims are achieved, and getting feedback about deficient points in that language.
For instance, one of the novice EFL teachers pointed out that:
As for their perceptions of their own levels in FLAL – that is, how
qualified they felt in English language assessment – half of the participants
perceived themselves to be highly competent (n=11) whereas only two tea-
chers felt themselves inadequate; others felt they could deal with assessment at
the moderate level (n=7). Nonetheless, only one novice EFL teacher, who felt
highly competent in terms of FLAL, mentioned that teachers should have the
knowledge and skills of ELTE itself (i.e. knowing how to construct exams,
how to interpret results, and how to report them). In terms of FLAL compo-
nents, one underlined that:
The second theme revealed by the findings of novice EFL teachers’ statements
is ‘Experiences about ELTE (past & present)’. For ‘Experiences about ELTE
(past & present)’, they highlighted their ‘previous experiences about ELTE
course’ (f=198), and their ‘current experiences about using’ their knowledge
and skills in their professional lives (f=54). The first subcategory is about their
‘previous experiences about ELTE course’ they took in their ELT training
program. First, they mentioned what they acquired after they took the ELTE
course. For example, they stated they learned a lot about language assessment,
the criteria for effective testing (i.e. reliability, validity), how to design several
language testing tools to test students’ language skills, and the kinds of tests or
tasks that can be used to measure language proficiency. They also underscored
that the course was useful to prepare teacher candidates for their future teach-
ing career regarding language assessment, and their expectations and needs were
more or less met thanks to this course. Therefore, most of them found the
content of the course was sufficient and beneficial to their teaching lives. For
instance, some of them explained that:
Surely, this course [ELTE in teacher training] has helped me a lot. Simply
put, it has been helpful with respect to such topics as ‘how a language test
should be designed, what it should be covered, how the questions/tasks
should be organized, and how the evaluation should be made.
(Moderate.T.7)
On the other hand, despite being helpful, there were some aspects of the
ELTE course that were criticized by all the novice EFL teachers irrespective of
their perceived FLAL levels, and they made some suggestions for further
improvement. For example, the course was found to be very theoretical and
lacking in the practical dimension because the respondents said they could not
practice what they learned during the course until they were appointed. The
novice EFL teachers believed training about language assessment should have
included more practical exercises and implementations such as simulations to
perform tests. Besides, a few of them underlined that the class hours were
inadequate, and the last semester was too late to take the course. For instance,
some commented that:
The content of the course [ELTE course] was satisfying in terms of theory
but the practical side was not enough.
(Moderate.T.20)
The [ELTE] course was very theoretical. More practice would have been
better. For example, for each language skill, students should prepare an
assessment tool every week, and the criteria for the evaluation should be
interpreted.
(High.T.1)
Besides, some novice EFL teachers argued that they did not have any
opportunity to test language learners in the practicum; they were only eval-
uated in terms of their teaching skill and not testing skills. Thus, according to
them, teacher candidates should be made to prepare and administer some tests
in their teaching practice for the sake of gaining experience in assessment in
addition to teaching. Therefore, one of them suggested that: ‘Before teacher
trainees start [in the] teaching profession, they should be provided with
experience [practice] in terms of assessment process by making them to prepare
at least a part of a language exam during micro-macro training [practicum
courses]’ (Moderate.T.5).
Moreover, some novice EFL teachers complained they did not learn any-
thing about the evaluation aspects of testing such as scoring and feedback. For
this reason, new topics such as analyzing test results should be added. One
participant teacher recommended that: ‘[To the ELTE course content], exer-
cises of preparing various test items, techniques of evaluation and analysis, and
statistical calculations [can be added as new topics]’ (Moderate.T.2).
Furthermore, some participant teachers discussed the great degree to which
the ELTE course content concentrated on testing grammar and vocabulary and
how some attention was given to receptive skills testing. They also emphasized
how they learned to design multiple choice tests, cloze tests, and the like.
Nevertheless, the testing of productive skills (speaking and writing) was not
much emphasized in the course, nor were other types of assessment, such as
alternative assessment methods. For instance, one of them indicated that:
In order to make the ELTE course better, some novice EFL teachers suggested
in-service training as a way to cover some of the missing parts of the
course, which would help to address their weaknesses in language
assessment. For instance, they might learn from other teachers’ experiences
by sharing their testing techniques and discussing their assessment ideas.
Also, they assumed in-service training might be helpful to remind them of
previous ELTE knowledge and practices they learned as they complained
they forgot certain things about assessment throughout years. For that, one
of them stated that:
With respect to in-service training, some also reported that not all teachers had
language assessment training at their universities, and they criticized that even
those who took ELTE training might still be incompetent in language assess-
ment because not all universities treated the course with the expected care
and to the expected extent. For instance, one of them argued that:
It did work [ELTE course was helpful at work]. For example, I pay attention
to the difficulty level of test items while preparing exams. I avoid giving clues
about correct answer in true-false test items. I try to be careful about the
clarity of questions. I prepared each test item which measures only one skill. I
pay attention to the length of the gaps in the fill-in-the-blanks test items.
(High.T.8)
Likewise, one of the novice EFL teachers, who felt highly competent, focused
on their ability to prepare language tests according to students’ levels and
course aims in contrast to their colleagues who used available exams regardless
of such points. That participant reported that:
On the other hand, although they stated they could apply their testing
knowledge and did well in designing their own language tests, they mentioned
they still had some challenges. For example, they stated their students had
lower levels of English language proficiency even though their age and edu-
cation levels were high; and thus, they could not apply high-level tests to
their students. In other words, they could not respond to the needs of their
local contexts (the external factor). This is because there was nothing about
Turkish contextual issues related to language assessment in their LAL training;
they just studied imaginary situations and hence, the course was considered
somewhat idealized by the participants. One of the novice EFL teachers drew
attention to the Turkish examination system, which is based on high-stakes
testing in the form of multiple-choice questions, and how such exams have
priority in Turkey. Therefore, they had to conduct written exams like that in
their classrooms.
In addition, most of the novice EFL teachers focused on the lack of prac-
tice as the internal factor, and complained about their inability to perform
effective testing practices in their current classes because they were unable to
adapt their theoretical knowledge to their teaching contexts. This is because, as
they mentioned before, they did not have any chance to experience language
testing during their LAL training at university. For that, one of them argued:
‘I have difficulties in practice and implementation. I have theoretical knowl-
edge [about language assessment], but I have difficulty in combining such
knowledge with the teaching style and testing system at state schools [in
Turkey]’ (Moderate.T.20).
To sum up, it can be concluded that most of the novice EFL teachers had
considerable knowledge of ELTE thanks to the course and learned how to design
language tests, but they had some difficulties, especially in practice, due to lack
of experience during the course. While they defined the concept of
ELTE from a testing perspective and believed in the importance of assessment, they
mostly emphasized the notion of summative assessment both in their
definitions and in their current uses of assessment in teaching contexts.
Discussion
The present study has focused on the perceptions and experiences of novice
EFL teachers as well as their LAL training. The findings yielded two themes:
‘FLAL-ELTE Concept’ and ‘Experiences about ELTE (past & present)’ that
illustrate the responses to the research questions thematically.
Regarding the first research question (what the sample thought about ELTE
and FLAL), the responses gathered under the theme ‘FLAL-ELTE Concept’
indicated their perceptions related to these concepts. Novice EFL teachers
perceived ELTE as measuring language proficiency and determining the success
or deficiencies of students in the target language. Therefore, they mostly con-
centrated on the testing purposes of achievement and diagnosis. However, even
though they appreciated the importance of ELTE in teaching for several of its
benefits, only a few of them were aware of other assessment purposes, such as to
improve learning/teaching. So, it can be inferred that their perceptions were
based on testing rather than assessment. This result was also similar to the stu-
dies which showed EFL teachers thought of assessment as only testing (Berry et
al., 2017; Duboc, 2009; Klinger, 2016; Tsagari, 2013). As Giraldo Aristizabal
(2018) has maintained, beliefs and previous experiences affect the perceptions
of teachers, and testing perception in this study can be attributed to the content
of their ELTE course, which focused on only testing issues; therefore, their
perceptions were shaped according to the training they underwent. This was also
reflected in their practice in such a way that they used summative assessment to
determine the product of learning. As Herrera and Macias (2015) have put
forward, testing means summative assessment while ignoring other purposes, as
in this study.
As for the LAL concept itself, few of the novice EFL teachers’ responses
highlighted the dimensions such as ‘knowledge’ and ‘skills’ (Davies, 2008; Hill,
2017), ‘what’ and ‘how’ (Inbar-Lourie, 2008) and ‘practices’ (Fulcher, 2012)
along with some ‘contextual issues’ with respect to ‘local conditions’ (Hill,
2017; Stabler-Havener, 2018). However, when compared to the proposed
components of LAL in the literature, the dimension of ‘principles’ (use of tests
ethically and appropriately) (Davies, 2008; Fulcher, 2012), the element of ‘why’
(reasoning behind assessment) (Inbar-Lourie, 2008), and certain contextual
issues, such as historical and philosophical frameworks that form the origins of
assessment (Fulcher, 2012), were not much discussed. Therefore, it can be
concluded that the novice EFL teachers were not very familiar with what being assessment literate means, which is similar to the conclusions of Berry et al. (2017) and Semiz and Odabas (2016). This finding can be related to their training content, which focused only on knowledge and skills; thus, they were not introduced to the LAL concept before graduation.
When it comes to perceived levels of LAL, most of the participant novice
EFL teachers felt they were good at assessment, which indicates they had
higher perceived levels of LAL. This finding is different from some studies
which found lower levels of LAL combined with teachers not feeling prepared
to test (Buyukkarci, 2016; Fard & Tabatabei, 2018; Tsagari & Vogt, 2017; Vogt
& Tsagari, 2014). This difference may stem from their belief that their knowledge of language assessment was good; hence, they felt confident in that knowledge even though they encountered difficulties in practicing their testing skills.
Considering the second research question (how the sample evaluated their course-based LAL training, i.e. the ELTE course, at university), the
responses for the subtheme ‘Previous Experiences about ELTE’ demonstrated
their opinions and experiences of, and their suggestions for, the course. Most of them stated that they had learned a lot about language testing by the end of the course and that the knowledge they gained was useful for testing in their own teaching. Therefore, they found the course sufficient. It seems
that at the end of such LAL training, novice EFL teachers became aware of
language testing issues, and familiar with certain knowledge about and abil-
ities in language testing. This finding is similar to some studies in the literature
that showed an ELTE course provides teachers with basic training that familiarizes them with language testing issues and procedures (Hilden & Frojdendahl, 2018; Semiz & Odabas, 2016; Turk, 2018). In contrast to Gebril (2017), whose study found such training ineffective in developing attendees' LAL, this study found positive impacts. Nevertheless,
when the novice EFL teachers evaluated the ELTE course, they stressed
testing issues rather than assessment, such as learning how to design multiple-choice tests to measure grammar and vocabulary knowledge. They did not report anything about formative assessment, assessment for learning, or alternative assessment techniques; they reported only the ways summative assessment tests what has been learned at the end. They also noted that little emphasis was given to testing productive language skills during the course. Thus,
these findings can be associated with the content of the course itself and the
book used as the main resource; both were based on language testing topics, and no recent topics appeared in the syllabus. This was also illustrated
by some research studies in the literature that noted that the content of
training was exam-oriented, as in the current study (Hatipoglu, 2015; Lam,
2014). Therefore, as Mede and Atay (2017), and Turk (2018) have suggested,
training should include the topic of formative assessment in order to equip
teachers with using assessment for learning.
Apart from these opinions, novice EFL teachers criticized the course for
being very theoretical, noting that it did not improve their language testing skills; they merely became familiar with how to design tests and what kinds of test items can be used to test each language skill. Further, they could not gain experience in their teaching practice because only their teaching skills were taken into account in the practicum. Therefore, despite being strong in theory, the course lacked practice. This finding is well attested in the literature: LAL training has often neglected the development of practical language assessment skills (Hatipoglu, 2015; Lam, 2014; Sariyildiz, 2018; Sheehan & Munro, 2017; Yan et al., 2018).
Most of the participant novice EFL teachers argued there was nothing about
analyzing, interpreting, and evaluating test scores in the course; this topic was
ignored in the training phase. As Hudaya (2017) and Mede and Atay (2017) have demonstrated, training should educate teachers in how to interpret scores and, accordingly, how to give feedback to improve language learning. The teachers also stressed that class hours were inadequate, since there was no time for practice, and that taking the course in the last semester was too late. All in all, it seems the course-
based LAL training was beneficial, and the content was satisfying in terms of
theory. However, though novice EFL teachers received training, they felt they
still needed more training to be good assessors of their teaching contexts and to
improve their practical skills, as underlined by some studies in the literature
(Djoub, 2017; Fulcher, 2012; Klinger, 2016). Thus, as some novice EFL teachers
highlighted, in-service training could provide the missing course content.
As for the third research question (how the sample’s training affected their
assessment practices in teaching), the responses for the subtheme ‘Current
Experiences about ELTE’ illustrated whether the novice EFL teachers were
able to use their acquired knowledge and skills in their professional lives. Most of the novice EFL teachers, in line with the course content, focused on testing
issues: They exemplified how they paid attention to testing criteria, how they
designed language tests, and how they measured what was learned by using
testing techniques such as multiple choice, true/false statements, and fill-in-the-
blanks. Because the ELTE course content was testing-oriented, it can be
inferred that the novice EFL teachers used mostly language tests and traditional
testing techniques to measure the sum of learning, and thus, they were able to
apply what they learned in the course to their teaching lives. In contrast to
Ukrayinska’s (2018) study, which showed that teachers still had some challenges
in designing language tests even after training, the participants of the present
study were able to properly design their own language tests.
However, similar to studies in the literature showing that teachers employed mostly traditional testing tools, rather than portfolios or other types of assessment, to measure knowledge of language rather than language skills (Buyukkarci, 2014; Duboc, 2009; Mede & Atay, 2017; Oz, 2014; Semiz & Odabas, 2016; Tsagari, 2013; Wach, 2012), the novice EFL teachers in this study also used such testing procedures. This finding can be attributed to the fact that Turkey is an
exam-oriented country and the sample was most familiar with multiple-choice
exams and, as in other studies in the literature (Saka, 2016; Yan et al., 2018), they
were expected to use them. Thus, they preferred such testing tools in their classes
due to the testing policy of Turkey, as well as what they learned in the training.
In addition, though novice EFL teachers reported they had good knowledge
of designing language tests, they also stated they had some difficulty in applying
their tests, due to the local needs of their teaching context. Thus, as some stu-
dies indicated, LAL training should include more discussion of the local con-
ditions in relation to language assessment to assist future teachers (Tsagari &
Vogt, 2017; Vogt & Tsagari, 2014). Moreover, most of them highlighted that they lacked practice, owing to the fact that they could not gain experience during the course and had only just begun to practice their language testing skills. They thus faced challenges in putting their knowledge into practice as they had expected. Several studies in the literature have concluded that most in-service teachers have knowledge of assessment procedures but cannot practice them effectively, as did the present study (Giraldo Aristizabal, 2018; Hakim, 2015; Jannati, 2015; Mede & Atay, 2017; Oz & Atay, 2017; Yan et al., 2018). The need for further training has been stressed in much of the literature (Hatipoglu, 2010; Sariyildiz, 2018; Sheehan & Munro, 2017; Tsagari & Vogt, 2017; Wach, 2012); novice EFL teachers therefore need considerably more LAL training on certain topics to become assessment literate.
In sum, although the present study revealed that the ELTE course provided basic LAL training and thus equipped teachers with language testing knowledge and techniques, novice EFL teachers still need considerably more training to be good assessors in their teaching contexts and to acquire the characteristics of assessment-literate teachers who can respond to the needs of their students.
Conclusion
The current study aimed to investigate the perceptions and experiences of
novice EFL teachers about language assessment, as well as the effect of their teacher education training on their development of LAL. Overall, as LAL training, the ELTE course was found to be beneficial and helpful because knowledge of language testing and skills in designing language tests were acquired in the course and subsequently applied in professional life. Yet, the
perceptions, experiences, and practices of novice EFL teachers regarding lan-
guage assessment were limited to testing, rather than assessment itself. Further,
the course contributed to theoretical knowledge of language testing, rather
than practical knowledge. Therefore, there are still some gaps to be filled in
order to make novice EFL teachers better in terms of LAL, as demonstrated by
some studies in the literature (Hatipoglu, 2010; Sariyildiz, 2018; Sheehan &
Munro, 2017; Tsagari & Vogt, 2017). For this reason, some pedagogical
implications can be shared.
Notes
1 This paper is based on the doctoral dissertation titled ‘Exploring foreign language
assessment literacy of pre-service English language teachers’.
2 Corresponding Author: Dr. Aylin Sevimel-Sahin, ELT Department, Anadolu University,
Eskisehir, Turkey, aylinsevimel@anadolu.edu.tr
References
Bachman, L. F. (2005). Statistical analyses for language assessment. Cambridge University
Press.
Berger, A. (2012). Creating language-assessment literacy: A model for teacher education.
In J. Hüttner, B. Mehlmauer-Larcher, S. Reichl, & B. Schiftner (Eds.), Theory and
practice in EFL teacher education: Bridging the gap (pp. 57–82). Short Run Press Ltd.
Berry, V., Sheehan, S., & Munro, S. (2017, May 3–5). Exploring teachers’ language
assessment literacy: A social constructivist approach to understanding effective practices [Paper
presentation]. ALTE 6th International Conference Learning and Assessment: Making
the Connections, Bologna, Italy. http://eprints.hud.ac.uk/id/eprint/33342/
Brennan, M. (2015, May 21–23). Building assessment literacy with teachers and students: New
challenges? [Paper presentation]. ACER EPCC Conference, Sydney, Australia. https://www.acer.org/files/eppc15-Brennan-Building-Assessment-Literacy-with-teachers-and-students2.pptx
Buyukkarci, K. (2014). Assessment beliefs and practices of language teachers in primary
education. International Journal of Instruction, 7(1), 107–120.
Buyukkarci, K. (2016). Identifying the areas for English language teacher development: A
study of assessment literacy. Pegem Egitim ve Ogretim Dergisi [Pegem Journal of Education
and Instruction], 6(3), 333–346.
Creswell, J. W. (2007). Qualitative inquiry and research design: Choosing among five approa-
ches (2nd edition). SAGE Publications.
Csépes, I. (2014). Language assessment literacy in English teacher training programmes
in Hungary. In J. Hovarth, & P. Medgyes (Eds.), Studies in honour of Marianne Nikolov
(pp. 399–411). Lingua Franca Csoport.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
DeLuca, C., & Klinger, D. A. (2010). Assessment literacy development: Identifying gaps
in teacher candidates’ learning. Assessment in Education: Principles, Policy & Practice, 17
(4), 419–438.
Djoub, Z. (2017). Assessment literacy: Beyond teacher practice. In R. Al-Mahrooqi, C.
Coombe, F. Al-Maamari, & V. Thakur (Eds.), Revisiting EFL assessment (pp. 9–27).
Springer International Publishing.
Duboc, A. P. M. (2009). Language assessment and the new literacy studies. Lenguaje, 37
(1), 159–178.
Fard, Z. R., & Tabatabei, O. (2018). Investigating assessment literacy of EFL teachers in
Iran. Journal of Applied Linguistics and Language Research, 5(3), 91–100.
Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2011). How to design and evaluate research
in education (8th edition). McGraw-Hill.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Jeong, H. (2013). Defining assessment literacy: Is it different for language testers and
non-language testers? Language Testing, 30(3), 345–362.
Jin, Y. (2010). The place of language testing and assessment in the professional pre-
paration of foreign language teachers in China. Language Testing, 27(4), 555–584.
Karagul, B. I., Yuksel, D., & Altay, M. (2017). Assessment and grading practices of EFL
teachers in Turkey. International Journal of Language Academy, 5(5), 168–174.
Khadijeh, B., & Amir, R. (2015). Importance of teachers’ assessment literacy. Interna-
tional Journal of English Language Education, 3(1), 139–146.
Klinger, C. J. T. (2016). EFL professors’ beliefs of assessment practices in an EFL pre-service
teacher training undergraduate program in Colombia (Publication No. 10239483) [Doctoral
dissertation, Southern Illinois University Carbondale]. ProQuest Dissertations and
Theses Global.
Lam, R. (2014). Language assessment training in Hong Kong: Implications for language
assessment literacy. Language Testing, 32(2), 169–197.
Mede, E., & Atay, D. (2017). English language teachers’ assessment literacy: The
Turkish context. Dil Dergisi-Ankara Universitesi TOMER [Language Journal-Ankara
University TOMER], 168(1), 43–60.
Miles, M. B., & Huberman, A. M. (1994). An expanded sourcebook: Qualitative data ana-
lysis (2nd edition). SAGE Publications.
Munoz, A. P., Palacio, M., & Escobar, L. (2012). Teachers’ beliefs about assessment in
an EFL context in Colombia. Profile, 14(1), 143–158.
Newfields, T. (2006). Teacher development and assessment literacy. Authentic Commu-
nication: Proceedings of the 5th Annual JALT Pan-SIG Conference, 48–73. http://hosted.
jalt.org/pansig/2006/PDF/Newfields.pdf
O’Loughlin, K. (2006). Learning about second language assessment: Insights from a
postgraduate student online subject forum. University of Sydney Papers in TESOL, 1,
71–85.
Onalan, O., & Karagul, A. E. (2018). A study on Turkish EFL teachers’ beliefs about
assessment and its different uses in teaching English. Journal of Language and Linguistic
Studies, 14(3), 190–201.
Oz, H. (2014). Turkish teachers’ practices of assessment for learning in the English as a
foreign language classroom. Journal of Language Teaching and Research, 5(4), 775–785.
Oz, S., & Atay, D. (2017). Turkish EFL instructors’ in-class language assessment literacy:
Perceptions and practices. ELT Research Journal, 6(1), 25–44.
Patton, M. Q. (2002). Qualitative research and evaluation methods (3rd edition). SAGE
Publications.
Rea-Dickins, P. (2004). Understanding teachers as agents of assessment. Language Test-
ing, 21(3), 249–258.
Rogier, D. (2014). Assessment literacy: Building a base for better teaching and learning.
English Language Teaching Forum, 3, 2–13.
Sahinkarakas, S. (2012). The role of teaching experience on teachers’ perceptions of
language assessment. Procedia – Social and Behavioral Sciences, 47, 1787–1792.
Saka, F. O. (2016). What do teachers think about testing procedures at schools? Proce-
dia – Social and Behavioral Sciences, 232, 575–582.
Sariyildiz, G. (2018). A study into language assessment literacy of pre-service English as a
foreign language teachers in Turkish context [Unpublished master’s thesis]. Hacettepe
University.
Teachers’ assessment of
academic writing
Implications for language assessment
literacy
Zulfiqar Ahmad
Introduction
In most academic settings, the course teachers are responsible for creating,
administering, and grading all the course assessment interventions, which include
but are not limited to: quizzes, in-class assignments, portfolio management, and
mid and final term examinations. The course teachers are expected to produce
and report academically reliable accounts of students’ performance as charted out
in curricular, institutional, and national policies. This multifaceted role, coupled
with pedagogic assignments, anticipates a high level of assessment literacy (AL),
which Stiggins (1995, p.240) understands as ‘knowing the difference between
sound and unsound assessment’. Gapsin understanding and executing the princi-
ples of sound assessment are liable to produce inaccurate test results that may be
vulnerable to faulty interpretations and decisions, and may adversely affect the
stakeholders’ perceptions of assessment, more specifically test takers’ perceptions
(Rahimi, Esfandiari & Amini, 2016).
In the field of language teaching, the term language assessment literacy (LAL)
has been introduced to differentiate this specialized form from its more global
variant of AL (Giraldo, 2018). LAL is based on the premise that the raters are
knowledgeable about the language they teach and test, as well as adequately
trained and skilled in the theoretical and practical underpinnings of language
testing (Davies, 2008; Fulcher, 2012; Inbar-Lourie, 2013). Following these
assumptions and Malone (2013), the operational construct for this study has been
situated in teachers’ ability to create and follow appropriate assessment rubrics as
well as grade academic writing, paraphrasing in this case, as closely to the con-
struct of the writing task as is possible.
Several studies report teachers’ lack of suitable training and skills in LAL (Lin,
2014; Popham, 2006), but most of these studies are based only on survey reports
involving different stakeholders related to LAL. One serious limitation of these
type of studies is that they base their findings and conclusions on the perceptual
understanding of the participants without actually analyzing the teachers’ real-life
assessments of any specific language skills. This research gap in LAL prompted the
researcher to use already graded examination scripts as the unit of analysis in order
to find out the appropriateness of exam rubrics, the measurement scale, and tea-
chers’ use of these rubrics and measurement scale. The researcher anticipated that
the relationship of these variables with the test scores would help not only to
identify gaps in assessment practices, but also to foreground implications for the
LAL training of teachers of academic writing in particular and English as a Foreign
Language in general.
LAL thus refers to the execution of skills and knowledge in a way that is grounded in theory and that aims to empower the teacher to have a clear cognizance of his or her role as an assessor of academic writing. This role, which may appear supra-academic in its orientation, involves an understanding of the nature, application, and implications of the what, why, when, and how of assessment. What refers to the language trait being assessed, why means the purpose of assessment, when means the learning or course stage at which a particular trait should be tested, and how includes the assessment processes of test design,
administration, and grading. Designing an academic writing test is primarily the
job of a language assessment specialist or trained writing examiners, as is done in
large scale standardized tests such as the International English Language Testing
System (IELTS) and Test of English as a Foreign Language (TOEFL). But in
most academic contexts it is the teachers of academic writing who have the
responsibility of managing all the essentials of assessment. This indicates the need
for training in test construction, assessment rubrics, and consistent measurement
of the writing tasks. Owing to contextual variations and curricular preferences, it
is hard to establish a workable construct for writing (Weigle cited in Ahmad,
2019), and even a small deviation from contextual parameters, which are situated
in institutional policies and course objectives, can adversely affect the purpose of
assessment as well as performance of the teachers as raters. It is equally important
to ascertain the timeframe for assessment or the learning stage when a particular
assessment intervention is to be used.
LAL also prioritizes the rationale for assessment, so that teachers know what they are assessing and why they are assessing it in a specific way. However, the most significant dimension of LAL seems to be its focus on the 'how' of assessment, which entails holistic yet rationalistic implementation of the skills and knowledge received through training and experience. LAL expects raters to be able to produce a reliable and valid assessment of the writing sample despite individual differences. Stiggins (cited in Herrera & Macias, 2015, p.307) bases his notion of
LAL competence on the following benchmarks: (a) identifying clear rationale for
assessment, (b) explicitly stating anticipated outcomes, (c) using appropriate assess-
ment strategies and methods, (d) designing reliable assessment items, rubrics, and
sampling, (e) eliminating rater bias, (f) reporting the results honestly, and (g)
employing assessment as a pedagogic tool.
Researchers (e.g. Lin & Su, 2015; Sultana, 2019) have identified a lack of appropriate training in LAL among English as a Foreign Language (EFL) teachers. Mai (2019, p.104), for instance, argues that 'most teachers of
English at all levels of language education still face the challenge of identi-
fying “criteria” for writing assessment scales’. Issues like this can have serious
implications for the performance of teachers of academic writing who are
responsible for designing and grading writing exams. Most teacher training
programmes do include a module on language testing and assessment, but these are not comprehensive enough to equip teachers, especially novices, to confidently undertake language assessment. There is a dearth of both
i the extent to which the assessment criteria and rubrics provide for appro-
priate analysis of students’ paraphrasing skills
ii the extent to which the test scores correlate with the assessment criteria and
rubrics
Method
This section of the chapter details the participants and research context of the
study, the characteristics of the writing samples collected for paraphrase analysis,
and the analytical procedures adopted for analysis of the data.
Analytical procedures
The first step after the paraphrase samples had been collected was to type the handwritten student writing into a Word document with all the errors intact to maintain originality and transparency. Each text was allotted a code, and word
length and exam score were recorded for later analysis. The assessment rubrics
had four performance descriptors which were graded on four-point criteria –
order, paraphrasing, ideas, and language use (grammar and mechanics). The
next step was to devise a measurement scale because the exam scripts had been
marked holistically with a rounded score for the overall performance instead of
the four-point criteria stated in the rubrics. Because the focus of the study was
to investigate how the teachers had assessed the sample paraphrases and not the
grammatical issues, the researcher decided not to analyze language problems in
view of the absence of specific marks for the language use, and to consider the
teacher-awarded scores to account for the three measurement criteria, namely:
order, paraphrasing, and ideas. However, a few interventions had to be intro-
duced to facilitate the analytical process. The researcher developed a template
which was used to segregate and analyze the sample paraphrases by the criteria
of: order, paraphrasing, and ideas. Since all paraphrases followed the order of the source text (ST), no further analysis of this criterion was done.
For paraphrasing, the rubrics were found to be vague and ambiguous as they
did not provide for a systematic scale or criteria which could be used to analyze
teachers’ assessment of paraphrasing. Therefore, the researcher had to first
establish a text length which could be considered a paraphrase. Two groups of
paraphrases were identified – paraphrases with 150 or more words were
assumed to be a reliable text-length equivalent of the ST, and paraphrases with
149 or fewer words were assumed to be either summaries of the ST or an
unreliable version of the ST. To find out if the sample texts were substantial,
superficial, patchwriting, or inaccurate, the written samples were analyzed for
these paraphrasing standards based on the difference between the original and
the plagiarized parts of the text. For analysis, plagiarism was operationalized to
be the incidence and frequency of five or more consecutive words (Shi, 2012;
Sun & Yang, 2015) from the ST or repetition of the ST words with minor
changes in word order. The last measurement criterion – ideas – was analyzed based on the count of missed ideas per text. The samples were also analyzed for correlation of the exam scores with the performance descriptors, and for whether the source was cited in compliance with academic conventions.
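The five-consecutive-word criterion lends itself to a simple n-gram check. The sketch below (Python) is one plausible operationalization, not the author's actual analysis template; the tokenization, function names, and the 150-word grouping helper are illustrative assumptions.

```python
# Minimal sketch of the plagiarism criterion described above: any run of five
# or more consecutive words shared with the source text (ST) counts as
# borrowed. This is an illustrative reconstruction, not the study's template.

def tokenize(text: str) -> list:
    """Naive whitespace tokenization; the study's actual procedure is unstated."""
    return text.lower().split()

def plagiarized_share(source_text: str, paraphrase: str, n: int = 5) -> float:
    """Percentage of paraphrase words covered by n-grams also present in the ST."""
    src = tokenize(source_text)
    src_ngrams = {tuple(src[i:i + n]) for i in range(len(src) - n + 1)}
    words = tokenize(paraphrase)
    flagged = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in src_ngrams:
            flagged[i:i + n] = [True] * n  # mark every word in the matching run
    return 100 * sum(flagged) / len(words) if words else 0.0

def length_group(paraphrase: str) -> str:
    """Apply the study's 150-word cut-off for a reliable text-length equivalent."""
    return ">=150 words" if len(tokenize(paraphrase)) >= 150 else "50-149 words"
```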
Statistical Package for the Social Sciences (SPSS) was used to obtain descriptive
statistics for the word length, exam scores, paraphrasing, ideas, and missed ideas.
Percentage scores were obtained for these variables as well as for citations given
or not, and for the four performance descriptors. Non-parametric correlation
analysis was also done to ascertain the presence of any statistically significant correlations between the different variables of the study.
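The chapter reports its statistics from SPSS; as an equivalent sketch only, the same descriptives and Spearman correlations could be produced in Python with pandas and SciPy. The column names follow the abbreviations used in the Results section, while the values are hypothetical stand-ins, not the study's data.

```python
# Equivalent, hypothetical sketch of the SPSS workflow using pandas/SciPy.
# TL = text length, TS = test score, PP = % plagiarized, MI = missed ideas;
# the values are placeholders, not the 55 examination scripts of the study.
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({
    "TL": [162, 148, 201, 95, 180, 130],
    "TS": [6.0, 5.5, 7.0, 6.0, 6.5, 5.0],
    "PP": [40.2, 55.0, 12.3, 80.1, 33.0, 60.4],
    "MI": [2, 4, 0, 6, 3, 5],
})

print(df.describe().round(3))  # M, SD, min, max per variable

# Pairwise non-parametric (Spearman) correlations
for a, b in [("TL", "PP"), ("TL", "MI"), ("PP", "MI"), ("TS", "PP")]:
    rho, p = spearmanr(df[a], df[b])
    print(f"{a} vs {b}: rs = {rho:.3f}, p = {p:.3f}")
```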
Results
The paraphrases had been examined on the four-point assessment criteria
(order, paraphrasing, ideas, and language use) which had been set for the ori-
ginal assessment at the research site. Since all the paraphrases (n=55) were
found to adhere to the order of the STs, no further analysis was conducted. As
for the paraphrasing, all the paraphrased texts were both patchwriting and
inaccurate. However, the third descriptor (i.e. ‘ideas’) was segregated between
complete and incomplete paraphrases to allow for further analysis. A little more
than half of the paraphrases failed to achieve the operationalized word length
for this study. Percentage scores reveal that 47.27% of the paraphrases were 150
or more words, whereas 52.72% of the paraphrases were found to be in the
range of 50 to 149 words. The major reason for this seems to be the number of
ideas that had been dropped by the students in their attempt to paraphrase the
ST. Only 20% of the paraphrases restated all of the ideas from the ST; 21.81% had missed three ideas, while 16.36% each had missed two and four ideas. 33.89% of the paraphrased texts were plagiarized, with the minimum being 0.75% and the maximum 87.37%.
The students’ test scores ranged from 4 to 8 out of 10. 36.36% of the para-
phrases were awarded 6 followed by 16.36% of the texts awarded 6.5 and 7
points, and 12.72% by paraphrases awarded either 5 or 5.5 points. 60.09% of
paraphrases were found to be ‘Proficient’ with the score range from 6 to 7.
Following the exam rubrics, 20.09% of the paraphrases with the score of 5.5,
6.5, 7.5, and 8, could not be identified with any of the performance descrip-
tors. 87.27% of the paraphrases did not cite the source of the ST.
SPSS was used to obtain descriptive statistics and correlation analysis for the text length, test scores, paraphrasing, and missed ideas. The results for Text Length (TL), Test Score (TS), Paraphrasing (PP), and Missed Ideas (MI) were M = 153.91 (SD = 34.576), M = 6.08 (SD = 0.744), M = 52.16 (SD = 36.156), and M = 3.15 (SD = 2.360) respectively. These figures indicate that the paraphrased texts varied considerably in their length in comparison with the ST; most of the TS range was, however, close to the mean. On the other hand, PP and MI were unevenly dispersed across the corpus of paraphrased texts, which illustrates why most of the paraphrases were not close to the word length of the ST (207 words). Spearman's rho (rs) failed to identify any statistically significant association between the variables except between TL and PP (rs = .700, p = .01), a statistically significant negative one between TL and MI (rs = -.712, p = .01), and between PP and MI (rs = -.435, p = .01).
The results for Text Length Range 1 (TLR1; 50 to 149 words per paraphrase), Test Score Range 1 (TSR1), Paraphrasing Range 1 (PPR1), and Missed Ideas Range 1 (MIR1) were M = 124.73 (SD = 22.326), M = 6.15 (SD = 0.822), M = 27.73 (SD = 15.517), and M = 4.58 (SD = 2.230) respectively. Spearman's rho failed to find any statistically significant relationship among these variables. On the other hand, the descriptive statistics for the same variables
Teachers’ assessment 169
in Text Length Range 2 (TLR2; 150 or more words) were M = 180.07 (SD = 19.007), M = 6.02 (SD = 0.675), M = 74.72 (SD = 35.425), and M = 1.86 (SD = 1.642) respectively. Spearman's rho was significantly negative between TLR2 and Missed Ideas Range 2 (MIR2), rs = -.602, p = .01. The results indicated that word length and the amount of plagiarized text did not affect the test scores in the two groups; however, texts in the shorter word range had more missing ideas than texts with 150 or more words. Similarly, the missing ideas did not seem to determine the test scores, as there was only a fraction of a difference between the mean scores of the two groups. The results for the second group also revealed that the greater the number of words, the fewer the missing ideas.
Discussion
This section of the study focuses on the discussion about the relevance of the
assessment criteria, rubrics, test scores, and teachers’ performance as assessors to
figure out the implications for LAL.
The results of the paraphrase analysis reveal serious shortcomings both in the
assessment criteria and the assessment process. These findings support Mellati and Khademi's (2018) finding that teachers' assessment literacy affects students' writing performance. The standards set as performance descriptors are both vague and
ambiguous to the extent that they do not permit uniform and reliable assessment
of the students’ paraphrases. There is no descriptor category for the score range of
5.5 and it is not clear if this score should be considered ‘Proficient’ or ‘Exemp-
lary’ in terms of performance description. The same is true of the score range for
7.5 and above. In addition, the labelling of the descriptors into ‘Standard not
met’, ‘Progressing’, ‘Proficient’, and ‘Exemplary’ may well describe linguistic
competence but not paraphrasing as an academic literacy skill. There is no such
explanation for paraphrasing in the research studies done on the subject. Fol-
lowing Shi (2012) and Sun and Yang (2015), paraphrasing is either substantial, or
patchwriting, or superficial, or inappropriate. The excerpts from students' paraphrasing in Table 9.2 illustrate the point.
The samples in the present study were examination scripts, and the paraphrases were treated as an exam activity. A study that collects samples of paraphrases from, for instance, research articles or term papers may reflect a different response, both in terms of student performance and rater assessment. The
study also did not include teachers’ and students’ perceptions. The relation-
ship between teachers’ and students’ beliefs can provide further insights into
the matters encompassing LAL.
Conclusion
Paraphrasing is an important academic literacy skill, and following Ahmad
(2019, p.279), students’ exposure to ‘the contemporary practices in the domain
of academic literacy’ helps them to ‘gain membership of their specific discourse
community’– one of the very basic aims of academic writing programmes.
Such aims cannot be realized if the language assessment system is not properly supported by background training of the assessors in LAL. Lack of competence in LAL can affect teachers' judgments and decisions, which in turn can challenge the academic veracity of students' results, course objectives, course assessment and evaluation, and broader institutional, social, and national policies. A great deal is expected of teachers in terms of course delivery and assessment. They must be supported through awareness-raising and practical training programmes in LAL, at both pre-service and in-service levels, for the benefit of learners and academia.
References
Ahmad, Z. (2017a). Academic text formation: Perceptual dichotomy between pedago-
gic and learning experiences. Journal of American Academic Research, 5(4), 39–52.
Ahmad, Z. (2017b). Empowering EFL learners through a needs-based academic writing course design. International Journal of English Language Teaching, 5(9), 59–82.
Ahmad, Z. (2019). Analyzing argumentative essay as an academic genre on assessment frameworks of IELTS and TOEFL. In S. Hidri (Ed.), English language teaching research in the Middle East and North Africa: Multiple perspectives (pp. 279–299). Palgrave Macmillan.
Bailey, K. M., & Brown, J. D. (1996). Language testing courses: What are they? In A.
Cumming, & R. Berwick (Eds.), Validation in language testing (pp. 236–256). Multi-
lingual Matters.
Campbell, C., Murphy, J. A., & Holt, J. K. (2002). Psychometric analysis of an assessment
literacy instrument: Applicability to preservice teachers. Paper presented at the Annual
Meeting of the Mid-Western Educational Research Association, Columbus, OH.
Coombe, C., Davidson, P., O’Sullivan, B., & Stoynoff, S. (Eds.), (2012). The Cambridge
guide to second language assessment. Cambridge University Press.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers. Profile: Issues in Teachers' Professional Development, 20(1), 179–195.
Herrera, L., & Macías, D. (2015). A call for language assessment literacy in the education
and development of teachers of English as a foreign language. Colombian Applied
Linguistics Journal, 17(2), 302–312.
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to
secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Howard, R. M. (1995). Plagiarism, authorships, and the academic penalty. College Eng-
lish, 57, 788–806.
Inbar-Lourie, O. (2013). Language assessment literacy. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 2923–2931). Blackwell.
Janatifar, M., & Marandi, S. S. (2018). Iranian EFL teachers’ language assessment literacy
(LAL) under an assessing lens. Applied Research on English Language, 7(3), 307–328.
Kalajahi, S. A. R., & Abdullah, A. N. (2016). Assessing assessment literacy and practices among lecturers. Pedagogika/Pedagogy, 124(4), 232–248.
Keck, C. (2006). The use of paraphrase in summary writing: A comparison of L1 and L2
writers. Journal of Second Language Writing, 15, 261–278.
Kennedy, C., & Thorp, D. (2007). A corpus-based investigation of linguistic responses
to an IELTS academic writing task. In L. Taylor, & P. Falvey (Eds.), Studies in lan-
guage testing: IELTS collected papers – Research into speaking and writing assessment (Vol.
19, pp. 316–379). Cambridge University Press.
Lin, D. (2014). A study on Chinese middle school English teachers’ assessment literacy
[Unpublished Doctoral Dissertation]. Beijing Normal University.
Lin, D., & Su, Y. (2015). An investigation of Chinese middle school in-service English
teachers’ assessment literacy. Indonesian EFL Journal, 1(1), 1–10.
López, A., & Bernal, R. (2009). Language testing in Colombia: A call for more teacher
education and teacher training in language assessment. Profile: Issues in Teachers’ Pro-
fessional Development, 11(2), 55–70.
Mai, D. T. (2019). A review of theories and research into second language writing and
assessment criteria. VNU Journal of Foreign Studies, 35(3), 104–126.
Malone, M. E. (2013). The essentials of assessment literacy: Contrasts between testers
and users. Language Testing, 30(3), 329–344.
Mayor, B., Hewings, A., North, S., Swann, J., & Coffin, C. (2007). A linguistic analysis
of Chinese and Greek L1 scripts for IELTS academic writing task 2. In L. Taylor, &
P. Falvey (Eds.), Studies in Language Testing: IELTS collected papers – Research in speak-
ing and writing assessment (Vol. 19, pp. 250–314). Cambridge University Press.
Mellati, M., & Khademi, M. (2018). Exploring teachers’ assessment literacy: Impact on
learners’ writing achievements and implications for teacher development. Australian
Journal of Teacher Education, 43(6), 1–18.
Mertler, C. A. (2004). Secondary teachers’ assessment literacy: Does classroom experi-
ence make a difference? American Secondary Education, 33(1), 49–64.
Nunan, D. (1988). The learner centred curriculum. A study in second language teaching.
Cambridge University Press.
Ölmezer-Öztürk, E., & Aydın, B. (2018). Investigating language assessment knowledge
of EFL teachers. Hacettepe University Journal of Education, 34(3), 602–620.
Ölmezer-Öztürk, E., & Aydın, B. (2019). Voices of EFL teachers as assessors: Their
opinions and needs regarding language assessment. Eğitimde Nitel Araştırmalar Dergisi–
Journal of Qualitative Research in Education, 7(1), 373–390.
Oshima, A., & Hogue, A. (1999). Writing academic English (3rd edition). Addison-Wesley
Publishing Company.
Oshima, A., & Hogue, A. (2006). Writing academic English (4th edition). Longman.
Öz, S., & Atay, D. (2017). Turkish EFL instructors' in-class language assessment literacy: Perceptions and practices. ELT Research Journal, 6(1), 25–44.
Plake, B. S., & Impara, J. C. (1997). Teacher assessment literacy: What do teachers
know about assessment? In G. D. Phye (Ed.), Handbook of classroom assessment: Learn-
ing, achievement, and adjustment (pp. 53–68). Academic Press.
Popham, W. J. (2006). All about accountability: A dose of assessment literacy. Improving
Professional Practice, 63(6), 84–85.
Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory
Into Practice, 48, 4–11.
Rahimi, F., Esfandiari, M. R., & Amini, M. (2016). An overview of studies conducted
on washback, impact and validity. Studies in Literature and Language, 13(4), 6–14.
Roig, M. (1999). When college students’ attempts at paraphrasing become instances of
potential plagiarism. Psychological Reports, 84, 973–982.
Shi, L. (2004). Textual borrowing in second language writing. Written Communication, 21, 171–200.
Shi, L. (2012). Rewriting and paraphrasing source texts in second language writing. Journal of Second Language Writing, 21, 134–148.
Shi, L., Fazel, I., & Kowkabi, N. (2018). Paraphrasing to transform knowledge in
advanced graduate student writing. English for Specific Purposes, 51, 33–44.
Stiggins, R. J. (1995). Assessment literacy for the 21st century. The Phi Delta Kappan, 77
(3), 238–245.
Teachers’ assessment 175
Sultana, N. (2019). Language assessment literacy: An uncharted area for the English
language teachers in Bangladesh. Language Testing in Asia, 9(1), 1–14.
Sun, Y. C. (2012). Does text readability matter? A study of paraphrasing and plagiarism
in English as a foreign language writing context. The Asia-Pacific Education Researcher,
21, 296–306.
Sun, Y. C., & Yang, F. Y. (2015). Uncovering published authors' text-borrowing practices: Paraphrasing strategies, sources, and self-plagiarism. Journal of English for Academic Purposes, 20, 224–236.
Thomas, J., Allman, C., & Beech, M. (2004). Assessment for the diverse classroom: A
handbook for teachers. Florida Department of Education, Bureau of Exceptional Edu-
cation and Student Services. http://www.fldoe.org/ese/pdf/assess_diverse.pdf
Verma, S. (2015). Technical communication for engineers. Vikas Publishing.
Yamada, K. (2003). What prevents ESL/EFL writers from avoiding plagiarism? Analyses
of 10 North-American college websites. System, 31, 247–258.
Chapter 10
Reliability of classroom-based assessment as perceived by university managers, teachers, and students
Olga Kvasova and Vyacheslav Shovkovy
Introduction
After Ukrainian higher education joined the Bologna process in 2005, decision-makers report, all university curricula were redesigned in compliance with modules and credits. Such redesign is generally accompanied
by the development of a national quality assurance system to ascertain that the
level of education is of the required standard. Therefore, the introduction of
modules and credits has critically increased the role of the summative assess-
ment of levels attained by students at the end of each academic course and at
graduation.
The recent British Council (BC) report on the state of teaching English for
specific purposes in Ukraine reveals that the Bologna requirements to define
English language curriculum modules and credits have been implemented par-
tially, whereas the development of a meaningful quality assurance system is still
pending (Bolitho & West, 2017). One aspect of the issue is that the evidence of
students’ achievements is not based on the external (standardized) tests, which
impedes comparability of results across various institutions in the country.
Another such aspect refers to the quality of internal, institutional assess-
ment, which in actual fact substitutes for external quality assurance and is therefore setting-specific. Focusing on institutional quality assurance, the experts concluded that standards of tests and examinations were generally poor, resulting from a lack of testing and assessment expertise among those who prepare assessment materials.
In Ukraine, test preparation is solely the responsibility of instructors since no
test development units, with specially trained staff, have been included in uni-
versities as yet. Nationwide, summative tests are constructed by teachers whose
major function is to teach and implement assessment for learning; this allows us
to refer to teacher-constructed summative tests as ‘classroom-based summative
tests’. The authorship of summative tests raises concerns about the reliability of
information regarding attained language levels, which is primarily required of
assessment. Decision-making based on inaccurate information may have far-reaching consequences for education policies at the national level.
Review of literature
Summative assessment has the purpose of reporting on learning achieved by a certain time; its accuracy and objectivity are therefore paramount. In
Western tertiary education systems, summative assessments have long been
used to meet the increased demands for accountability. These assessments are
regular, systematic, rational, and formalized, and have provided plentiful rea-
sons for appraisal and critique. The mandatory character of summative assess-
ment is opposed to the continuous, informal character of formative assessment
and its emphasis on promoting better learning. As Houston & Thompson
(2017) point out ‘[f]ormative (feedback) assessment is intended to help stu-
dents with future learning, whereas summative (feedout) assessment warrants
or certifies student achievements to others, including potential employers’
(p.2). Lau (2017) challenges the artificially created dichotomy of ‘summative’
and ‘formative’ assessment wherein summative is bad and formative is good,
and advocates for the idea that formative and summative assessment need to
work in harmony without being opposed to each other. Brown (2019) goes
so far as to assert that formative assessment – assessment for learning – is a meaningful teaching framework rather than assessment whose major function is 'verifiability for its legitimacy as a tool for decision-making'. How fair are these claims? It is worthwhile considering the
purposes of assessment specified in official documents on assessment practices
in the contexts where quality assurance is well-established.
Among the four purposes of assessment classified in the code of practice
for assessment offered by the UK Quality Assurance Agency (QAA), the
pedagogy-related purpose of 'providing students with feedback to promote their learning' is given priority; this largely pertains to formative assessment. The next two purposes – measurement ('evaluation of student
skills’) and standardization (‘providing a mark or grade to establish the level
of a student’s performance’) – seem to serve both formative and summative
assessments. The certification purpose (‘communicating to the public the
level of individual achievement as reflecting the academic standard’) is
overtly pertinent to summative assessment (QAA, 2006 as cited in Norton,
2009 p.134). Despite the seeming balance amongst assessment purposes, in
Western higher education, assessment of learning predominates over assess-
ment for learning. However, more attention has been recently placed on the
complementary characteristics of formative and summative assessments
(Houston & Thompson, 2017), on the synergy of these two types of
assessment differing in form and function (Carless, 2006), as well as on the transition to alternative forms of summative assessment that are more compliant with the requirements of 21st century education (HEA, 2012).
In Ukraine, on the contrary, formative assessment has always been deep-
rooted in classroom practices and is currently being implemented through a
variety of traditional and innovative methods (Dovgopolova, 2011; Shadrina,
2014; Olendr, 2015). In their dedication to assessment for learning (Kvasova &
Kavytska, 2014), Ukrainian teaches share beliefs revealed in Muñoz et al.’s
study (2012): According to this research, teachers view assessment as a means of
improving students’ performance and teaching methods rather than a route
towards accountability and certification. Nevertheless, a teacher’s professional
duties of evaluating learners’ achievements at the end of learning a subject or
course have always been implemented in Ukrainian education. The shift of
focus to assessment for reporting has considerably increased teachers' workload and responsibilities, while the lack of hands-on recommendations on the development of these high-stakes summative tests has raised concerns among teachers, who are the immediate actors of assessment.
The difference in function and use of the two types of assessment is
explained by Harlen (2007). She argues that in the course of assessment for
learning the major goal pursued by educators is promoting students’ learning.
In this case, evidence of progress is frequently expressed in grades. Reliability of
practices and raise teacher assessment literacy. These principles are ‘Planning
and Reflection [that] lead to Improvement, when supported by Cooperation
and informed by Evidence’ (Green 2014, p. 21). In other words, in classroom
conditions where the assessment literacy of teachers is not generally very high,
it is mandatory to collaborate on all stages of test development, administration,
and analysis.
Following this line of thought, and stimulated by Coombe et al.’s (2007)
suggestion that ‘it is easier to assess reliability than validity’ (p. xxiv), we con-
ceived of a research project examining issues of the reliability–trustworthiness
of summative assessments in university conditions. Examining reliability
through sophisticated statistical analysis is hardly possible in Ukraine, where
LTA is still in its infancy; despite this, an empirical, qualitative study of relia-
bility of classroom summative assessments is quite feasible. In the following
section of this chapter, we will provide the research rationale and methods of
investigation employed in our study.
Current study
Research rationale
The ultimate goal of this research was to explore the practices of summative assessment adopted in Ukrainian universities with a view to establishing the degree of reliability–trustworthiness of assessment results. To this end, the study intended
to survey the experiences of three central groups of stakeholders – university
managers, teachers, and students – involved in organization, implementation, and
decision making based on summative assessment. We adopted a working
hypothesis that reliability of summative tests in universities may be directly
dependent on the actual level of TAL. It was also expected that the survey
would offer insights into the ways to maximize teacher training in LTA. Our
initial task, in this respect, was to determine all aspects of classroom summative
assessment that are empirically observable and assessable.
We proceeded from the assumption that reliable university summative assessment should, first, be uniform for specific groups of learners (year of study, specialty). This means it should aim to measure the level of the same skills determined in the curricula, use the same testing techniques, be collected within the same procedure (written test paper, timing), and be graded on the same criteria. Second, the process should adhere to the test development cycle, the major stages of which are planning (defining the test construct) and the collegial choice of testing techniques, followed by item/task writing, pre-testing, and test modification. The quality of the developed test should be assured by those managers or teachers whose level of assessment literacy is higher than average. Third, the test administration procedure should be strictly followed as far as the real-life educational context allows. We
refer to maintaining test transparency (informing students about what is going
to be measured on the test) and test security (preparing more than one variant
of test papers, conducting assessments within relatively close dates, ensuring
academic honesty). No less important are accuracy and timeliness of grading
test papers, documenting and reporting test results, and analysis of the evi-
dence collected. The post-administration assurance of the test quality is also
viewed as advisable, if not mandatory, in terms of promoting the develop-
ment of better, valid tests in the future. Fourth, since the summative assessment
that we explore is classroom-based or teacher-constructed, it should be com-
plemented by feedback provided to learners particularly in terms of mid-term
assessment. Feedback should be timely and effective, otherwise it is useless
(Coombe et al., 2007). Feedback may and should impact students’ determination
to learn better. Fifth, washback, or feedback from assessees on their satisfaction
with the fairness of test results, should be monitored and regulated. So far, the
studies show that the lowest level of student satisfaction refers to grades and feed-
back (Norton, 2009; HEA, 2012). Finally, the evidence collected through sum-
mative assessment should be supported by multiple measures, such as alternative
forms of assessment. This possibility puts classroom-based, context-relevant assess-
ment at an advantage over large-scale testing which is absolutely context-free.
Given internationally determined perspectives to involve alternative types of
assessment in the function of summative assessment (HEA, 2012), these forms of
assessment should also become integrated in the summative classroom-based
assessment in Ukraine.
The above reflections resulted in distinguishing the following prerequisites to reliable summative assessment:

- uniformity of test content, administration procedure, and grading criteria for specific groups of learners;
- adherence to the test development cycle, with quality assurance by more assessment-literate colleagues;
- strict administration procedures, including test transparency and test security;
- accurate and timely grading, documenting, reporting, and analysis of results;
- timely and effective feedback to learners;
- monitored washback, i.e. assessees' satisfaction with the fairness of test results; and
- support of the evidence by multiple measures, including alternative forms of assessment.
Implementation
Methods of research
The questionnaires were prepared in consultation with specialists from the
Academy of Higher Education of Ukraine; before administering the survey, we
pre-tested the questionnaires with the help of university managers (3), fellow
teachers (6), and students (15).
Questionnaire 1 was intended for university managers. It purported to
elicit information and personal perceptions of the aspects that immediately
reflected responsibilities of the organizers and supervisors of summative
assessment within their departments, as well as the person accountable for
assessment results. This questionnaire consisted of 22 questions including 20
with the option ‘own answer’, 1 open-ended, and 1 requesting rank-
ordering. Several questions concerned uniformity, test development, and
quality assurance issues. There were also questions focused on the develop-
ment and administration of mid-term tests and end-of-course tests, which are fairly high-stakes assessments for many students. The two final questions inquired about the prospects of improving the quality of summative tests through enhancing TAL.
Questionnaire 2 was designed for teachers and included 28 questions (27
with the option ‘own answer’ and 1 requesting rank-ordering). Part of the
questions coincided with those aimed at university managers (uniformity of
test papers and administration procedures, test preparation procedure)
although they were formulated from a somewhat different perspective – of
the staff responsible for maintaining uniformity of tests and administration, as
well as for test development and ensuring its quality/validity. Another part of
the questions reflected teachers’ practices in terms of feedback provision and
use of alternative assessments. Teachers were also invited to share their per-
ceptions of possible washback. Question 27 inquired about the forms of
Participants
The participants in the survey were three distinct though interconnected
groups of respondents: 1) university managers (UMs), 2) teachers (Ts), and
3) students (Sts). Eleven institutions from all regions of the country were
involved in the survey (Western, Southern, Eastern, and Central parts of
Ukraine), which made our sample fairly representative in terms of reflecting
local practices. Although we collected many more responses than initially
planned, we had to exclude a considerable number of inaccurately com-
pleted questionnaires. In the end, we processed 10 questionnaires of UMs,
50 of those collected from Ts, and 50 questionnaires completed by Sts.
Participation in the survey was anonymous (excluding the UMs) and
voluntary. To ensure full comprehension of the questions by all groups of
respondents, questionnaires were formulated in Ukrainian, with the meta-
language excluded.
Data collection
The survey was conducted in paper-and-pencil format in the autumn of 2018.
The sets of responses that arrived from each university contained responses
provided by: one university manager (head of department), 5–10 teachers
working for those departments and 5–10 students who were taught by those
teachers. Consequently, the responses collected from three groups of respon-
dents in each of the ten institutions allowed the researchers to note salient
features of the assessment practices adopted in each local context. Although it
would not be difficult to align all three groups of evidence and arrive at certain
conclusions, for ethical reasons we did not do that, thus leaving the original
scope of the study unchanged.
Table 10.3 Perceptions of test objectivity vs satisfaction with test results/grades (%)

Respondent group   Always satisfied          Not always satisfied      Never satisfied
                   objectivity / results     objectivity / results     objectivity / results
Managers           20 / 20                   70 / 70                   10 / 10
Teachers           58 / 55                   36 / 37                   – / 8
Students           58 / 40                   34 / 50                   – / 10
[Figure: percentage of teachers reporting experience of each LTA training format, including short-term courses, workshops, distant courses, traineeships (abroad and in Ukraine), staff seminars, self-study, and other formats.]
Staff seminars conducted by teachers of the same rank as the respondents were the most common source of knowledge about LTA. 64% of the respondents had participated in workshops given by visiting international experts, and another 64% had improved their TAL through self-study. Every other opportunity had been taken up by fewer than 50% of the respondents.
The training experiences of the respondents, although far from systematic, allowed the Ts to express a considered opinion about the most effective formats; their ranking of the formats by effectiveness is presented in Table 10.4.
As can be seen in Table 10.4, the respondents considered traineeship abroad the
most effective way to enhance TAL; the 2nd and 3rd preferred formats were
workshops conducted in Ukraine by experts in LTA and workshops led by
Ukrainian experts. We explain the appreciation of these formats by reference to
ongoing processes in Ukrainian education. The increased job responsibilities
related to frequent summative assessment confront teachers with numerous
questions to which national policy makers provide no clear answers. At the same
time, the past four years have witnessed a growth of opportunities for teachers to
participate in training events organized by UALTA; it is probably the high standards
set by the invited experts that heightened teachers’ expectations of international collaboration.
Staff seminars, which had been mentioned in the responses to the previous
question as the most accessible format, and conferences, which had long been
staples among scholarly conventions, were considerably downgraded by the
respondents. Whether this resulted from Ts’ discontent with the insufficient
informativeness of staff seminars, from the scarcity and insufficient quality of
research into LTA reported at conferences, or from the conference format
itself needs to be ascertained by a dedicated survey of academia.
Nonetheless, we believe we can account for the increased rating of short-term
courses in LTA; indeed, we had hypothesized it. Although only 20% of the
respondents had experience of short-term courses in LTA, Ts assumed that
intensive short-term training in LTA would be effective and ranked it 4th. On
the one hand, this reflects the recently established practice of conducting
week-long winter/summer schools for teachers in the country. On the other
hand, the idea of training in LTA meets urgent demands for implementing
reliable–trustworthy assessments in higher education. Additionally, such short-
term courses have been piloted by us and were found quite effective (Kvasova,
2016). By contrast, longer-term courses, as well as traineeship in Ukrainian
universities, distance courses, and self-study, did not meet the respondents’
expectations of effectiveness, being placed 7th–10th in the rating. We can
assume, however, that these formats were perceived as quite demanding in
terms of material and human resources (time, effort, cost).
We also examined UMs’ perceptions of the effectiveness of the ways to
enhance TAL and compared them with those discussed above. As can be seen in
the diagram, total agreement is observed in placing short-term courses on LTA in
the middle and in ranking distance courses the lowest. In two other instances,
regarding long-term courses and traineeship in Ukrainian universities, the
indices show larger variance, since managers are responsible for the smooth
flow of the instructional process in their departments, and the absence of staff
from the workplace could impede it. In all other instances, the graph reveals a
similar tendency for both groups of respondents, although Ts gave more
generous scores to their preferred formats of training events.
The interpretation of the data allowed us to arrive at conclusions regarding
all questions of this research. It appeared that the perceptions of reliability–
trustworthiness by the three groups of informants diverged considerably
wherever they could be compared.
The curve representing the major stakeholders’ (Sts) data stretches steadily along
medium indices, which points to the respondents’ indeterminate perceptions of
all aspects except the one most meaningful to them – their satisfaction with the
obtained grades. The educators’ responses agree only once, with respect to the
uniformity of summative assessment; their indices of satisfaction with the test
results, however, are quite close to each other and somewhat higher than those
of the Sts. While the Sts’ and Ts’ perceptions of objectivity coincide at 58%,
the UMs reveal a greater degree of certainty about test reliability; in fact, this
contradicts their overall rigorous stance on summative assessment.
On the whole, the UMs revealed the most critical and consistent evaluation
of all aspects of test development, administration, and analysis of test quality.
We attribute such perceptions primarily to the great responsibility of the
managerial job they perform, as well as to their general capacity to organize
and monitor summative assessments. The data also suggest that UMs have
fairly good control over summative assessment implementation, although the
degree of collaboration at some stages of test development (e.g. quality assur-
ance), in our view, needs reconsidering. Additionally, UMs should be credited
for the effort invested in TAL enhancement, in particular for organizing staff
seminars on LTA issues at the departments they head.
The data obtained from Ts enabled a more detailed view of assessment
procedures. The practices testify to assessments being developed and adminis-
tered in compliance with setting-specific requirements, although we noted that
some relevant testing principles were seriously compromised. The most
obvious reason for this lies in the lack of solid, specialized training in LTA;
however, even if teachers had a proper level of TAL, they would evidently
still need UMs’ support in creating conditions conducive to collaborative test
development and quality assurance. Nevertheless, what is clearly positive in
the assessment practices we surveyed is the agreement in the Ts’ and Sts’
perceptions of feedback efficiency and the absence of any negative impact of
assessment on further learning.
The limitations of the study relate to the subjectivity of the survey as a method
of investigation. However, the focus on the same concepts from different
perspectives allowed us to collect compatible data and enabled insights into
real-life assessment practices.
Conclusion
Reliability–trustworthiness of summative assessments in Ukrainian universities
has been considered in the study from the perspective of central stakeholders.
The information obtained from all groups of respondents shed light on the
assessment practices typical of universities from across the country. The results
confirm the existing views on TAL as a cornerstone of objective and equitable
summative assessment.
Of particular interest for the authors of this chapter are the educators’ sug-
gestions about possible ways of improving reliability of summative assessments;
both groups of educator respondents found it critical to raise TAL, identifying
the preferred formats of training in LTA – traineeship, workshops, and short-
term courses. To conclude, we received salient evidence of some progress in
building TAL in the surveyed universities and identified the most meaningful
formats of TAL enhancement; this stimulates follow-on studies as well as
practical organizational steps.
References
Alderson, J. C. (1999, May). Testing is too important to be left to testers. Plenary address to
the Third Annual Conference on Current Trends in English Language Testing,
United Arab Emirates University.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation.
Cambridge University Press.
Bachman, L., & Palmer, A. (1996). Language testing in practice: Developing useful language
tests. Oxford University Press.
Bolitho. R., & West, R. (2017). The internationalisation of Ukrainian universities: the English
language dimension. British Council. https://www.teachingenglish.org.uk/sites/tea
cheng/files/Pub-UKRAINE-REPORT-H5-EN.pdf
Brown, G. T. L. (2019). Is assessment for learning really assessment? Frontiers in
Education, 4. doi:10.3389/feduc.2019.00064
Carless, D. (2006, September 6–9). Developing synergies between formative and summative
assessment. Paper presented at the British Educational Research Association Annual
Conference, University of Warwick. http://www.leeds.ac.uk/educol/documents/
159474.htm
Carless, D. (2007). Learning-oriented assessment: Conceptual bases and practical implications.
Innovations in Education and Teaching International, 44(1), 57–66.
Chapelle, C. A. (2013). Reliability in language assessment. In C. A. Chapelle (Ed.),
The encyclopedia of applied linguistics (pp. 4918–4923). Blackwell/Wiley.
Coombe, C., Folse, K., & Hubley, N. (2007). A Practical Guide to Assessing English
Language Learners. The University of Michigan Press.
Douglas, D. (2010). Understanding language testing. Routledge.
Dovgopolova, I. V. (2011). Vprovadzhennia testovoi metodyky v protsess navchannia u
vyshchyh navchalnyh zakladah [Integration of language testing into language learning
in higher education institutions]. Vyscshia shkola, 2(20), 41–50.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource
book. Routledge.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers.
Profile: Issues in Teachers’ Professional Development, 20(1), 179–195.
Gareis, C. R., & Grant, L. W. (2015). Teacher-made assessments: How to connect curriculum,
instruction, and student learning (1st edition). Routledge.
Green, A. (2014). Exploring language assessment and testing: Language in action. Routledge.
Green, A. (2016). Assessment literacy for language teachers. In D. Tsagari (Ed.), Class-
room-based assessment in L2 contexts (pp. 8–29). Cambridge Scholars Publishing.
Harlen, W. (2004). A systematic review of the evidence of reliability and validity of assessment by
teachers used for summative purposes. EPPI-Centre, Social Science Research Unit, Insti-
tute of Education. https://eppi.ioe.ac.uk/cms/Portals/0/PDF%20reviews%20and%
20summaries/ass_rv3.pdf?ver=2006-03-02-124720-170
Harlen, W. (2007). Designing a fair and effective assessment system. Paper presented at the
BERA Annual Conference: ARG Symposium Future Directions for Student Assess-
ment. University of Bristol, Bristol.
Hasselgreen, A., Carlsen C., & Helness, H. (2004). European survey of language testing and
assessment needs: Report: Part one – general findings. European Association for Language
Testing and Assessment. http://www.ealta.eu.org/documents/resources/survey-rep
ort-pt1.pdf
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to
secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Houston, D., & Thompson, J. N. (2017). Blending formative and summative assessment
in a capstone subject: ‘It’s not your tools, it’s how you use them’. Journal of University
Teaching & Learning Practice, 14(3).
Introduction
As social creatures, people need to communicate in social settings in order
to survive and maintain relationships of friendship, enmity, comradeship,
and acquaintanceship by means of oral and/or written language. More than
340 million people speak English as a first language, and English functions as
the lingua franca around the globe (Celce-Murcia, 2013; Tarone, 2005; Ounis,
2017; Ur, 2012). Accordingly, globalization and the marketplace have necessitated
oral and written English proficiency, which is considered one of the key
graduate attributes (Koo, 2009). Given the current status of English and the
major role it has been playing in education and commerce in the Arab
countries and, more specifically, in the Arabian Gulf region since the beginning
of the 20th century, overall English proficiency is considered a pressing pre-
requisite for swift employment and a successful career (Ministry of Labour and
Social Affairs in Bahrain, 2017). Being able to successfully communicate in
English in corporate-level conversations is a demanding criterion to which the
educational ecosystem, and more specifically higher education institutions, need
to pay additional attention when designing curricula.
The significance of oral proficiency and of speaking as a productive
skill has been strongly stressed by curriculum specialists, educationists, and
researchers (Celce-Murcia, 2013; He, 2013; Hismanoglu, 2013; Liu, 2012; Lu &
Liu, 2011; Ounis, 2017; Yaikhong & Usaha, 2012). However, looking at the
regional and local status of English instruction reveals some adverse realities, as
the educational systems place high importance on reading, writing, and the
teaching of form and function, while less attention is given to speaking. As a
result, speaking is considered the most difficult skill to master due to a number
of factors stemming from this neglect, such as the fear of negative evaluation by
peers, low lexical richness, and lack of formation skills and practice (Al Asmari,
2015; Al Hosni, 2014).
This chapter aims to explore two dimensions of the assessment of the
speaking skill: 1) the current practice of assessing speaking in tertiary education
in Bahrain with reference to some pedagogical structures and to a number of
cognitive variables related to Second Language Acquisition (SLA) and language
Research questions
The present study aims to answer the following questions:
of the speaking skill has inspired many researchers in the region, as evidenced
by the number of studies in the field during the last decade (Bashir, 2014; Heng,
Abdullah, & Yusof, 2012; Hidri, 2018; Mahmoodzadeh, 2012; Mak, 2011;
Yahya, 2013). The majority of these studies acknowledged the state of
the teaching and assessment of speaking and related it to the influence of
speaking anxiety in EFL settings and to inadequate teaching methodologies and
practices (Hidri, 2017). Acknowledging the factors behind the lack of speaking
proficiency paves the path for appropriate pedagogical implications and
recommendations (Gebril & Hidri, 2019).
Turning to the regional and local contexts, Arab learners of English seem
to have high speaking anxiety in L2 settings (Alhamadi, 2014; Al Jahromi,
2012; Al-Shboul, Ahmad, Nordin, & Rahman, 2013; Rabab’ah, 2016; Taha
& Wong, 2016; Yahya, 2013). Yahya (2013) revealed that fear of negative
evaluation was the key factor triggering speaking anxiety among Palestinian
undergraduate students. Similar findings were reported by Elmenfi and Gai-
bani (2016). Rabab’ah (2016) claims that Arab learners encounter speaking
difficulties because of inadequate teaching methodologies, and lack of practice
and listening tasks, while Al Asmari (2015) attributes such difficulties to
lessened motivation and strict evaluation techniques. Locally, a review
of the literature shows that there has been no published research on the status
of speaking and FLA in the Kingdom of Bahrain. However, a number of
unpublished undergraduate graduation studies have shown that university and
school students consider the speaking skill more difficult to master than
reading and writing skills. In one of these studies, only one third of high
school students reported having oral fluency.
While more than half of the students in these studies reported having good
communication skills, the majority expressed increased apprehension when
randomly assigned to speak during in-class participation (Abbas, 2017). In Yousif’s (2016)
study, more than 30% of the respondents who were undergraduate students
reported increased anxiety when speaking in class because their L2 teachers did
not teach or assess speaking or provide them with ‘real’ speaking opportunities.
However, almost half of the respondents reported feeling at ease when com-
municating with family members and friends. Similarly, 93% of the respondents
in Salman’s (2016) study revealed that they faced difficulties in speaking and
attributed them to skill deficiency due to teaching (47%) and lack of confidence
(46%). Taleb (2017) used the Public Speaking Class Anxiety Scale (PSCAS) to
measure university students’ speaking anxiety and found a moderate level of
anxiety. Categorically, students in these
local studies attributed low proficiency in the speaking skill to a number of
variables that are attested in the current study: the lack of 1) confidence, 2)
native-like speaking opportunities, 3) speaking and communication skills cour-
ses, and 4) proper teaching and assessment practices and processes. Firstly,
speaking anxiety could be instigated by a number of factors; one of the most
fundamental could be related to students’ language competence and personal
traits such as lack of confidence, fear of peer evaluation or teacher evaluation,
shyness, etc. (Abbas, 2017; Al-Nasser, 2015; Elmenfi & Gaibani, 2016; Gan,
2012; Kayoaglu & Saglamel, 2013; McCroskey, 2016; Pathan, Aldersi &
Alsout, 2014; Tanveer, 2007). In Abbas’ (2017) study, almost 60% of university
students acknowledged having speaking anxiety, a large proportion of whom
were high-achieving students, while low-achieving ones, conversely, claimed
to have low anxiety when speaking. A similar number of students placed a high
value on the impact of having good speaking skills on their language proficiency.
Speaking anxiety in this regard can be directly related to Krashen’s Affective
Filter Hypothesis (1987) and Vygotsky’s (1986) Social Constructivist Theory
and Social Interaction Hypothesis. These theorists posit that L2 learning can
only be successful when the learner is in anxiety-free learning settings and
proximal zones of interaction. L2 learners of English often tend to feel appre-
hensive during speaking attempts or requests and tend to avoid situations in
which they are expected to communicate orally by skipping classes, minimizing
participation, and refusing to take part in oral presentations. When forced to
speak, their avoidance defensive strategy is triggered, resulting in apprehension.
Additionally, school and tertiary curricula design and education systems seem
to focus comprehensively on reading and writing skills at the expense of the
speaking skill (Celce-Murcia, 2013; Koran, 2015). This means students lack oral
proficiency and become anxious when having to communicate orally. It seems
that L2 speaking anxiety is not seriously addressed by practitioners and decision-
makers, who fail to gauge the effect of speaking anxiety on students’ academic
performance and classroom behaviour (Basic, 2011).
Method
Sample
The study sample consisted of 82 L2 university students (75% female, 25%
male) enrolled in language learning programs in public and private higher
education institutions in Bahrain (91% public university, 9% private uni-
versities). In addition to majoring in English, the majority of these students
were doing a minor in Translation (43%), French (13.5%), or American Stu-
dies (12%), while another 10% were doing a single major. The vast majority of
these students were mature, given that 90% were between 21 and 23 years old,
while the rest were older. In addition, almost 80% of the respondents were
senior, about-to-graduate 4th-year students.
Data collection
An online questionnaire of 55 items was administered to private and public
tertiary-level students enrolled in EFL programs. The questions aimed to mea-
sure the relation between a number of variables such as gender, speaking
anxiety, teaching and learning practices, and language exposure outside class-
rooms. In addition to items examining the demographic backgrounds of the
respondents, speaking anxiety was measured using five-point Likert scale items
that were adapted from the FLCAS developed by Horwitz, Horwitz, and Cope
(1986), but modified to focus on speaking. This scale identifies three levels of anxiety:
high, moderate, and little or no anxiety. Out of the 33 items that FLCAS uses,
this study used only 15 items (see Appendix A, Question items 8–33) that mea-
sure students’ oral anxiety using three dimensions: 1) fear of negative evaluation,
2) communication apprehension, and 3) test anxiety. In addition, 10 more
question items were added to measure speaking anxiety. Following these items,
the questionnaire contained 16 questions that inquired about the status of
speaking in academic curricula and instruction (see Appendix A, Question items
34–47), while five more questions investigated the extracurricular exposure to
spoken English outside classrooms (see Appendix A, Question items 49–53).
Finally, interviews with a focus group of 30 students were conducted to verify
the status of the teaching and assessment of the speaking skill, and to verify the
effects of the variables mentioned above.
Data analysis
Data from the questionnaire were coded and analyzed using descriptive statistics;
measures of central tendency and dispersion (means, standard deviations, and
percentages) were used to identify the levels of anxiety. Given that only 15 of
the 33 items were selected and modified, the original scoring thresholds were
not used. Instead, the mean scores were used to measure responses to the question
items, using the following scale: low FLCAS = mean scores between
1.00–2.00; moderate FLCAS = 2.01–3.50; high FLCAS = 3.51–5.00. A similar
scale was used to measure students’ satisfaction with the academic curricula
and teaching practices: high satisfaction = mean scores between 1.00–2.00;
moderate satisfaction = 2.01–3.50; low satisfaction = 3.51–5.00. As to measuring
students’ extracurricular exposure to spoken English (question items 49–53),
the mean scores of the responses were analyzed using the following scale: no
or limited exposure = 0.00–1.00; moderate exposure = 1.01–2.00; high exposure =
2.01–3.00. The Statistical Package for the Social Sciences (SPSS) was used to
examine the correlations and differences among variables such as the level of
speaking anxiety, gender, academic curricula and teaching practices, and the
effect of extracurricular language exposure and use on speaking anxiety.
Pearson correlations and paired sample t-tests were used to test these
relationships.
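To make the banding procedure concrete, the sketch below computes per-student FLCAS means, maps them onto the three bands described above, and runs a Pearson correlation against a second scale. It is a minimal illustration only: the data are randomly generated stand-ins, and all column names are hypothetical rather than taken from the study’s instrument.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in data: 82 respondents x 26 FLCAS items (items 8-33),
# each coded 1 (strongly disagree) to 5 (strongly agree).
rng = np.random.default_rng(seed=0)
flcas = pd.DataFrame(rng.integers(1, 6, size=(82, 26)),
                     columns=[f"item_{i}" for i in range(8, 34)])

def band_flcas(mean_score: float) -> str:
    """Map a mean FLCAS score onto the study's three bands."""
    if mean_score <= 2.00:
        return "low"
    if mean_score <= 3.50:
        return "moderate"
    return "high"

per_student_mean = flcas.mean(axis=1)      # one mean score per respondent
overall_mean = per_student_mean.mean()     # the study reports 2.95
overall_sd = flcas.stack().std()           # the study reports 1.32
print(overall_mean, overall_sd, band_flcas(overall_mean))

# Pearson correlation between anxiety and a second (hypothetical) scale.
satisfaction = rng.uniform(1, 5, size=82)
r, p = stats.pearsonr(per_student_mean, satisfaction)
print(f"r = {r:.2f}, p = {p:.3f}")
```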
Results
This section presents the findings of the survey and the interviews with the
focus group. First, based on students’ responses to the survey, the mean score of
students’ responses to the FLCAS items (items 8–33) was 2.95, with a standard
deviation of 1.32. This indicates that students have a moderate level of anxiety
when speaking English in class (see Table 11.1).
Second, the level of anxiety was examined in relation to gender using a paired
sample t-test. Findings, which are presented in Table 11.2, illustrate that no
significant differences were found between males and females in the level of
anxiety when speaking in EFL settings (sig. = 0.333).
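By way of illustration, a two-group comparison of FLCAS means by gender can be run as below. This is a hedged sketch on fabricated stand-in data, not the study’s analysis: the chapter reports a paired sample t-test, whereas the sketch uses scipy’s independent-samples ttest_ind, which is the usual choice when the male and female groups contain different students.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Hypothetical per-student FLCAS means, split by gender (62 female, 20 male).
female_means = rng.normal(loc=2.95, scale=0.6, size=62).clip(1, 5)
male_means = rng.normal(loc=2.90, scale=0.6, size=20).clip(1, 5)

# Independent-samples t-test of the difference in mean anxiety.
t, p = stats.ttest_ind(female_means, male_means, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")  # p > .05 would mirror the reported non-significance
```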
Table 11.1 Mean scores of the foreign language classroom anxiety scale (FLCAS)

         N     Minimum   Maximum   Mean   SD
FLCAS    82    1.00      5.00      2.95   1.32

Third, in relation to measuring students’ satisfaction with the academic cur-
ricula and teaching practices (items 34–47), the mean score of students’
responses was 3.58, with a standard deviation of 1.11 (see Table 11.3).

Table 11.3 Mean scores of students’ satisfaction with the academic curricula and
teaching practices

                                             N     Min.   Max.   Mean   SD
Students’ satisfaction with the academic     82    1.00   5.00   3.58   1.11
curricula and teaching practices

This signifies that students have a low level of satisfaction with the curricula and
the pedagogical practices related to the teaching and assessment of speaking
as a productive skill. More than two-thirds of the students reported that
their L2 curricula do not include speaking courses and that their teachers do
not provide them with in-class speaking opportunities or tasks. 62% denied
receiving any instruction related to public speaking or giving oral presenta-
tions, while the majority (81%) demanded the inclusion of speaking courses
in their L2 programs.
When the focus group was interviewed, a number of students revealed more
details related to their dissatisfaction with the teaching and assessment of
speaking, as exhibited in the following extracts.
During these interviews, students reported heightened anxiety when speak-
ing in class. Interviewed students reported a number of factors that were major
causes of their anxiety, first and foremost the lack of speaking courses.
Figure 11.1 Students’ viewpoints regarding the teaching and assessment of speaking (1)
Figure 11.2 Students’ viewpoints regarding the teaching and assessment of speaking (2)
Discussion
A closer look at the results reveals that although L2 university students have
moderate levels of speaking anxiety, they are dissatisfied with their academic
curricula and with their teachers’ teaching practices related to speaking. Hence,
it is imperative that the educational system makes changes to allow for the
adequate teaching and assessment of speaking by introducing speaking
courses and providing formative and summative oral, interactive, and collabora-
tive learning tasks and activities. Practitioners and curriculum specialists should
be called upon to undertake drastic changes in the academic programs at the
school and tertiary levels to render speaking a core skill to be taught, learnt, and
assessed. According to the National Research Council (1996), assessment and
learning ‘are two sides of the same coin’ (p.5). Consequently, assessment as
learning emanates from the idea that learning involves the students in an active
and interactive process of cognitive restructuring (Earl & Katz, 2006).
Ur (2012) argues that it is often L2 learners’ principal objective to be able to
communicate orally and fluently in formal and informal interaction, and hence
L2 teachers need to enable them to achieve such an objective. A number of
recommendations in this regard have been suggested by numerous educa-
tionists. Hamzah and Ting (2010) reported that teaching speaking in groups
enhances motivation and lessens speaking anxiety and fear of peer criticism
among individuals. In addition, diagnostic tests need to be undertaken to pin-
point anxious students and provide them with assistance (Woodrow, 2006).
What is more, the teaching of phonology, and more particularly pronunciation,
needs to be introduced in the early school cycles in order to develop learners’
accent, stress, rhythm, and intonation in the early stages of learning the target
language (Shively, 2008) and equip them with the oral skills needed to bridge
the gap between academic school levels and tertiary education and workplace
requirements (Lindsay & Knight, 2006). Hence, interactive classroom activities
need to be implemented for the production of a consistent and meaningful
output by means of introducing the practice of real-life speaking in classroom
settings to reduce speaking apprehension and help students identify the areas in
which they need enhancement to augment their oral fluency (Harmer, 2010;
Koran, 2015). Koran argues that a good teacher is one who assesses students’
speaking skill by means of observations, quizzes, or exams designed to evaluate
their oral proficiency. To perfect students’ speaking competence, teachers have
to provide constructive feedback, facilitate in-class discussions and debates, and
provide students with listening material (Harden & Crosby, 2000).
First and foremost, a holistic reform of the academic curricula needs to be
implemented promptly in order to ensure that speaking is incorporated as an
important segment of any academic L2 program and that it is embedded and
assessed as a key intended learning outcome. Longitudinal future studies that
address the pedagogical practices in the pre-tertiary educational cycles and that
measure the status of the teaching and assessment of the speaking skill are
required in order for educationists and decision-makers to be able to rectify the
long-term neglect of speaking and incorporate it in the L2 curricula.
Appendix A
Year 2
Year 3
Year 4
Not applicable
Other: (please specify) _________________
5 If you are a university student majoring in English, what is your minor?*
Translation
French
American Studies
No minors
Other: (please specify) _________________
6 How would you rate your speaking skill?*
Excellent
Good
Fair
Somewhat poor
Very poor
7 How would you rate your English language skills in general?*
Excellent
Good
Fair
Somewhat poor
Very poor
A. Speaking competence:
Kindly read the following statements and provide your opinion with reference
to your speaking competence.
Statement SD D N A SA
8. I never feel quite sure of myself when I am
speaking in my English class.
9. I don’t worry about making mistakes when
speaking in my language class.
10. I tremble when I know that I’m going to be
called on to speak in class.
11. It frightens me when I don’t understand
what the teacher is saying during class.
12. During English class, I find myself thinking
about things that have nothing to do with the
course.
13. I keep thinking that the other students are
better than me in English.
14. I am usually at ease during oral tests.
15. I start to panic when I have to speak with-
out preparation in my language class.
16. I don’t understand why some people get so
worried about giving oral presentations.
17. While speaking in class, I can get so nervous
I forget things I know.
18. It embarrasses me to volunteer answers in
my language class.
19. I do not get nervous speaking in English
with native speakers of English.
20. I get upset when I don’t understand why I
got bad marks in my oral test.
21. Even if I am well prepared for the oral tasks,
I feel anxious about it.
22. I often feel like not going to my language
class when there is an oral activity.
23. I feel confident when I speak in the English
class.
24. I am afraid that my language teacher is
ready to correct every mistake I make when I
speak.
25. I am afraid that students in my English class
are ready to correct every mistake I make when
I speak.
26. I can feel my heart pounding when I’m
going to be called on to participate in the
English class.
27. I always feel that the other students speak
English better than I do.
28. I feel self-conscious about speaking English
in front of other students.
29. I get confused when I am speaking in my
English class.
30. I feel overwhelmed by the number of rules
you have to learn to speak good English.
31. I am afraid that the other students will laugh
at me when I speak in English.
32. I would probably feel comfortable speaking
around native speakers of English.
33. I get nervous when the teacher asks ques-
tions which I haven’t prepared for in advance.
Statement SD D N A SA
34. My program curriculum includes a general
speaking course.
35. My program curriculum includes a public
speaking course.
36. Our language instructors encourage us to
speak in class.
37. I have given oral presentations during the
course of my study.
38. I have given more than three oral presenta-
tions during the course of my study
39. I have been taught how to give good oral
presentations.
40. Assessment in language courses includes
speaking.
41. Language courses allow for in-class speaking
activities.
42. Our instructors engage us in in-class debates
and discussions.
43. Our language department provides us with
opportunities to practice speaking.
44. Course activities and assignments require
the use of media.
45. Our L2 instructors are fluent in English.
46. Our L2 instructors teach us in English.
47. Speaking courses need to be introduced into
the program.
48. What are the courses which promote speaking and/or giving oral pre-
sentations? (You can select more than one):
a) Major courses
b) Minor courses
c) Language courses
d) Literature courses
e) Linguistics courses
f) Other: (please specify) ____________________________
Recommendations
54. Would you like to recommend ways to enhance students’ speaking skill?
a) Yes
b) No
55. Kindly use the space below to provide us with your recommendations or
comments, if any.
References
Abbas, M. (2017). English classroom speaking anxiety among English major students
(Unpublished undergraduate thesis). University of Bahrain.
Abudrees, T. (2017). The differences in applying the aspects of connected speech
between first-year and fourth-year non-native speaking students at the University of
Bahrain (Unpublished undergraduate thesis). University of Bahrain.
Al Asmari, A. (2015). Communicative language teaching in EFL university context:
Challenges for teachers. Journal of Language Teaching and Research, 6(5), 976–984.
Al Hosni, S. (2014). Speaking difficulties encountered by young EFL learners. Interna-
tional Journal of Studies in English Language and Literature (IJSELL), 2(6), 22–30.
Al Jahromi, D. (2012). A study of the use of discussion boards in L2 writing instruction
at the University of Bahrain (Unpublished doctoral thesis). University of Sheffield.
Alhamadi, N. (2014). English speaking learning barriers in Saudi Arabia: A case study of
Tibah University. AWEJ, 5(2), 38–53.
Al-Nasser, A. S. (2015). Problems of English language acquisition in Saudi Arabia: An
exploratory-cum-remedial study. Theory and Practice in Language Studies, 5(8), 1612–1619.
Al-Qahtani, M. F. (2013). Relationship between English language, learning strategies, atti-
tudes, motivation, and students’ academic achievement. Educ. Med. Journal, 5, 19–29.
AlRashid, N. E. (2017). The effectiveness of teaching English speaking and writing in
Bahraini government secondary schools (Unpublished undergraduate thesis). Uni-
versity of Bahrain.
Al-Shboul, M. M., Ahmad, I. S., Nordin, M. S., & Rahman, Z. A. (2013). Foreign
language reading anxiety in a Jordanian EFL context: A qualitative study. English
Language Teaching, 6(6), 1–19.
Arnaiz, P., & Guillen, F. (2012). Self-concept in University-level FL Learners. The
International Journal of the Humanities: Annual Review, 9(4), 81–92.
Bashir, S. (2014). A study of second language-speaking anxiety among ESL intermediate
Pakistani learners. International Journal of English and Education, 3(3), 216–229.
Basic, L. (2011). Speaking anxiety: An obstacle to second language learning? (Unpub-
lished doctoral thesis). University of Gävle.
Bygate, M. (1987). Speaking. Oxford University Press.
Celce-Murcia, M. (2013). Teaching English in the context of world Englishes. In M.
Celce-Murcia, D. M. Brinton, & M. A. Snow (Eds.), Teaching English as a second or
foreign language (4th edition, pp. 2–14). National Geographic Learning/Cengage
Learning.
Coates, J. (2014). Women, men and language: A sociolinguistic account of gender differences in
language. Taylor and Francis.
Dörnyei, Z. (2005). The psychology of the language learner: Individual differences in second
language acquisition. Routledge.
Earl, L., & Katz, S. (2006). Rethinking classroom assessment with a purpose in mind. Western
and Northern Canadian protocol for collaboration in education. Manitoba Education, Citi-
zenship, and Youth. https://digitalcollection.gov.mb.ca/awweb/pdfopener?smd=1&
did=12503&md=1
Elmenfi, F., & Gaibani, A. (2016). The role of social evaluation in influencing public
speaking anxiety of English language learners at Omar Al-Mukhtar University. Arab
World English Journal, 7(3), 496–505.
Ezzi, N. A. (2012). Foreign language anxiety and the young learners: Challenges ahead:
Rethinking English language teaching. In TESOL Arabia conference proceedings: Pro-
ceedings of the 17th TESOL Arabia Conference (Vol. 16, pp. 56–62). TESOL Arabia
Publications.
Fakhri, M. (2012). The relationship between gender and Iranian EFL learners’ foreign
language classroom anxiety. International Journal of Academic Research in Business and
Social Sciences, 2(6), 147–156.
Gan, Z. (2012). Understanding L2 speaking problems: Implications for ESL curriculum
development in a teacher training institution in Hong Kong. Australian Journal of
Teacher Education, 37(1), 43–59.
Gebril, A., & Hidri, S. (2019). Language assessment in the Middle East and North Africa
[special issue: The status of English language research in the Middle East and North
Africa: An introduction]. Arab Journal of Applied Linguistics, 4(2), i–vi.
Gregersen, T. S. (2003). To err is human: A reminder to teachers of language-anxious
students. Foreign Language Annals, 36(1), 25–32.
Hamzah, M. H., & Ting, L. Y. (2010). Teaching speaking skills through group work activities
(A case study at form 2ES1 SMK Damai Jaya Johor). https://core.ac.uk/download/files/
392/11785638.pdf
Hanifa, R. (2018). Factors generating anxiety when learning EFL speaking skills. Studies
in English Language and Education, 5(2), 230–239.
Harden, R. M. & Crosby, J. (2000). The good teacher is more than a lecturer – the
twelve roles of the teacher. Medical Teacher, 22(4), 334–347.
Harmer, J. (2010). How to teach English. Pearson Longman.
He, D. (2013). What makes learners anxious while speaking English: A comparative
study of the perceptions held by university students and teachers in China. Educational
Studies, 39(3), 338–350.
Heng, C. S., Abdullah, A. N., & Yusof, N. B. (2012). Investigating the construct of
anxiety in relation to speaking skills among ESL tertiary learners. 3L: The Southeast
Asian Journal of English Language Studies, 18(3), 155–166.
Hidri, S. (2017). Introduction: State-of-the-art of assessing second language abilities. In
S. Hidri (Ed.), Revisiting the assessment of second language abilities: From theory to practice
(pp. 1–19). Springer.
Hidri, S. (2018). Assessing spoken language ability: A many-facet Rasch analysis. In S.
Hidri (Ed.), Revisiting the assessment of second language abilities: From theory to practice (pp.
23–48). Springer.
Hismanoglu, M. (2013). Does English language teacher education curriculum promote
CEFR awareness of prospective EFL teachers? Procedia – Social and Behavioral Sciences,
93, 938–945.
Horwitz, E. K., Horwitz, M. B., & Cope, J. A. (1986). Foreign language classroom
anxiety. The Modern Language Journal, 70(2), 125–132.
Kayaoğlu, M. N., & Sağlamel, H. (2013). Students’ perceptions of language anxiety in
speaking classes. Tarih Kültür ve Sanat Araştırmaları Dergisi, 2(2), 142–160.
Kingen, S. (2000). Teaching language arts in middle schools: Connecting and communicating.
Lawrence Erlbaum Associates.
Koo, Y. L. (2009). Mobilising learners through English as lingua franca (ELF): Providing
access to culturally diverse international learners in higher education. Research Journal
of International Studies, 3(9), 45–63.
Koran, S. (2015). Analyzing EFL teachers’ initial job motivation and factors affecting
their motivation in Fezalar Educational Institution in Iraq. Advances in Language and
Literary Studies, 6(1), 72–80.
Krashen, S. (1987). Second language acquisition. Oxford University Press.
Lindsay, C., & Knight P. (2006). Learning and teaching English: A course for teachers.
Oxford University Press.
Liu, H. J. (2012). Understanding EFL undergraduate anxiety in relation to motivation,
autonomy, and language proficiency. Electronic Journal of Foreign Language Teaching, 9
(1), 123–139.
Lu, Z., & Liu, M. (2011). Foreign language anxiety and strategy use: A study with
Chinese undergraduate EFL learners. Journal of Language Teaching and Research, 2(6),
1298–1305.
Mahmoodzadeh, M. (2012). Investigating foreign language speaking anxiety within the
EFL learners’ interlanguage system: The Case of Iranian learners. Journal of Language
Teaching and Research, 3(3), 466–476.
Mak, B. (2011). An exploration of speaking-in-class anxiety with Chinese ESL learners.
System, 39, 202–214.
McCroskey, J. (1978). Validity of the PRCA as an index of oral communication
apprehension. Communication Monographs, 45(3), 192–203.
McCroskey, J. C. (2016). Introduction to rhetorical communication: A Western rhetorical
perspective. Routledge.
Ministry of Labour and Social Affairs in Bahrain. (2017). Workplace requirements for better
and faster employment. Paper presented at Media, Tourism, and Fine Arts Stakeholders’
Forum. University of Bahrain, Bahrain.
Mohammadi, M., & Mousalou, R. (2013). Emotional intelligence, linguistic intelli-
gence, and their relevance to speaking anxiety of EFL learners. Journal of Academic and
Applied Studies, 2(6), 11–22.
National Research Council. (1996). National science education standards. National Acad-
emy Press.
O’Sullivan, B. (2006). Modelling performance in oral language tests: Language testing and eva-
luation. Peter Lang.
Ounis, A. (2017). The assessment of speaking skills at the tertiary level. International
Journal of English Linguistics, 7(4), 95–112.
Park, G. P., & French, B. F. (2013). Gender differences in the foreign language class-
room anxiety scale. System, 41, 462–471.
Pathan, M., Aldersi, Z., & Alsout, E. (2014). Speaking in their language: An overview of
major difficulties faced by the Libyan EFL learners in speaking skill. International
Journal of English Language & Translation Studies, 2(3), 96–105.
Introduction
Test washback has been defined as the effects of tests on teaching and learning;
consequently, any introduction of a new test should plan for positive washback
(Wall, 2013). Assessment tasks, whether summative or formative, should
therefore be designed in a way that engages students in the necessary knowl-
edge, skills, and abilities (KSAs) needed to perform effectively in the real world
beyond the confines of the classroom. Arguably, such a focus is particularly true
for high-stakes proficiency tests, such as those used for school leaving or uni-
versity entrance, where governments and education departments – especially
those within the European Union – have been obliged to take the Common
European Framework of Reference for Languages (CEFR) into account. As a
result, new educational initiatives have been plentiful as policy makers attempt
to incorporate competence-based language education (Lim, 2014).
The main purpose of such initiatives is both to promote learning and bring
about a shift in language pedagogy – from knowledge-based to more com-
municative practices – and to validly interpret what has been learned. Despite
the increasing pressure on teachers to be instigators of such changes, for many,
a lack of language assessment literacy (LAL) prevents them from successfully
fulfilling this role (Fulcher, 2012; Hidri, 2014, 2018, 2019). Indeed, a lack of
LAL amongst teachers has been widely reported, which is arguably particularly
true in the case of standardized tests (Tsagari & Vogt, 2017). In Tsagari and
Vogt’s study, teachers did not feel that they had the correct training to help
students prepare for tests, and at most, test preparation took the form of
administering past papers without critically evaluating them.
Ultimately, it is teachers who will need to prepare their students for any
standardized test through the provision of support for learning outcomes,
classroom assessments to measure and track students’ progress, and other feed-
back. Students need to be given the tools to reflect on their learning, under-
stand their strengths and weaknesses, and develop learner autonomy in order to
develop life-long learning strategies. Teachers act as mediators between the
language class and the test. As such, it is arguable that the first step in instigating
reforms would be a fully comprehensible description of any new test and how
it relates to the present curriculum, together with supporting construct validity
evidence. Tests with good construct validity promote positive washback, and
the move from classroom activities to test tasks should be fluid (Messick, 1996).
A new test must therefore be based on a clear definition of language profi-
ciency and have a strong relationship to the curriculum, and this information
must be provided to teachers. Teachers not only need to understand the cur-
riculum standards and test constructs but be able to relate this knowledge to
their professional practices if they are to bring about the desired washback effect
on student learning.
The present study is situated in the context of one such initiative in Spain,
where education reform laws have been introduced together with a new
communicative, competence-based curriculum. This new curriculum comes
largely in response to the growing demand in Europe for the implementation
of CEFR-related, competence-based curriculums, and as an attempt to
improve the poor results of Spanish students (European Commission, 2012),
and follows years of academic criticism of the previous system. The main cri-
ticism has been that no oral component has, until now, been included in the
exam.1 Furthermore, it has been extensively reported that teachers do indeed
teach to the test, and that consequently, a narrow form of the curriculum is
regularly taught in the classroom, with listening and speaking being largely
ignored (e.g., Amengual Pizarro, 2009; García Laborda & Fernández Álvarez,
2011). Yet, listening is an essential component of communicative competence
(adults spend nearly 50% of their time listening) and plays a key role in suc-
cessful language acquisition (Wagner, 2014), thus contributing to academic
success. This is especially true in the context of university entrance, where
universities are increasingly offering courses taught in English.
The situation in Spain is therefore ripe for change; in order to achieve a
positive impact on teaching and learning, any new assessment should not only
clearly evaluate the competencies outlined in the new curriculum, but also
provide evidence that this is the case. While tests have been shown to bring
about changes in educational systems in many different contexts (Cheng, Sun,
& Ma, 2015), positive change can only be brought about if a test accurately
reflects the aims of the curriculum (Wall, 2013). It is hoped that this study will
be a timely contribution to just such an outcome.
Theoretical background
A central issue in language testing is the question of the theory-defined con-
struct: Before test development can begin, it is essential that a theoretical stance
on the nature of language ability first be taken (Chapelle, 2012). Addressing the
continual debates concerning language proficiency constructs, Bachman (2007)
concludes that both competence and task-based perspectives should be taken
into account. Such an approach resonates well with the CEFR, which provides
1 Predicting content.
2 Monitoring comprehension.
3 Making inferences.
These strategies mediate between trait and context and, because task-specific
behaviours are context-relevant, listeners must develop ‘real world strategies’ in
order to achieve comprehension (Field, 2008a). Figure 12.1 shows a repre-
sentation of the proposed theoretical construct for listening ability.
[Figure 12.1 here. The figure depicts listening as a flow from speech input through input decoding, lexical search, and parsing (linguistic processing, drawing on linguistic knowledge) to meaning and discourse construction (semantic processing, drawing on prior world and pragmatic knowledge), mediated by the metacognitive strategies of prediction, planning, monitoring, and inference, and resulting in a representation of speech in memory and a response.]
Figure 12.1 Proposed model of listening ability (based on Field, 2008a, 2013a)
Not only must any theory-based process model of listening ability be repre-
sented in the construct of a new test, but it must be clearly shown that candi-
dates use the same KSAs as they would in the target language use domain
(TLU). Context-specific features of test tasks are normally outlined in the test
specifications, and evidence should be provided that tasks do indeed represent
the proposed TLU. These context specific features should include elements
such as the source of input texts, channel of delivery, number of plays, and the
response format. Most importantly, we need to consider the characteristics of
the input passages and how these will relate to the TLU. A key debate here is
that regarding the authenticity of the audio used.
At present, most language tests use scripts, i.e., written texts which are then read
aloud (Buck, 2018; Wagner, 2014). These texts are often revised and edited before
being produced in a studio by actors; ‘far too often listeners are expected to be able
to understand texts that are meant to be read’ (Vandergrift & Goh, 2012, p.167).
Here, construct under-representation is an obvious threat, as a scripted text lacks
many of the characteristics of natural speech (Field, 2008a, 2013b, 2017; Vander-
grift & Goh, 2012). Indeed, several studies highlight the differences between
spoken and written discourse (for review, see Wagner & Toth, 2017). Natural,
connected speech is very different from the written word and can include gram-
matical mistakes, shorter idea units, and ellipses. Furthermore, it tends to be less
logically organized as a consequence of its unplanned nature (Wagner, 2014). Not
only can spoken language be more colloquial, containing fillers and repetition, but
its intonation patterns carry substantial meaning (Buck, 2018). In scripted
recordings, by contrast, Field (2013b) argues, actors mark commas and full stops,
there are no hesitations or false starts, and voices rarely overlap. Furthermore,
test developers often insert scripted distractors, making a recording much more
informationally dense and placing too great a strain on working memory (Field, 2013a).
Consequently, there are many calls for a move towards more authentic input
texts, both for teaching and assessment purposes (e.g., Field, 2008a, 2013a;
Gilmore, 2011; Vandergrift & Goh, 2012; Shackleton, 2018a; Wagner, 2014;
Wagner & Toth, 2017). As Field (2013a, p.143) states, ‘if a test is to adequately
predict how test takers will perform in normal circumstances, it is clearly
desirable that the spoken input should closely resemble that of real-life con-
versational or broadcast sources’. In the case of school leaving/university
entrance tests, the range of genres found in the TLU should be sampled and
these would represent a continuum of aurality (Shohamy & Inbar, 1991; Van-
dergrift & Goh, 2012), from a planned talk to a spontaneous conversation.
A related issue concerns questions of accent, English as a lingua franca (ELF),
and the ongoing debate about the status of the native speaker as an ideal model
for assessment. Most international tests still limit themselves to accents drawn
from the major native-speaker varieties (British English, American/Canadian
English, and Australian English). However, the relevance of standard native-
speaker varieties has more recently been brought into question, and the lack of
ELF-based examples in most language tests has been the subject of criticism.
Research problem
The proposed new test must take the above debates into account; if positive
washback is to be encouraged in the present context, I would argue that
authentic input texts and a range of both native and non-native speaker varieties
should be included in the test construct. There are further reasons why this should
indeed be the case: If a new test is to be used for university admissions, it should
be clear just what competencies are being assessed, and here a CEFR-related test
can provide test users with a well-defined description. I would also contend that
there are a number of reasons why any CEFR-based test for the context under
discussion must be aimed at a B2 CEFR level; not only do the B2 descriptors
most resemble current Spanish curriculum requirements, but B2 is currently the
required university entrance level for most other European countries, as it is
considered to be the most appropriate level for basic academic study and work
insertion. While there have been some doubts expressed as to whether a higher
level may be necessary (Taylor & Geranpayeh, 2011), B2 is generally felt to be a
feasible minimum. For example, Carlsen (2018) reported that students entering
Norwegian universities with a proficiency level lower than B2 lacked the
necessary language skills for success on their courses. Such findings would suggest
that for the Spanish education system to keep in line with other European
countries, an ideal scenario would see students leaving upper secondary school
with a B2 level minimum (Deygers, Zeidler, Vilcu, & Hamnes Carlsen, 2018;
Lim, 2014). In light of the above issues, the motivation for the present study can
be framed as the need to develop a test which engages the proposed listening
ability model, which incorporates authentic discourse (including a variety of
accents), and which adequately reflects those CEFR B2 competences relevant to
the context of school leaving/university entrance demands.
Rationale
The correct identification of the TLU is clearly a key factor in the successful
operationalization of any test construct and, as such, the test specifications
should reflect those CEFR B2 abilities which students would be expected to
employ beyond the confines of the test in a school leaving/university entrance
context. Accordingly, the current study drew upon a range of sources in
order to develop its specifications.
Following the previous discussion, it was decided to use only authentic audio
files sourced from the internet or produced as a natural response to prompts in
order to obtain samples of non-adapted natural discourse which would include a
variety of accents (including one L2 speaker). Four different tasks were chosen, so
as to create adequate construct coverage and to minimize the task effect by
including a mix of task types. Each sound file lasts between three and five minutes
and is to be heard twice. In order to develop tasks that correspond to real-world
communicative events, purposeful items based on expert behaviour need to be
developed. To this end, a textmapping protocol (see Green, 2017) was followed. This
process makes no reference to a transcript; instead, after the purpose for listening
has been decided, a group of experts notes down the salient ideas taken away from
a given audio in order to replicate the real-world listening process as faithfully as
possible. In this way, an attempt is made to replicate specific types of listening
(Weir, 2005, p.101) by reaching a consensus on meaning, thereby modelling the
activity on expert cognitive processing behaviour as suggested by Field (2008a).
The development of all subsequent items is then based on this understanding of
the audio material in question.
Table 12.1 gives a breakdown and brief description of the four tasks based on
audios which were considered to be suitable for exploitation in accordance
with the test specifications. A small pilot study was carried out, and items which
appeared to discriminate badly or be too easy/difficult for the pilot population
were removed (a sketch of such item statistics follows the research question
below). In order to discover if the test represents the proposed cognitive
processing view of listening, the following research question was addressed:
To what extent does the behaviour elicited from a test taker correspond to
the relevant knowledge, skills and abilities that would be required of him/
her in a real-world context?
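Classical item statistics of the kind used to screen the pilot items above – facility values for items that are too easy or too difficult, and discrimination indices for items that discriminate badly – can be computed as in the sketch below. It is a minimal illustration on fabricated dichotomous scores, not the study’s actual analysis; the corrected point-biserial (item vs. rest-score) is one common choice of discrimination index.

```python
import numpy as np

# Hypothetical pilot data: rows = test takers, columns = items, 1 = correct.
responses = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
])

# Facility value: proportion correct per item (flags too-easy/too-hard items).
facility = responses.mean(axis=0)

# Corrected point-biserial: correlate each item with the total of the *other* items.
total = responses.sum(axis=1)
discrimination = np.array([
    np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
    for j in range(responses.shape[1])
])

print("facility:      ", np.round(facility, 2))
print("discrimination:", np.round(discrimination, 2))
```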
Method
Data collection
After piloting the methodology with two participants, seven volunteers esti-
mated to have a CEFR B2 listening proficiency level were enrolled (male = 4,
female = 3), and a two-stage design was employed.
Whilst finalizing their answers, participants explained how they had reached the
answer to each item. Each of the four tasks was completed separately in order to
reduce to a minimum the time lag between doing the test and reporting on it.
Participants were given the option of reporting in their L1 in order to reduce the
cognitive load when expressing their thoughts (Banerjee, 2004), although in the
event only one participant chose to do so. Once collected, the reports were
transcribed in preparation for coding using the Qualitative Data Analysis (QDA)
software QDA Miner Lite. The data were coded separately for each test item to
represent the level of processing reached in correctly solving the item. These
levels of processing were drawn directly from the listening ability model and are
as follows:
L – Lexical recognition: The understanding of isolated vocabulary from the
audio input.
IU – Idea unit: A proposition, which could be as little as a noun phrase
(Buck, 2001, pp. 27–28), is used to answer the item. This is understanding at a
very literal level and includes local factual information.
MR – Meaning representation: The listener relates a proposition to the
context and uses prior knowledge in order to interpret meaning.
DR – Discourse representation: The listener is able to integrate information
into a wider semantic representation, including speaker intention.
In order to generalize from the results, reliability checks must be carried out.
In the present study, the researcher re-coded one entire protocol six months
after the original coding. Exact intra-coder agreement across the two sessions
was 87%, and Cohen's kappa, which takes into account agreement expected by
chance, was 0.782 (p < .001), 95% CI (0.65, 0.91) – substantial agreement
according to Landis and Koch (1977). The results are
presented quantitatively with illustrative examples in order to draw conclusions
about the listening processes necessary to answer the test items (for further
examples see Shackleton, 2018b). Furthermore, as construct-irrelevant strategies
would obviously pose a threat to the validity of the test, it was also decided to
report qualitatively on candidates’ strategy use.
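For readers less familiar with the statistic, Cohen's kappa adjusts the raw proportion of agreement for the agreement expected by chance alone. As a worked illustration (the chance-agreement value p_e below is assumed for illustration only, since the coding marginals are not reported here):

\[
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad \text{e.g.} \quad \kappa = \frac{0.87 - 0.40}{1 - 0.40} \approx 0.78,
\]

where p_o is the observed proportion of exact agreement (.87 in the present study) and p_e is the proportion of agreement expected by chance, computed from the marginal frequencies with which each code was assigned.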
Data analysis
Test tasks provide the candidate with a purpose for listening and informa-
tion about the context at hand, information which can then be acted upon
to activate relevant schemata and generate hypotheses (Shohamy & Inbar,
1991). The concurrent reports from this planning stage were analyzed
according to emerging themes in QDA Miner Lite and were categorized as
follows:
This part of the test-taking process has been called 'assessing the situation'
(Buck, 2001, p. 104) and all participants in the present study reported such
strategy use. By activating relevant schemata using cues from the task title,
picture, and items they were able to make predictions about the content of the
audio files. This strategy use was especially evident on the MCQ items, which
allowed the participants to build a skeleton story of just what they were going
to hear. Building on previous knowledge schemata, they also made predictions
based on their knowledge of the world and personal experiences, as can be seen
in the example below referring to Task 2:
Example:
So, he’s talking first about the flat and then when he arrives what he’s
going to do then what was hard at first in USA. What he had to learn, the
differences between Mexico and USA, why he felt accepted … I’m
thinking about key points.
OK I understand. Cos I’ve been living in USA and I understand the
situation.
In this one he finds it difficult … for me it will be name or accent, cos
that’s what happened to me.
And he wants Americans to know … for me it would be to know
where he’s from… people see Mexican people like they are from a village
and they don’t have culture and internet and things like that.
(Participant 7, Task 2)
The results of the retrospective reports are presented as the highest level of
processing reached to correctly answer the items on each of the tasks. The
results for Task 1 – the gist task – are shown in Figure 12.2, where it can be
seen that most of the correct answers were arrived at as a result of a meaning
representation of the sound file, while the three highest scoring participants
reached a discourse representation on three of the items.
[Figure 12.2: Level of listening process reached for correct answers on Task 1 (frequency by level: lexical recognition, idea unit, meaning representation, discourse representation)]
Example:
I heard ‘I completely agree’ and so I think it’s this because you can’t
disagree about women doing sport as it is not accepted. And I heard
about people fighting against each other, this was another key piece of
information for me … people fighting. This is a boxing competition.
(Participant 4, Q1.5)
[Figure 12.3: Level of listening process reached for correct answers on Task 2 (frequency by level: idea unit, meaning representation, discourse representation)]
Example:
At first, his main problem was to find somewhere to live so he found a guy
online that was renting part of his apartment. I hear that it was not diffi-
cult, he don’t use this exact words, but he said something like he contact
by email and it was relatively easy because he sent the email, he get the
answer about the flat and then his professor pick up him from the airport
and drop directly to the apartment so for me that’s easy, it’s not time-
consuming and he said ‘that’s it’, the professor took him to the flat and he
was there.
(Participant 5, Q2.1)
In many cases, the participants used inference to decipher the speaker's
underlying intentions. The metacognitive strategy of monitoring was also
highly evident for this task, especially for discarding distractors, as can be
seen in the following example:
Example:
… he said that the American people, they try to guess where he’s come
from and the people say France, Poland. I’m 60% sure that it is ‘work out
his accent’. ‘Try to get to know him’, I think no. I discard ‘understand his
accent’ no because he can communicate, they can understand fluently …
yeah he said something about his name but this was more about the phy-
sical aspect, he don’t look like the typical Mexican guy, it’s not about say
his name … that wasn’t the meaning.
(Participant 5, Q2.7)
Similar results were found for Task 3, which is also a MISD/IPM task, but
includes a dialogue rather than a monologue. Figure 12.4 shows the level of
processing reached in order to answer items correctly for this task.
As in Task 2, most items were answered by reaching a discourse representa-
tion of the audio. In the instances where the item was answered correctly by
simply understanding an idea unit, meaning was created by using inference,
monitoring, and contextual clues. Here, listening may be seen as a problem-
solving process which includes a combination of strategies used in an orche-
strated way (Vandergrift, 2003).
Task 4 is an NF task, and the intention is to test search listening for specific
information and important details. Figure 12.5 shows the level of processing
reached to answer the items on this task.
Although most of the correct answers were arrived at through a discourse
representation of the text, it should be noted that due to the nature of the
input audio – a teacher giving a class quite factual instructions about an
upcoming geography trip – the discourse was quite straightforward and con-
tained mainly local factual information. As might be expected, therefore, it was
found that strategy use was minimal and items were mainly answered by
applying linguistic knowledge, although intonation patterns were also used as
clues signalling important information.
[Figure 12.4: Level of listening process reached for correct answers on Task 3 (frequency by level: idea unit, meaning representation, discourse representation)]
[Figure 12.5: Level of listening process reached for correct answers on Task 4 (frequency by level: lexical recognition, idea unit, meaning representation, discourse representation)]
To summarize, it can be seen that the participants demonstrated that they were
using the knowledge and skills proposed by the listening ability model in order
to solve test items. This observation can be seen in more detail in Table 12.2,
which gives the level of processing reached on an item-by-item basis. Table
12.2 shows that the most difficult items on the test for this small group of
participants were Q3.5 and Q4.4, which were also among the most difficult
items according to a Rasch analysis of the test scores. There were only two
correct answers in which evidence of proper recourse to the listening ability
model was not observed (items Q3.3 and Q3.7), though this may simply have
been due to a lack of full reporting. Incorrectly answered items (26%) were
the result of candidates either missing the information or being unable to
decode sufficient input.
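For readers curious about the Rasch analysis mentioned above, the following sketch (in Python, with invented toy responses, and using a crude log-odds approximation rather than the full estimation that dedicated Rasch software performs) illustrates how item difficulties are placed on a centred logit scale, with harder items receiving higher values. It is a minimal illustration under these stated assumptions, not a reproduction of the study's analysis.

import math

# Toy illustration only (invented 0/1 data, not the study's):
# responses[p][i] = 1 if participant p answered item i correctly.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]

n_persons = len(responses)
n_items = len(responses[0])

difficulties = []
for i in range(n_items):
    correct = sum(row[i] for row in responses)
    # Clamp to avoid undefined log-odds for items everyone (or no one) got right.
    p = min(max(correct / n_persons, 1 / (2 * n_persons)), 1 - 1 / (2 * n_persons))
    # Log-odds of an incorrect response: harder items receive higher logits.
    difficulties.append(math.log((1 - p) / p))

# Centre the scale so that mean item difficulty is zero, as in Rasch convention.
mean_b = sum(difficulties) / n_items
difficulties = [b - mean_b for b in difficulties]

for i, b in enumerate(difficulties, start=1):
    print(f"Item {i}: difficulty = {b:+.2f} logits")

With only seven participants, as in the present study, such estimates would of course be indicative at best.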
Discussion
While the CEFR outlines listening behaviours for each proficiency level, it
does not provide a clear description of just which processes and strategies
should be used at each level and, as such, these are open to interpretation and
must necessarily be extrapolated from the ‘can dos’. The CEFR describes B2
level listeners as being able to understand main ideas and follow conversations
and talks, descriptors which would suggest meaning and discourse representa-
tions of the input. This was indeed found to be the case in the present data
(see Table 12.2), where most correct answers were shown to be reached from
higher levels of processing. In contrast, lower-level listeners would have had
difficulty reaching meaning and discourse representations, as they would need
to focus their attention on lower-level perceptual processing (Field, 2017).
Correct answers based on the understanding of individual vocabulary items
and idea units were found to be minimal, and in all cases were shown to be
the result of recourse to meta-cognitive strategy use.

Table 12.2 Frequencies of level of listening process reached for each correct item (N = 7)

Item    Lexical      Idea   Meaning          Discourse        No reported   Correct
        recognition  unit   representation   representation   processes     responses
Q1.1    1            0      2                2                0             5
Q1.2    0            0      3                0                0             3
Q1.3    0            2      2                0                0             4
Q1.4    0            1      4                0                0             5
Q1.5    1            2      2                0                0             5
Q1.6    2            2      3                0                0             7
Q1.7    0            1      2                1                0             4
Q2.1    0            0      2                4                0             6
Q2.2    0            0      2                5                0             7
Q2.3    0            0      2                4                0             6
Q2.4    0            0      2                3                0             5
Q2.5    0            0      3                4                0             7
Q2.6    0            0      2                2                0             4
Q2.7    0            0      3                4                0             7
Q2.8    0            1      2                2                0             5
Q3.1    0            0      2                5                0             7
Q3.2    0            0      5                2                0             7
Q3.3    0            0      2                3                1             7
Q3.4    0            1      0                2                0             3
Q3.5    0            1      1                0                0             2
Q3.6    0            0      1                4                0             5
Q3.7    0            0      2                3                1             7
Q3.8    0            1      2                3                0             6
Q4.1    0            2      1                1                0             4
Q4.2    0            1      2                4                0             7
Q4.3    0            0      2                1                0             3
Q4.4    0            1      0                1                0             2
Q4.5    0            0      1                2                0             3
Q4.6    0            1      1                5                0             7
Q4.7    0            0      0                6                0             6
Q4.8    1            2      1                1                0             5
Q4.9    0            1      1                3                0             5
Q4.10   0            2      0                3                0             5
Total   5            22     60               80               2             171
In terms of construct-irrelevant variance, none of the guessing strategies
during the pre-listening stage led to the correct answer being ascertained.
Nevertheless, one participant – the lowest scoring – appeared to guess two
answers correctly for Task 3 (an MCQ task) without understanding the audio,
that is to say, without proper recourse to the listening ability model. Such a
finding has also been reported in previous studies regarding MCQ items (Yi’an,
1998). However, in contrast to Yang (2000), who found that 48% to 64% of the
listening and reading items on the old TOEFL Practice Test B could be answered
using construct-irrelevant test-wiseness strategies, the figure for the present study
represented only 1% of total correct answers.
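This figure can be checked against the totals in Table 12.2: with 171 correct responses recorded across all participants and items, the two apparently guessed answers amount to

\[
\frac{2}{171} \approx 0.012,
\]

that is, roughly 1% of total correct answers, consistent with the percentage reported above.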
It can therefore be argued that the test scores measure the construct well and
may be meaningfully interpreted for generalization. In this regard, the verbal
reports demonstrated that higher-scoring participants understood the audios
better than lower-scoring participants and that the listening process model was
clearly in evidence.
In sum, the results presented above provide ample evidence that the participants
in the present study did indeed follow the proposed listening ability model. Here,
CEFR B2 level participants demonstrated fairly automated listening skills and were
seen to be able to use their world knowledge (including pragmatic, contextual,
semantic, and inferential information) along with meta-cognitive strategy use in
order to construct meaning.
training and support are therefore essential. LAL agendas should also include
general KSAs and principles of assessment (Fulcher, 2012), as the introduc-
tion of better formative assessments in parallel with high-stakes proficiency
tests will lead to improved test washback through the combination of
assessment for learning with assessment of learning.
A test which successfully represents the construct will hopefully encourage
teachers teaching to the test to teach to the construct itself; as a consequence,
listening skills and strategy training should necessarily become part of classroom
practice. Here, curricula would need to take into account all aspects of the
listening process model, and as such there are clear implications for pedagogy:
Listeners need training in lower-level decoding techniques in order to recog-
nize word boundaries and lexical chunks in spoken discourse (Cauldwell, 2018;
Field, 2008a). Once these decoding routines become more automated, working
memory is freed up to perform higher-order meaning-building functions.
Furthermore, strategy awareness training, shown to have positive effects on
learners' listening ability (Vandergrift & Tafaghodtari, 2010;
Zhang, 2012), would also be necessary. Indeed, a number of researchers share a
strong belief that by teaching listening strategies, we are actually teaching lear-
ners how to listen (Siegel, 2015; Vandergrift & Goh, 2012).
Such learning activities would entail learners listening for a communicative
purpose in order to develop core skills (Vandergrift & Goh, 2012). As these
core skills are included as part of the test specifications, their promotion would
facilitate a seamless transition from classroom activities to test tasks. If a test
construct is broad enough that it promotes teaching to a test which both
includes all the necessary competences for successful listening and is also a
reflection of real-life listening tasks, beneficial washback will be achieved
(Vandergrift & Goh, 2012).
In addition to the above-mentioned steps, authentic listening materials would
also need to be introduced into the classroom, thereby responding to the
numerous calls in the literature to expose students to natural language, as it is
increasingly recognized that learners need to develop the ability to
understand real-world connected speech. Such materials would represent a major
change to baccalaureate courses; a study of the textbooks presently in use shows
that all audio texts currently employed follow the traditional format of unnatural
sounding scripted recordings produced using actors. In contrast, the use of
unscripted audios in assessments would mean that both materials developers and
teachers would be obliged to incorporate authentic materials (Wagner & Toth,
2017, p. 78). Here, studies have shown that more gains in listening ability are
made by groups of students exposed to authentic rather than scripted audio (e.g.,
Gilmore, 2011), thereby confirming that such a move could lead to positive
washback. Learner autonomy could possibly also be encouraged as students begin
to listen to other authentic materials outside the classroom.
Teachers themselves could become centrally involved in the test develop-
ment process in order to encourage understanding and foster a feeling of ownership.
Conclusion
Given that listening is undeniably one of the core academic skills, it is essential
that any new test demonstrate good construct validity if it is to be successfully
used to encourage the development of beneficial washback effects in the class-
room and beyond. This study has evaluated a listening test created with the aim
of producing positive washback in the context of the Spanish education system.
The study is theory-driven – based on a cognitive process view of listening
ability well-founded in research, it provides ‘strong’ construct validity evidence
for the test (Kane, 2001). It is proposed that the introduction of the test would
lead to significant positive changes in language pedagogy. The inclusion of
authentic sound files is considered to be a major improvement on most listen-
ing tests. If we are serious in our desire to help learners understand authentic
speech – which is unpredictable in nature – then ‘… serious tests must start
using construct-valid spoken texts’ (Buck, 2018, p. xv).
Educational reforms increasingly rely on the introduction of new assessment
procedures in order to improve the quality of education (Chalhoub-Deville, 2016).
In the Spanish context, policymakers need to ensure that any educational reforms
are properly implemented if we are to achieve a positive impact on teaching and
learning. The development of a more communicative L2 English-language class-
room, one which responds to the realities of actual language use, requires not
simply the optimization of assessment of learning, but – fundamentally – assessment
for learning. Clearly, the role of teachers themselves is pivotal to this process, and
consequently they must be supported in developing LAL in order to be able to
implement an array of assessment techniques relevant to their students' needs. A key
element of LAL is the ability to evaluate and criticize tests and an understanding of
the core constructs underlying assessment practice. To this end, teachers need to be
provided with convincing validity evidence. Indeed, the provision of evidence such
as that given in the present study would go a long way towards simplifying training
and ensuring that teachers are more likely to evaluate the test positively, thereby
involving them directly in creating positive washback in the education system.
Note
1 However, a listening component has been introduced in the provinces of Galicia and
Catalonia.
References
Alderson, J. C. (1993). Judgements in language testing. In D. Douglas, & C. Chapelle
(Eds.), A new decade of language testing research (pp. 46–57). TESOL.
Amengual Pizarro, M. (2009). Does the English test in the Spanish university entrance
examination influence the teaching of English? English Studies, 90(5), 582–598.
Anderson, J. R. (2009). Cognitive psychology and its implications. Worth Publishers.
Bachman, L. F. (2007). What is the construct? The dialectic of abilities and contexts in
defining constructs in language assessment. In J. Fox, M. Wesche, D. Bayliss, et al.
(Eds.), Language testing reconsidered (pp. 41–71). University of Ottawa Press.
Banerjee, J. (2004). Reference supplement to the preliminary pilot version of the manual for
relating language examinations to the CEF: Section D: Qualitative analysis methods. Council
of Europe. https://rm.coe.int/1680667a1f
Buck, G. (2001). Assessing listening. Cambridge University Press.
Buck, G. (2018). Preface. In G. J. Ockey, & E. Wagner (Eds.), Assessment of second lan-
guage listening: Moving towards authenticity. John Benjamins.
Carlsen, C. H. (2018). The adequacy of the B2 level as university entrance requirement.
Language Assessment Quarterly, 15(1), 75–89.
Cauldwell, R. T. (2018). A syllabus for listening decoding. Speech in Action.
Chalhoub-Deville, M. (2016). Validity theory: Reform policies, accountability testing,
and consequences. Language Testing, 33(4), 453–472.
Chapelle, C. A. (2012). Validity argument for language assessment: The framework is
simple. Language Testing, 29(1), 19–27.
Cheng, L., Sun, Y., & Ma, J. (2015). Review of washback research literature within
Kane’s argument-based validation framework. Language Teaching, 48, 436–470.
Council of Europe. (2018). Common European framework of reference for languages: Learning,
teaching, assessment. Companion volume with new descriptors. Council of Europe.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
Deygers, B., Zeidler, B., Vilcu, D., & Hamnes Carlsen, C. (2018). One framework to
unite them all? Use of the CEFR in European university entrance policies. Language
Assessment Quarterly, 15(1), 3–15.
European Commission. (2012). First European survey on language competences: Final report.
European Commission.
Field, J. (2008a). Listening in the language classroom. Cambridge University Press.
Field, J. (2008b). Revising segmentation hypotheses in first and second language listen-
ing. System, 36, 35–51.
Field, J. (2013a). Cognitive validity. In A. Geranpayeh, & L. Taylor (Eds.), Examining
listening: Research and practice in assessing second language listening (pp. 77–151). Cam-
bridge University Press.
Field, J. (2013b). Good at listening or good at listening tests? [Conference presentation].
ANUPI, Huatulco.
Field, J. (2017). Mind the gap: Listening tests versus real world listening [Conference presenta-
tion]. IATEFL TEASIG Conference, CRELLA, University of Bedfordshire. https://
tea.iatefl.org/wp-content/uploads/2015/10/John-Field_Mind-the-gap-TEA-SIG-Oct-17-delivered.pdf
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
García Laborda, J., & Fernández Álvarez, M. (2011). Teachers’ opinions towards the
integration of oral tasks in the Spanish university examination. International Journal of
Language Studies, 5(3), 1–12.
Gilmore, A. (2011). ‘I prefer not text’: Developing Japanese learners’ communicative
competence with authentic materials. Language Learning, 61, 786–819.
Green, A. (1998). Verbal Protocol Analysis in language testing research: A handbook. Cam-
bridge University Press.
Green, R. (2017). Designing listening tests: A practical approach. Palgrave Macmillan.
Gu, Y. (2014). To code or not to code: Dilemmas in analysing think-aloud protocols in
learning strategies research. System, 43, 74–81.
Hidri, S. (2014). Developing and evaluating a dynamic assessment of listening compre-
hension in an EFL context. Language Testing in Asia, 4(4), 1–19.
Hidri, S. (2018). Assessing spoken language ability: A Many-Facet Rasch analysis. In S.
Hidri (Ed.), Revisiting the assessment of second language abilities: From theory to practice (pp.
23–48). Springer.
Hidri, S. (2019). State-of-the-art of assessment in Tunisia: The case of testing listening
comprehension. In S. Hidri (Ed.), English language teaching research in the Middle East
and North Africa: Multiple perspectives (pp. 29–60). Palgrave Macmillan.
Inbar-Lourie, O. (2008). Constructing an assessment knowledge base: A focus on lan-
guage assessment courses. Language Testing, 25(3), 385–402.
Jenkins, J., & Leung, C. (2014). English as a lingua franca. In A. J. Kunnan (Ed.), The
companion to language assessment (pp. 1605–1616). Wiley-Blackwell.
Kane, M. (2001). Current concerns in validity theory. Journal of Educational Measurement,
38(4), 319–342.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for
categorical data. Biometrics, 33, 159–174.
Lim, G. S. (2014). Assessing English in Europe. In A. J. Kunnan (Ed.), The companion to
language assessment (pp. 1700–1708). Wiley-Blackwell.
Macaro, E., Graham, S., & Vanderplank, R. (2007). A review of listening strategies: Focus
on sources of knowledge and on success. In A. D. Cohen, & E. Macaro (Eds.), Language
learner strategies: 30 years of research and practice (pp. 165–185). Oxford University Press.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13,
241–256.
Munby, J. (1978). Communicative syllabus design. Cambridge University Press.
Phakiti, A. (2003). A closer look at the relationship of cognitive and metacognitive strat-
egy use to EFL reading achievement test performance. Language Testing, 20(1), 26–56.
Shackleton, C. (2018a). Developing CEFR-related language proficiency tests: A focus
on the role of piloting. Language Learning in Higher Education, 8(2), 333–352.
Shackleton, C. (2018b). An initial validity argument for a new B2 CEFR-related baccalaureate
listening test (Publication No. 9788491639213) [Doctoral dissertation, University of
Granada]. DIGIBUG. http://hdl.handle.net/10481/52426
Shohamy, E., & Inbar, O. (1991). Validation of listening comprehension tests: The
effect of text and question type. Language Testing, 8(1), 23–40.
Siegel, J. (2015). Exploring listening strategy instruction through action research. Palgrave
Macmillan.
Taylor, L., & Geranpayeh, A. (2011). Assessing listening for academic purposes: Defin-
ing and operationalising the test construct. Journal of English for Academic Purposes, 10,
89–101.
Tsagari, D., & Vogt, K. (2017). Assessment literacy of foreign language teachers around
Europe: research, challenges and future prospects. Papers in Language Testing and
Assessment, 6(1), 41–63.
Vandergrift, L. (2003). Orchestrating strategy use: Toward a model of the skilled second
language listener. Language Learning, 53(3), 463–496.
Vandergrift, L., & Goh, C. (2012). Teaching and learning second language listening: Meta-
cognition in action. Routledge.
Vandergrift, L., & Tafaghodtari, M. H. (2010). Teaching students how to listen does
make a difference: An empirical study. Language Learning, 60, 470–497.
Wagner, E. (2014). Assessing listening. In A. J. Kunnan (Ed.), The companion to language
assessment (pp. 47–63). Wiley-Blackwell.
Wagner, E., & Toth, P. (2017). The role of pronunciation in the assessment of second
language listening ability. In T. Isaacs, & P. Trofimovich (Eds.), Second language pro-
nunciation assessment: Interdisciplinary perspectives (pp. 72–92). Multilingual Matters.
Wall, D. (2013). Washback in language assessment. In C. A. Chapelle (Ed.), The ency-
clopedia of applied linguistics. Blackwell Publishing.
Wang, J. (2010). A study of the role of the ‘teacher factor’ in washback [Unpublished
doctoral dissertation]. McGill University.
Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Palgrave
Macmillan.
Yang, P. (2000). Effects of test-wiseness upon performance on the test of English as a
foreign language [Unpublished doctoral dissertation]. University of Alberta.
Yi’an, W. (1998). What do tests of listening comprehension test? A retrospection study
of EFL test-takers performing a multiple-choice task. Language Testing, 15(1), 21–44.
Zhang, Y. (2012). The impact of listening strategy on listening comprehension. Theory
and Practice in Language Studies, 2(3), 625–629.
Chapter 13
Testing abilities to understand
Tim Murphey
05:55 ‘…RASA, which is the Sanskrit word for ‘juice’ or ‘essence’ … stands
for ‘Receive’, which means pay attention to the person; ‘Appreciate’,
making little noises like ‘hmm’, ‘oh’, ‘OK’; ‘Summarize’ – the word ‘so’ is
very important in communication; and ‘Ask’, ask questions afterwards’.
(Treasure, 2011)
A central theme in Paulo Freire’s work is his insistence on the need for
readers to adopt a critical attitude when reading a text. That is, readers
should critically evaluate the text and not passively accept what is said just
because the author said it. Readers must always be prepared to question and
to doubt what they have read.
Dufva (2013) also advocates for more negotiation in place of singular answers
in learning through ‘translanguaging’:
… the assumed stability and singularity of norms and the entailing policy
of ‘one correct answer’ is maintained in classrooms, exams, and language
tests. The alternative views advocate subjecting the norms and language
use to negotiation, and not only for tolerating but also promoting ‘trans-
languaging’ in the classrooms.
(Dufva, 2013, p. 58; see also Creese & Blackledge, 2010)
Freeman (1998) says much the same with his dedication of his book to his wife:
‘To Ann Freeman, who has shown me that teaching is about asking questions,
and that in asking questions, you will learn’ (p. vi).
The earliest known advocate of asking was probably Socrates with what is
now called the Socratic method. In this method, interlocutors engage in
cooperative, argumentative dialogue through asking and answering questions
in order to stimulate critical thinking and challenge deeper thinking and
clarifications. This method can be used to great advantage when creating
well-argued theses and arguments, but as in a courtroom, it can sometimes
lead to violent and disparaging language that is counter to collaborative
creative thinking.
Canfield and Hansen’s (1995) The Aladdin factor explores the mostly positive
dimensions of asking and argues for the good effects of daring to ask for what
you want in life, based around the old story of Aladdin and his magic lamp,
and contains the repeated phrase 'Ask and it shall be given'. The book argues
that we do not ask enough for many reasons and that we should be asking
others and our universe for better things, and that asking is the way to create a
better world. The authors mention five barriers to asking in their first chapter:
ignorance (don’t know how to); limiting inaccurate beliefs (no one will tell
me); fear (of looking immature, stupid, or helpless); low self-esteem (no one
will help someone like me); and pride (I do not want to be seen as a needy
person). Many of these are similar to the reasons our students may give for
avoiding asking in class.
In chapter two of The Aladdin factor, Canfield and Hansen (1995) look at the
benefits of asking, the first being ‘You will take control of your life!’, which is
true enough, as asking questions liberates you to choose more options
and a variety of paths. The second is that you will have better business and
‘Asking is, at its core, a collaboration’ (Palmer, 2015, p.47). But more poetically
she writes:
Later on, she cites Hyde (1983), who 'explains the term "Indian Giver", which
most people consider an insult: someone who offers a gift and then wants to
take it back…' (p. 57). Hyde (1983) tackles the subject, calling it 'the commerce
of the creative spirit':
But the origin of the term – coined by the Puritans – speaks volumes. A
Native American tribal chief would welcome an Englishman into his
lodge and, as a friendly gesture, share a pipe of tobacco with his guest,
then offer the pipe itself as a gift. The pipe, a valuable little object, is – to
the chief – a symbolic peace offering that is continually regifted from
tribe to tribe, never really ‘belonging’ to anybody. The Englishman
doesn’t understand this, is simply delighted with his new property, and is
therefore completely confused when the next tribal leader comes to his
house a few months later, and, after they share a smoke, looks expec-
tantly at his host to gift him the pipe. The Englishman can’t understand
why anyone would be so rude to expect to be given this thing that
belongs to him.
Hyde concludes:
how many minutes they talked and what percentage was in the target language.
Calling up someone who you may have met for the first time that day can be a
scary thing for many people, but they do it, and they get used to it, and many end
up calling each other for test reviews and other tasks as well. (Students put their
phone numbers beside their names on an attendance list I pass out in class, and I
give them all copies so they can call each other easily, and also so that anyone who
is absent can call a classmate to catch up with what they missed.)
Story asking
I ask my students to tell many stories about themselves and to write some of
them down in their action logs; sometimes their fellow students will read them
and ask questions about them to deepen the conversation. For example, they
write a Language Learning History that details their learning of foreign languages
from when they first began up to the present. They also write about glory, embarrassment,
regret, and mistake stories, which students discuss to show that no one is per-
fect, we all make mistakes, and we can learn to laugh at them sometimes.
Lecture pre-asking
Most lecturers enthusiastically dive into their material, but most information is
wasted like water washing over rocks (brains). Priming the students at the
beginning of a class with a number of questions about the content that will be
covered in the lecture creates curiosity and gets students to form a possible
neural network for an answer. Research shows that even if they come up with
a wrong answer, they are still capable of easily replacing wrong answers with
new answers due to the curiosity network already formed (Roediger & Finn,
2009). Such questioning also shares class time democratically with students so
they feel more empowered. As Donald Graves (2002) wrote in Testing is not
teaching: What should count in education:
Perhaps the problem [of learning well] is best understood in the context of
power within relationships. Understanding is best reached when power is
shared. In most cases teachers are in the power position when working
with their students. They have the power of assignments, corrections, and
grades. The best teachers know how to share this power; indeed, they give
it away. They are constantly uncovering where the student’s heart is situ-
ated in the writing. Through the skills of teaching they know how to add
power to the student’s intentions.
(p. 11)
Formative assessment
All the activities above are about learning through assessing, and learning what
we know and don’t know, thus promoting language assessment literacy (LAL)
not just among teachers, but also among students. Wormeli (2018, p. 284), a
great advocate of formative assessment, defines it as: ‘Frequent and ongoing ways
to check students’ progress toward mastery; the most useful assessment teachers
can provide for students and for their own teaching decisions'. Thus, I was led
to treat regular tests as formative assessment by socializing the procedures,
starting about six years ago with social testing (Murphey, 2013a, 2013b).
The bottom of each test looks something like Figure 13.1 (also see Appendix A).
As you can see in Figure 13.1, I ask students to give themselves their own
grades at two separate times on the tests, first after a certain period of doing it
alone (1st score) and then after allowing them to ask others for help and give
help to others who ask for it orally (no copying; all oral). John Hattie (2012)
showed through his meta-analyses of 150 classroom activities that self-reported
grades (#1), formative evaluation (#4), feedback (#10), and reciprocal teaching
(#11) are all highly effective for learning, with the last three being highly social,
and all are included in the social testing protocol aligned with LAL. I wish to
propose a form of testing that allows students to interact more and learn more
at the same time. Although this way of testing will not solve all our problems,
it is a way to help students become more social, and to teach the worth of
social interaction and its benefits.
In Murphey (2019a), I cite mostly Vygotskian researchers who claim that:
In previous articles, I have offered data to show how students highly rate such
tests, and have suggested ways that a graduate student might study them for their
long-term impact. For the remainder of this chapter, I would like to look closely
at some qualitative data on social testing and how it seems to liberate students’
learning. I will first look at some undergraduates doing three tests in one semester
and then look at some graduate students who did a social test as a final exam.
QUIZ 1 Feedback at the bottom of the quiz (unedited except for emphasis)
1 This is my first time to take a test which encourages me to interact with others, so
this is interesting for me. This makes me think I want to have a conversation with
others more.
2 I love this test because I felt we were doing test together. I knew the importance
of cooperating.
3 This test improve our ability to ask things of others. We have to be brave, so I like
this way.
4 I thought this test is more meaningful than the way as usual test because just
remember something is not interesting but in this test not only remember, but we
use English. This is the big point of this style, I think.
1 Today we had a quiz, a new test that I never did before. We can learn from each
other and ask other people for answers. It is a new chance to communicate.
2 I really felt glad to see your words ‘Not knowing is OK’. And ‘Not asking is failing’. And
‘Helping and asking many people is your goal’. I really feel great to do today’s test.
3 I told my family that my teacher gave us a time to teach each other for us to improve
our skills, because it was the first time in my life that teacher gave us such a time. I
taught many things to my classmates during that time. I enjoyed helping them.
QUIZ 2 Feedback at the bottom of the quiz (unedited except for emphasis)
1 I think this test help students to improve communication skill. We’ll have to
communicate with other people in the future job, so it’s practical test.
2 This test is really useful to think about the answer with peers. Giving hints to
find a clue is the best way to know the answers … I thought my communication
skill is getting up.
3 I like this style of testing. I can talk to different people a lot & it feels good to
help someone. But sometimes they went away just after they had their answers,
not teaching me anything and it hurts. It also hurts when I tried to remind them
the story (hint) and someone just said, ‘just tell me the main point!’
4 Today, I am very brave and do not hesitate to ask. So almost all of blanks are filled.
1 Last time I felt embarrassment to ask questions, but this time was not. I enjoyed
asking and talking. Everyone’s so kind. I’m glad to have and be in this class.
2 I really like this type of test. I always feel nervous or hate to take test but I feel
relax and enjoy this class’s test. Because you said, ‘Not knowing is OK. Not asking is
failure’. This phrase I like very much. And I really like part 2, because I can help
classmates and be helped by classmates. This is very good communication I think
because help each other is really good to learning. Today is the second time to
take this type of test, so I feel more relax to take test and I could ask and help
many [more] class mates than first time. I feel really happy to talk a lot of
people and help them.
3 In today’s class I took a test. I like this test because I could talk a lot and com-
municate with many people. This test needs knowledge which I learned in class
and we require to communicate positively in this test. In addition, we could
discuss questions before the test. It is different from other tests. We can have
opportunities to speak in this test style. And also, I learned scaffolding, it is dif-
ficult to give hints and help others understanding.
4 I had fun to do test and ask. I could ask more people than before. I think I get used
to ask people because of this class! I’m looking forward to next test! Please don’t
make it harder.
QUIZ 3 Feedback at the bottom of the quiz (unedited except for emphasis)
1 I could answer almost all questions compared to last test. And actually this is
my last test in my university school life! I’m glad to take this class and this test!
Thank you for all!
2 I can help many person. I’m so happy! I have confidence because I have good
classmates. This test is so fun!
3 This is the third time to take this [type of] test. I asked my classmates fluently and
they feel glad to help each other. I enjoy this test.
4 This time I could ask many people and they answered kindly. Though I’m
powerless alone, I was happy that there were many people who helped me.
QUIZ 3 Feedback from action logs about the quiz (unedited except for emphasis)
1 I could ask students more than before. It’s proud that I can learn asking is not
hesitating thing.
2 I could learn what I never thought or I’ve never known. It was really useful. The
remarkable thing is mistaking is not bad; trying not do is bad! I was so impressed
that. So I’m always trying what I face first time. I don’t judge with prejudice
anymore. I’ll never forget this class.
3 (1st year) I am really excited and satisfied with this class, as this class gives me a
lot of chance to speak English than any other class. Additionally, I could make new
friends … I really enjoyed and I’ll miss this class.
4 I am glad to choose this class because I enjoyed learning English and I met different
grade student and we had a chance to talk. I felt happy when we talked.
For these students, learning is more important than a good score, even while taking
a test. Still, teachers need to find good ways to explain such testing procedures
to students to enlist their altruism and understanding of how to learn more on a
deeper social level.
Conclusion
To conclude, I would like to return to our long acronym in the title TA-
TUNII-SIA-RASA (Testing Abilities – To Understand Not Ignorance or
Intelligence – Socially Interactive (formative) Assessment – Receive,
Appreciate, Shadow, and Ask). I believe that as educators, we should be
testing and teaching ways of understanding, not simply information
(ignorance or intelligence). I am convinced that socially interactive ways of
assessing have great promise for helping students grasp more intellectual
territory than simple solo exams. We need to be able to learn, even during
assessments, and see that this is indeed part of language assessment literacy.
Social testing is not only teaching asking but also altruism, as one of my
students said a few years back:
Because I had taken a test (#1) in this class and I knew how we would do
the test #2, I tried to remember as much as possible not only for myself,
but for my classmates. Last time I took the test, I was helped by others
with answers, very helpfully. So, I wanted to help my classmates more
than I did last time. In Test #2 it was interesting. I felt as if I was already
working with classmates during my preparations for the test, and that
motivated me to study. Although it was not so many people that I could
help with the quiz, I was glad to hear ‘thank you’ from them and to see
their smiles. Showing thanks to people really makes them happy.
Another student referred to asking as part of ‘vital skills to live in real life’ and
though most people will not be taking pen and paper tests at their work, they
will need to be able to ask people for help:
I really like this type of test. I’ve never done such a creative and interactive
test, and I really think that I was required to get information and help
people, and these are vital skills to live in real life!
Please read Murphey (2017a) for a more detailed understanding of social testing or
Murphey (2017b) for a short, four-page synopsis from a Stanford University blog.
Finally, I wish to dare to talk more grandly beyond our classrooms and have a
look at the questions concerning our survival and social well-being in our various
societies. I believe we need to ask our schools, educational systems, businesses,
communities, governments, our universe, and our gods more grandly for better
understanding and well-being for all, and for a more just and ecological world in
which we all can live peacefully. I want my students to be able to ask for these
things, for in asking we may indeed find the ways through LAL.
[Appendix A (extract): sample quiz items and class video prompts]
BBB
Tim’s DAD
Chez Joan
Matsuyama Woman
Paradigm Shift
Denmark TV 2 Advertisement
Ms Liz’s Class Talking Twins
14 How can you do environmental engineering to learn more English? (3
examples please)
15 Why are telling embarrassment/mistake stories good for us? (3 things)
16 What are the three SSSs in Chapter 2 for and how do they help you learn?
17 What would you do if you were language hungry?
18 What is self-regulation?
19 Cry to the world ‘I’m in love!’ when you read this line! Done/Not Done
20 Approximately, how many people’s names do you know in this class?
Going My Way
Ride and Read
Turtle with a Straw
Student Voice #1 LLHs
Student Voice #2 Job H Going Abroad
Roller Coaster
13 How does an effective helper help you? E t, A, S y u, Ref rather than C, & C
14 How are A student strategies different from C/D student strategies?
15 What are Tim’s most frequent three words in this class?
16 What are the advantages of doing IPQs?
17 What would be your rejoinder if I said, ‘I won the billion-dollar lottery!’?
18 Find a person you have never talked to, ask a question, & write their
whole name:
Short answers:
References
Canfield, J., & Hansen M. (1995). The Aladdin factor: How to ask for what you want–and get
it. Berkley Books.
Creese, A., & Blackledge, A. (2010). Translanguaging in the bilingual classroom: A
pedagogy for learning and teaching? The Modern Language Journal, 94(1), 103–115.
Doise, W., & Mugny, G. (1984). The social development of the intellect. Pergamon Press.
Dufva, H. (2013). Language learning as dialogue and participation. In E. Christiansen, L.
Kuure, A. Mørch, & B. Lindström (Eds.), Problem-based learning for the 21st century:
New practices and learning environments (pp. 51–72). Aalborg Universitetsforlag.
Freeman, D. (1998). Doing teacher research: From inquiry to understanding. Heinle & Heinle.
Freire, P. (1985). The politics of education: Culture, power, and liberation (D. Macedo,
Trans.). Bergin & Garvey Publishers. (Original work published 1985).
Graves, D. (2002). Testing is not teaching: What should count in education. Heinemann.
Guba, E., & Lincoln, Y. (1989). Fourth generation evaluation. Sage Publications.
Hattie, J. (2012). Visible learning for teachers: Maximizing impact on learning. Routledge.
Hyde, L. (1983). The gift: Imagination and the erotic life of property. New York.
Lewis, B. (2019, July 18). Teachers should design student assessments. But first they
need to learn how. Education Week. https://www.edweek.org/ew/articles/2019/07/
19/teachers-should-design-student-assessments-but-first.html
Murphey, T. (1990). Song and music in language learning: An analysis of pop song lyrics and
the use of song and music in teaching English as a foreign language. Peter Lang.
Murphey, T. (1992). Music and song. Oxford University Press.
Murphey, T. (1993). Why don't teachers learn what students learn? Taking the guess-
work out with action logging. English Teaching Forum, 31(1), 6–10.
Murphey, T. (2003). Assessing the individual: Theatre of the absurd. Shiken: JALT
Testing & Evaluation SIG Newsletter, 7(1), 2–5.
Murphey, T. (2012). In pursuit of wow! Abax.
Murphey, T. (2013a). Turning testing into healthy helping and the creation of social
capital. PeerSpectives, 10, 27–31.
Murphey, T. (2013b). With or without you and radical social testing. Po`okela (Hawai'i
Pacific University Newsletter), 20(69), 6–7.
Murphey, T. (2017a). Provoking potentials: Student self-evaluated and socially mediated
testing. In R. Al-Mahrooqi, C. Coombe, F. Al-Maamari, & V. Thakur (Eds.), Revi-
siting EFL assessment: Critical perspectives (pp. 287–317). Springer.
Murphey, T. (2017b, June 30). A 4-page condensed version of Tim Murphey's book
chapter 'Provoking potentials: Student self-evaluated and socially-mediated testing'.
Tomorrow's Professor eNewsletter. Stanford University. https://tomprof.stanford.
edu/mail/1581#
Murphey, T. (2017c). Asking students to teach: Gardening in the jungle. In T. Gregersen,
& P. MacIntyre (Eds.), Exploring innovations in language teacher education (pp. 251–268).
Springer.
Murphey, T. (2018a). Bilingual songlet singing. Journal of Research and Pedagogy of Otemae
University Institute of International Education and Hiroshima JALT, 4, 41–49.
Murphey, T. (2018b). Songlets for affective and cognitive self-regulation. Bulletin of the
JALT: Mind, Brain, and Education SIG, 4(12), 22–25.
Murphey, T. (2019a). Peaceful social testing in times of increasing individualization &
isolation. Critical Inquiry in Language Studies, 16(1), 1–18.
Murphey, T. (2019b). Innovating with 'The Collaborative Social' in Japan. In H. Reinders,
S. Ryan, & S. Nakamura (Eds.), Innovation in language learning and teaching: The case of
Japan (pp. 233–255). Palgrave Macmillan.
Palmer, A. (2013, February). The art of asking [Video file]. https://www.ted.com/ta
lks/amanda_palmer_the_art_of_asking?utm_campaign=tedspread&utm_medium=
referral&utm_source=tedcomshare
Palmer, A. (2015). The art of asking. Grand Central Publishing.
Roediger, H., & Finn, B. (2009, October 20). Getting it wrong: Surprising tips on how to
learn. Mind Matters. https://www.scientificamerican.com/article/getting-it-wrong/
Treasure, J. (2011, July). 5 ways to listen better. [Video file]. https://www.ted.com/ta
lks/julian_treasure_5_ways_to_listen_better?utm_campaign=tedspread&utm_m
edium=referral&utm_source=tedcomshare
Wormeli, R. (2018). Fair isn’t always equal: Assessment and grading in the differentiated
classroom (2nd edition). Stenhouse.
Conclusion
Language assessment literacy: The way
forward
Sahbi Hidri
Tarone, E. (2005), 219 Vogt, & Tsagari, 6, 20, 21, 22, 23, 24, 27,
Taylor & Geranpayeh (2011), 54, 59, 60, 138, 139, 140, 151, 153,
225, 239 180, 188, 220
Taylor & Geranpayeh, 225 Volante, L., and Fazio, X. 154,
Taylor, 6, 7, 13, 14, 16, 17, 26, 53, 57, Vygotsky, 109
58, 59, 60, 180, 225, 239 Vygotsky, 203
Teasdale, A., & Leung, C. (2000), 44 Vygotsky, L. S. (1986), 219
Teasdale, A., & Leung, C. (2000). 51
Thomas, J., Allman, C., & Beech, Wach, A. 135, 139,
M. 160, Wach, A. 152, 153,
Thorndike, E.L. (1904), 37, 38 Wagner (2014), 221, 222, 224, 240
Thorndike, E.L. (1904). 51 Wagner & Toth (2017), 224, 236, 240
Tomlinson, B. (2010). 51 Wagner & Toth, 224, 236
Torrance, Pryor, 4, 71, Wagner, 221, 222, 224
Treasure (2011), 241, 259 Wall (2013), 220, 221
Trede, F., & Higgs, J. (2010), 42 Wall (2013), 4
Trede, F., & Higgs, J. (2010). 51 Wall, & Alderson, 4
Trinity College London (2017). 65 Wall, 220, 221,
Trudgill, P., & Hannah, J. (2008), 40 Wall, 220, 221, 240
Trudgill, P., & Hannah, J. (2008). 51 Wall, 4, 220, 221,
Tsagari & Vogt (2017), 220, 240 Wang (2010), 219, 235, 240
Tsagari & Vogt, 2017 151, 153 Wang, J., 235
Tsagari & Vogt, 220 Wang, T., 201
Tsagari & Vogt. 138, 139, Warschauer, M. (2002). 83
140, 151, Weigle cited in Ahmad,2019 161,
Tsagari, D. (2012). 152, Weir et al. (2013), 33, 35, 37
Tsagari, D. 138, 139, 150, Weir, (2005), 226, 240
Tsagari, D. 65 Weir, 226
Tsagari, et al., 180, ., 240 White, E. 138
Tsui, A. B., & Ng, M. (2000). 83 Wilson 86, 89, 90
Tupas, F.R.T. (2010), 46 Wilson, 110, 111, 128
Tupas, F.R.T. (2010). 51 Wilson, 127,
Turk, M. 139, 140, 151, Wingate & Tribble, 110, 126
Woodrow, 201, 202, 210
Ukrayinska, O. 137, 139, Woodrow, L. (2006), 219
Ukrayinska, O. 152, Wormeli (2018), 247, 259
Ur, 2012). 199
Ur, 205, 210 Xu, Y., & Brown, G. T. L. (2017), 8,
Ur, P. (2012), 219 21, 24
Yan, X., Zhang, C., & Fan, J. J. (2018). Young, D. J. (1994), 219, 240
152, 153, Yousif, 201
Yan, X., Zhang, C., & Fan, J. J. Yusof, R. (2016), 219
(2018). 65
Yang (2000), 235, 240 Zeichner & Liston, 109, 110,
Yastibas, A. E., & Takkaç, M. (2018), Zhang (2012), 236, 240
22, 139, Zhang, 236
Yastibas, A. E., & Takkac, M. (2018). Zheng, Y. (2014). 51
Yi’an (1998), 235, 240 Zhou, 201
Young, 201, Zhou, M. (2016), 219
Index
Academic 50, 51, 53, 64, 71, 81, 83, 97, Basic, 15, 21, 39, 151, 153, 160, 172,
159, 160, 161, 162, 163, 164, 165, 180, 225, 241,
166, 167, 171, 176, 178, 182,
academic literacy 53, 71, 159, 160, call report, 245
163,164, 165, 169,170, 171,172, candidates, 33, 37, 39, 46, 87, 224, 228,
academic writing, 44 230, 233,
Academic, 9, 10, 18, 34, 36, 38, 44, 46, CEFR, 39, 217, 220, 221, 223, 225, 226,
204, 211, 214, 216, 218, 221, 225, 228, 233, 235, 237, 238, 239,
226, 237, 239 clarity 84, 94
Accent (native and non-native speaker classroom teachers 72, 203, 204, 205, 208,
varieties), 210, 224–26, 209, 210, 211, 220, 221, 235, 236
Accountability 6, 7, 54, 56, 177, 178, cognitive 95, 199, 210, 222, 223, 226,
238, 255 227, 228, 230, 232, 235, 237, 238,
Accuracy 45, 88, 98, 99, 177, 239, 249, 259
179, 182, coherence 93, 162,
action logging, 245, 259 cohesion 93,
Aladdin Factor, 242, 258 community, 7, 36, 46, 53, 62, 70, 71, 74,
Alternative assessment 4, 9, 18, 33, 34, 39, 80, 82, 171, 200, 172, 218, 219, 225,
41, 44, 45, 91, 151, 154, 182, 183, 241, 250, 247,
184, 185, 188, 189, 193 Competence-based curriculum, 220, 221
Anxiety 180, 200, 201–10, Components, 7, 15, 25, 26, 53, 55, 56,
anxiety, 217, 218, 219 58, 87
apprehension, 201, 202, 203, 204, 206, concept 7–10, 13–28, 37, 43, 47, 52–64,
210, 218 73, 81, 92, 99, 151, 154, 160, 179,
assessing speaking, 199–200, 204, 192, 253
assessment 3–28, 33–35, 44, 46, 47, 151, conceptions, 4, 9, 10, 14, 15, 17, 20, 21,
152, 153, 154, 159, 160, 161, 162, 24, 28, 163,
163, 164, 165, 166, 167, 168, 169, connected speech, 222, 224, 236
170, 171, 172, 176, 177, 178, 179, construct 51, 56, 64, 74, 80, 87, 96, 154,
180, 181, 182, 183, 184, 185, 187, 159, 160, 161, 162, 163, 179, 181,
188, 189, 190, 191, 192, 193, 199, 186, 187, 188, 193,
200, 51–100, 217, 218, 220, 221, 224, construct, 4, 10, 14, 15, 17, 18, 37201,
225, 236, 237, 238, 239, 240, 241, 217, 221, 222, 223, 224, 225, 226,
247, 254, 258, 259, 260, 261 227, 228, 235, 236, 237, 238, 239
assessors 5–8, 10, 17, 58, 152, 153, 164, Content validity, 37
169, 172, 177 Context, 4, 6–27, 34–35, 38–41, 52, 56,
assurance 85, 87, 96, 98, 99, 176, 178, 57, 59, 61, 65, 69, 70–73, 75–78, 81,
182, 183, 185, 186, 192, 86, 89, 91, 92, 95–98, 152, 153, 160,
Index 271
161, 163, 165, 171, 178, 179, 180, exam-oriented 151, 153,
181, 184, 185, 193, 200, 203, 204, external, 8, 28, 176, 179, 186,
221, 222, 223, 224, 225, 227,
228, 229, fair, 4, 10, 42, 218, 166, 170, 178, 179,
context, 221, 222, 223, 224, 225, 227, 180, 182, 183, 184, 185, 188, 235, 259
228, 229, 235, 237, 238, 239, 247, feedback 69, 70–83, 152, 154, 171, 177,
260, 261 178, 182, 183, 184, 185, 188,
Correlation 167, 168, 170, 189, 192,
Correlation, 201, 202, 207, 209 feedback, 211, 220, 248, 249, 250, 251,
Correlation, 9, 19 252, 253,
course-based, 20, 39, 46, 151, 154, 152, Foreign Language Anxiety, 217, 218
161, 164, 165, 166, 167, 168, 169, foreign language teaching, 199, 200, 201,
170, 180, 181, 187, 188, 204 203–211, 218, 200, 221, 222, 224,
criterion-referenced, 3, 11, 179, 235, 236
critical attitude, 241 formative assessment, 5, 7, 71, 76, 82, 83,
Critical Language Testing, 34, 41 151, 154, 177, 178, 179, 220, 236,
culture, 14, 15, 28, 40, 41, 44, 69, 81, 247, 254,
202, 229, 258 frameworks, 7, 14, 15, 24, 25, 52, 54, 55,
curriculum, 5, 6, 39, 42, 43, 45, 47, 55, 62, 160, 200, 227,
56, 70, 88, 176, 182, 186, 187, 199, functional literacy, 15, 16, 57,
217, 221, 225, 235, 236, 237, functions 54, 84, 97, 99, 199, 200,
functions, 7, 38, 236
descriptive statistics 167, 168, 206
dimensions of assessment, 16, 199, genre, 45, 224
discourse analysis, 24 grammar, 22, 40, 41, 80, 82, 93, 96, 151,
dynamic assessment, 239 154, 166, 167, 170, 204
EAP, 9, 33, 34, 37, 41, 44, 47 high-stakes tests 186, 178, 185, 186,
education, 3, 14, 19, 22, 27, 34, 35, 42, 220, 236
44, 52–55, 57, 59–65, 70, 81, 82, 88, historical, 14–16, 35, 47, 54, 55, 57, 160,
199, 153, 154, 161, 176, 177, 178, holistic assessment 162,
180, 182, 183, 190, 191, 199, 210, hypothesis 181, 203, 204, 222
217, 218, 220, 220, 221, 225, 237,
239, 247, 258, 259 IELTS, 9, 18, 35, 36, 38, 39, 44, 161,
Educational reforms, 221, 235, 237, Illiteracy, 15, 16, 57
Effectiveness 89, 92, 184, 190, 191, impact of testing, 14, 55, 160, 221
EFL, 9, 10, 19–24, 64, 69, 83–91, 93, implications 10, 63–65, 81, 97, 201, 236,
95–97, 151, 152, 153, 154, 161, 162, 153, 159, 160, 161, 165, 169,
163, 165, 166, 171, 200, 201, 204, 171, 193,
206, 207, 209, 217, 218, 219, 239, implications
240, 259, in-service courses, 6, 236
ELT practitioners, 260 inaccurate paraphrase 163,
English 51, 61, 63–65, 69–71, 81–84, 88, institutional, 151, 59, 161, 171, 172, 176,
96, 161, 165, 176, 199, 200 instruction, 4, 5, 9, 10, 22, 43, 54, 72, 74,
English as a Foreign Language, 19, 41, 69, 79, 82, 88, 89, 99, 199, 204, 205, 206,
81, 160, 161, 171, 200, 240, 259, 208, 225,
English language testing, 9, 18, 35, 161, interfaces, 8, 10, 199,
221, 222, 235 internal, 81, 76,
English, 9, 18–22, 26, 27, 33–46, 199, interpretative, 14, 171, 200
217, 218, 219, 200, 203, 204, 205, interpretative, 200
206, 207, 209, 221, 224, 225, 237,
238, 239, 240, 249, 252, 256, 259, 260 journal, 9, 44, 56, 64, 81, 82, 83
272 Index
knowledge, 3–22, 26, 28, 34–44, 51–54, 56–65, 69, 74, 76, 79, 95, 98, 151, 152, 153, 160, 161, 162, 163, 171, 190, 193, 201, 220, 221, 222, 227, 229, 233, 235
L2, 82, 83, 200–211, 222, 226
LAL training, 151, 152, 153, 154, 160, 171, 236
language, 70, 88, 151, 152, 153, 154, 159, 161, 164, 167, 176, 180, 188, 199, 201, 202, 203, 204, 206, 210, 222, 224, 225, 236
language assessment, 51–100, 151, 154, 159, 160, 161, 162, 163, 171, 172
language assessment literacy, 51–100, 159, 160, 220–235, 237
language education, 52, 59, 62, 63, 161, 199, 210, 220
language learning, 52, 62, 81–83, 86, 88, 100, 152, 154, 201, 202, 205
language pedagogy, 58, 220, 236
language proficiency, 171, 200
language skills, 61, 151, 152, 154, 159, 160, 163, 222
language teaching, 54, 61, 64, 82–84, 88, 89, 159, 177, 199–211
language testing, 51, 61–65, 85, 151, 152, 153, 159, 161, 162, 177, 179, 180, 221
language trait, 161, 223
large-scale testing, 54, 61, 91, 179, 180, 182
learning, 3–11, 14, 20, 21, 27, 33, 42, 44, 51–53, 56, 60–62, 64, 69–77, 79, 81–83, 86, 88, 89, 91–94, 97, 98, 100, 151, 152, 154, 160, 161, 171, 176, 177, 178, 180, 184, 188, 192, 200, 201, 202, 204, 206, 210, 217, 218, 219, 220, 221, 230, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 246, 247, 248, 249, 250, 252, 253, 254, 255, 258, 259, 260
learning outcomes, 72–76, 220
levels, 54, 56–59, 70, 74, 88, 151, 154, 161, 162, 163, 164, 165, 171, 172, 176, 180, 200, 202, 206, 209, 210, 222, 228, 235
limitations, 90, 163, 171, 192, 200, 227
listening, 61, 84, 87, 93, 203, 209, 211, 220–237
literacy, 3–27, 51–55, 57–100, 154, 159, 160, 162, 163, 164, 165, 166, 170, 171, 172, 177, 180, 181, 182, 184, 186, 189, 193, 238, 240, 247, 254, 260
local practices, 58, 59, 184
long-term, 84, 172, 191, 211, 222
measurement, 55, 56, 161, 162, 164, 167, 171, 177, 178, 180, 188
measurement scale, 160, 170, 171
metacognitive strategies, 223, 227
method, 64, 69, 86, 92, 94, 160, 161, 171, 178, 181, 183, 192, 205, 227
mid-term, 182, 183, 186
multidimensional literacy, 57
narrative genre, 200
negotiation, 74
nominal literacy, 57
non-parametric correlation analysis, 167
norm-referenced, 179
novice EFL teachers, 151, 152, 153, 154
opinions, 151, 152
oral proficiency, 199, 200, 204, 211
outcomes, 52, 53, 72–76, 82, 88, 97, 161, 171, 188, 220
overrepresentation, 187
paradigm, 51, 180
paraphrasing, 159, 160, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172
participants, 95, 100, 152, 159, 162, 163, 165, 170, 171, 184, 227, 228, 229, 230, 231, 233, 235
patchwriting, 163, 164, 167, 168, 169
pedagogy, 58, 59, 78, 178, 220, 236
perception, 82, 92, 153, 154, 159, 162, 172, 177, 182, 183, 184, 185, 187, 188, 191, 192, 200, 201
performance, 53, 54, 56, 74, 77, 81, 84, 86, 88, 91, 93, 96, 99, 100, 159, 160, 161, 163, 165, 167, 168, 169, 170, 172, 178, 179, 180, 189, 199, 200, 201, 202, 204, 205
positive washback, 220–237
practical knowledge, 153
practical skills, 61, 152
practices, 4–28, 33, 34, 37, 41, 42, 44, 47, 152, 153, 154, 160, 165, 170, 171, 172, 177, 178, 181, 183, 184, 185, 186, 191, 192, 193, 201, 203, 204, 207, 208, 209, 211, 220, 221, 258
practicum, 152, 154, 162
pre-service teachers, 60, 84
principles, 54, 56–60, 64, 81, 82, 84, 94, 154, 159, 160, 180, 181, 182, 185, 192, 236
principles and concepts, 58–59, 160
procedural and conceptual literacy, 57
productive skills, 170, 199, 204, 208
progress, 70, 71, 76, 79, 81, 90, 172, 178, 193, 220
purpose for listening, 222, 226, 229, 236
qualifications, 165
qualitative data, 227, 228
quality, 63, 70, 71, 81, 85, 87, 92, 94, 164, 171, 172, 176, 178, 179, 180, 181, 182, 183, 185, 186, 187, 190, 192, 193, 222
reading, 53, 61, 79, 84, 88, 90, 93, 199
reliability, 56, 97, 164, 176, 177, 178, 179, 180, 181, 182, 183, 186, 189, 191, 192, 193, 228
repetition, 167, 224
research, 151, 152, 154, 159, 160, 162, 164, 165, 168, 169, 172, 178, 181, 182, 183, 185, 186, 190, 191, 193, 199, 200, 201, 203, 222, 227
retrospective verbal protocol, 227, 228, 229
rubrics, 73, 75, 84, 159, 160, 161, 162, 164, 165, 166, 167, 168, 169, 170, 171, 204
sample size, 166, 171, 205, 227
scores, 152, 160, 164, 165, 167, 168, 169, 170, 179, 188, 191, 207, 233, 235
Second Language Acquisition, 82, 199, 221
short-term, 190, 191, 193, 227
skills, 53–56, 58, 59, 61, 73, 74, 76, 77, 79–81, 84, 92, 96, 99, 151, 152, 153, 154, 159, 160, 161, 163, 165, 166, 170, 171, 178, 180, 181, 199, 200, 201, 203, 204, 205, 210, 220, 222, 225, 227, 233, 235, 236
social, 51, 52, 54–57, 59, 71, 75, 160, 167, 171, 172, 199, 200
social constructivism, 203
social context, 56, 71, 200
social interaction, 204
sociocultural values, 58
source text, 163, 169, 172, 224, 226
speaking, 61, 84, 87, 88, 93, 95, 199–216
speaking anxiety, 199–216, 217, 218, 219
staff seminars, 190, 192
stakeholders, 54, 55, 57–62, 64, 91, 159, 177, 181, 182, 184, 185, 191, 193
Standard English, 51, 224
standardized tests, 161, 176, 180, 220
strategic competence, 222
structure, 74, 75, 84, 85, 87, 89, 95, 97, 98, 100, 163, 164, 169, 187, 188, 199
students, 61, 67, 69, 70–82, 84, 85, 88–100, 153, 159, 162, 163, 165, 166, 168, 169, 170, 171, 172, 176, 177, 178, 180, 181, 182, 183, 184, 185, 187, 188, 189, 192, 193, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 220, 221, 225, 236
summative assessment, 71, 151, 160, 165, 176, 177, 178, 181, 182, 183, 184, 185, 188, 189, 190, 192, 193, 204, 205, 210, 220
system, 65, 70, 76, 82, 154, 165, 171, 172, 176, 177, 201, 202, 204, 205, 210, 221, 225
target language use domain, 224, 225
teacher, 51, 55, 58, 59, 60–62, 64, 70, 72, 76, 78, 82–84, 88, 89, 90, 92, 94, 96, 98, 151, 152, 153, 154, 159, 160, 161, 162, 163, 165, 166, 167, 169, 170, 171, 172, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 203, 204, 205, 208, 210, 211, 220, 221, 235, 236
teacher educators, 154
teacher training, 61, 64, 177, 181, 220
teacher training programs, 161, 178
teachers’ assessment literacy, 64, 163, 169, 177, 180, 182, 220