how teachers and students perceive, respond to, and make use (or not) of assessment. Assessment literacy is universal in that all systems and levels implement assessment to evaluate instruction, learning, or curriculum and, just as frequently, to inform improved teaching and learning; so language education researchers, not just second language researchers, will find the book useful.
The chapters in this volume help move language assessment more into multi-
faceted data collection about competencies for the sake of improving the
quality of learning and teaching. It’s nice to see language assessment research
catching up with the world of classroom assessment theory and practice. The
volume provides access to research and thinking about the topic from some
relatively under-represented perspectives, including Turkey, Tunisia, Oman,
Ukraine, the UAE, Saudi Arabia, and Japan, as well as Europe and the UK.’
Professor Gavin T L Brown
Associate Dean Postgraduate Research (ADPG),
The University of Auckland
‘Language assessment literacy (LAL) is a critical topic in the field of language testing and assessment; see, for example, the recently established (April 2019) Language Assessment Literacy Special Interest Group (LALSIG) within the International Language Testing Association. Perspectives on Language Assessment Literacy comprises chapters by authors from traditionally less represented regions of the world and
thus represents an important contribution to the field. The volume also helps advance
the scholarship of LAL. Authors pay special attention to how language assessment
theory and practice can better synergize with teaching to improve students’ language
learning and to better document students’ learning outcomes.’
Micheline Chalhoub-Deville, Ph.D.
Professor, University of North Carolina at Greensboro
Perspectives on Language
Assessment Literacy
Edited by
Sahbi Hidri
First published 2021
by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
52 Vanderbilt Avenue, New York, NY 10017
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 2021 selection and editorial matter, Sahbi Hidri; individual chapters, the
contributors
The right of Sahbi Hidri to be identified as the author of the editorial
material, and of the authors for their individual chapters, has been asserted
in accordance with sections 77 and 78 of the Copyright, Designs and
Patents Act 1988.
All rights reserved. No part of this book may be reprinted or reproduced
or utilised in any form or by any electronic, mechanical, or other means,
now known or hereafter invented, including photocopying and recording, or
in any information storage or retrieval system, without permission in
writing from the publishers.
Trademark notice: Product or corporate names may be trademarks or
registered trademarks, and are used only for identification and explanation
without intent to infringe.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record has been requested for this book
Typeset in Bembo
by Taylor & Francis Books
Contents
List of figures ix
List of tables x
List of contributors xii
Preface xiv
PART 1
Language assessment literacy: Theoretical foundations 1
1 Language assessment literacy: Where to go? 3
SAHBI HIDRI
PART 2
Students’ language assessment literacy 67
5 Enhancing assessment literacy through feedback and feedforward:
A reflective practice in an EFL classroom 69
JUNIFER A. ABATAYO
PART 3
Teachers’ language assessment literacy 133
8 Language assessment literacy of novice EFL teachers: Perceptions,
experiences, and training 135
AYLIN SEVIMEL-SAHIN
PART 4
Language assessment literacy: Interfaces between
teaching and assessment 197
11 To teach speaking or not to teach? Biasing for the interfaces
between teaching and assessment 199
DIANA AL JAHROMI
12 Planning for positive washback: The case of a listening proficiency test 220
CAROLINE SHACKLETON
Introduction
The main points in traditional assessment are ensuring reliability and validity in assessment instruments. These aspects are consistent with the goals of quantifying and measuring learning and collecting information. The fact that these aspects are the concerns in assessing learners establishes the final grade as the product of traditional assessment (Brown, 2004). As pointed out by Alderson (1999), Alderson et al. (1995), and Alderson and Banerjee (2001), once learning is characterized as a number, it is of utmost significance that the number be as reliable and valid as possible; otherwise it has no meaning. The philosophy of traditional assessment allows assessment instruments to rank students against each other, which is termed norm-referenced testing (NRT). According to Dunn et al. (2004), and Dunn and Dunn (1997), norm-referenced assessment can be inaccurate or unreliable in some cases and is consequently regarded as less valid than criterion-referenced testing (CRT). According to Fulcher and Davidson (2007), the shift towards criterion-referenced assessment is indicative of the necessity to develop more precise representations of students’ learning than can be provided through norm-referenced assessment. Shepard (2000) pointed out that this viewpoint on learning is based on the notion that learning is a procedure of gathering information and knowledge in separate pieces, with restricted application or transfer, and that assessment under this paradigm attempts to determine whether learners have retained the information which is basically given to them by their instructors.
The absence of assessment literacy has the potential to cripple the quality of a program as well as the purpose of teaching. It is for this reason that it is seen ‘as a sine qua non for today’s competent educator’ (Popham, 2006, 2009). In addition, Nawab (2012) argues that second language teachers must be trained exclusively under separate programs because language teaching and assessment demand something different from other disciplines. They demand diligence and sensitivity towards the learning needs of the stakeholders. It is through language that learners reach the stock of knowledge for other subjects. This highlights the significance of addressing the issues of assessment literacy in second language programs.
In the early part of the 21st century, concerns regarding assessment literacy started to appear around the globe, and an effective definition of the term was needed in order to relate it to second language curriculum design. This points towards the significance of the knowledge and practice of assessment in the context of its utility in updating language learning materials (Fulcher, 2012). In this way, assessment training among language teachers is considered of prime importance in the delineation of successful language teaching courses.
However, this calls into question any language teaching and assessment criteria that have been unsuccessful in yielding acceptable results from learners’ performance. In addition, it lightens the burden of responsibility and accountability on the shoulders of learners, because language learning contexts are now packages whose foundation is built on the assessment skills of the assessors. The rest unfolds later and encapsulates the traits of other stakeholders. Nevertheless, educational organizations are still responsible for arranging pre- and in-service courses and workshops to train assessors in order to remedy the weak standards of assessment around them.
A study carried out by Vogt and Tsagari (2014) reveals that language teachers, overall, expressed the need for training in issues of assessment because they continuously have to come to terms with standardized testing. Moreover, more and more people are getting involved in analyzing scores and deciding about assessment issues. In such a scenario, systems that do not move ahead with updates are bound to barely meet, or completely fail to bring about, the desired learning outcomes among students. It is actually time to reengineer the whole system of assessment and its objectives (Taylor, 2009), seeking to answer the most significant question: Why do we need to assess our students? The answer takes into account professional assessment practices; effective, assessment-literate teachers; and conscientious organizations that are capable of providing concrete opportunities for their assessors to learn how to effectively assess, what to assess, and what not to assess. Too much assessment points towards the rote learning present in language learning programs in many educational systems.
Growing professionalization has lent new meanings to the term ‘assessment literacy’, and in the field of applied linguistics, it is now related to something more than setting standards for developing and administering tests using certain techniques. It is something that goes beyond the traits of tests to include the ethics of testing in policy and practice. It engages assessors at a level which is much more than just the technical characteristics of tests (Taylor, 2009),
encapsulating the philosophy and ethics of testing. It is noteworthy that due
to globalization, an ethical milieu has appeared in the realm of assessment
literacy for assessors that functions on the standards of accountability and
responsibility. An assessor is not only a stakeholder whose responsibility starts
on the day of a test and finishes after an assigned duration. The modern
assessor is a significant learning partner whose effective techniques and conceptions of assessment can produce effective learning outcomes among learners. Today’s assessor has to exercise, on an almost daily basis, analytical and formative assessment in theory and practice. The primary reason behind such a significant shift in assessment literacy is that the 21st century has so far been all about promoting global connections among different populations of the world. This purpose can only be achieved through understanding and expressing ideas, for which languages are the pivotal tools.
Another significant factor that can make or break assessment is the assessors’
understanding of the validity and reliability of tests. Assessors have to be fully
vigilant to look into the standardization of testing. This also includes the ability
to spot fake standardization in any testing context. This is too critical to be the work of a naïve teacher without any prior training in assessment. It is for this
reason that the expert language teaching community has been working to
develop explicit standards in testing while expanding the concepts of validity
and reliability (Messick, 1996). This has updated the components of assessment
literacy. There has been a paradigmatic shift from merely relying upon the
assessment material, which is appropriate for testing a certain level, to increasing
accountability and analytically understanding the learners’ needs in a particular
context. Therefore, the shape of the present assessment paradigm is, according to Davies (2003, 2008, 2013, 2014), ‘Knowledge + Skills + Principles’. The updated assessment frameworks function on the shoulders of teachers who reflect the relevant ethical involvement in the process of assessment. This further indicates a major change in the definition of assessment literacy in the age of information.
Davies (2008, 2014) continues to discuss what constitutes the elements of
Knowledge, Skills, and Principles in assessment. Knowledge is the relevant background that is ready to function at the time of practice; Skills are the abilities of an assessor to standardize tests and use related methodologies to conduct meaningful assessment; and Principles concern the suitable use of language tests with fairness and professional expertise.
We frequently come across reports of training and workshops arranged by
stakeholder organizations for updating teachers’ assessment literacy. We need to
approach this with much caution, as there is a big difference between arrangements that bring about positive changes in teachers’ assessment conceptions and counterfeit ones arranged merely for the sake of publicity.
Part Three of the book, Teachers’ Language Assessment Literacy, tackles teachers’
LAL. In Chapter One of this part, Sahin stresses the necessity for language teachers to be assessment literate so that they can assess students in an objective way as per their instructional context. The study was carried out on 22 Turkish EFL teachers and was aimed at investigating their conceptions and practices of assessment. Results of the study revealed that teachers still needed to work more on their assessment beliefs so that they could test students in a fair way. In the second chapter, Ahmad accentuates the need for teachers to develop their language assessment literacy. Based on data analysis of students’ writing, the author affirmed that both the assessment standards and teachers’ ratings should be revisited, as they did not meet the expectations of fair and valid writing assessment tasks. The study is important in signposting the relevance of standardizing international benchmarks for the assessment of writing. In Chapter Three,
Kvasova and Shovkovyi tackled some stakeholders’ perceptions of the reliability
of classroom-based summative assessments in Ukraine. The authors stress the fact
that teachers most often lack important assessment literacy skills needed to be operational in their academic contexts. This lack of assessment expertise, as perceived by university managers, teachers, and students, might lead to harmful effects and therefore fails to meet international assessment standards. Implications of the study called for biasing for a better quality of constructed tests whose major purpose is to guarantee assessment reliability.
Part Four of the book, Language Assessment Literacy: Interfaces between Teaching and Assessment, includes studies from Bahrain, Spain, and Japan. In Chapter
One of this part, Al Jahromi highlights the fact that the assessment of speaking
is still posing some major problems to students, practitioners, and assessors.
Based on empirical data on the Bahraini context, the author critically reviews
the assessment of speaking and calls for a reconsideration of the assessment of
this skill. In Chapter Two, Shackleton addressed the positive ‘washback’ effect of a listening proficiency test in the Spanish context. The author used a B2 listening exam, a think-aloud protocol, and a retrospective interview to examine the test’s construct validity, that is, whether the test measures what it is supposed to measure. Based on the analysis of the planning and prediction strategies as well as the other research instruments, the author maintained that the listening construct was fuzzy and that there are serious threats to its construct validity. The author calls for the use of authentic listening input to raise teachers’ awareness in order to target the assessment of
listening construct validity. The last chapter of this part deals with under-
standing the assessment of testing abilities in a socially interactive environment.
To do so, the author maintains that this understanding necessitates language assessment literacy on the part of test designers. The author also criticizes the role of standardized assessment in making our students creative in their socially interactive contexts. Developing LAL is important, and there needs to be much more work on training students and teachers so that they can contribute to useful test development.
Conclusion
The main aim of assessment is to determine whether or not learning has occurred. It is believed that the main objective of assessment is to find out how far the learning experiences are actually generating the anticipated outcomes.
Assessment primarily refers to the systematic way of collecting information with the aim of making judgments or decisions about people. Shepard (2000)
pointed out that this viewpoint on learning is based on the notion that learning
is a procedure of gathering information and knowledge in separate pieces.
The main points in traditional assessment are ensuring reliability and validity in assessment instruments. These aspects are consistent with the goals of quantifying and measuring learning, and collecting information. The fact that these aspects are the concerns in assessing learners establishes the final grade as the product of traditional assessment (Brown, 2004). As pointed out by Alderson (2000), once learning is characterized as a number, it is of utmost significance that the number be as reliable and valid as possible; otherwise it has no meaning.
The philosophy of traditional assessment allows assessment instruments to rank students against each other, which is termed NRT. However, NRT can be inaccurate or unreliable in some cases, and it is consequently regarded as less valid than CRT. The shift towards CRT is indicative of the necessity to develop more precise representations of students’ learning than can be provided through NRT.
References
Alderson, J. C. (2000). Assessing reading. Cambridge University Press.
Alderson, J. C. (1999, May). Testing is too important to be left to testers [Plenary address]. The
Third Annual Conference on Current Trends in English Language Testing. United
Arab Emirates University.
Alderson, J. C. (1996). The testing of reading. In C. Nuttall (Ed.), Teaching reading skills in a foreign language (pp. 212–228). Heinemann.
Alderson, J. C., & Banerjee, J. (2001). Language testing and assessment (Part 1). Language
Teaching, 34(4), 213–236.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation.
Cambridge University Press.
Alderson, J. C., & Wall, D. (1993a). Does washback exist? Applied Linguistics, 14(2), 115–129.
Alderson, J. C., & Wall, D. (1993b). Examining washback: The Sri Lankan impact
study. Language Testing, 10(1), 41–69.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. Pearson Education.
Cheng, L., Sun, Y., & Ma, J. (2015). Review of washback research literature within
Kane’s argument-based validation framework. Language Teaching, 48(4), 436–470.
Davies, A. (2014). 50 years of language assessment. In A. J. Kunnan. (Ed.), The compa-
nion to language assessment: Abilities, contexts and learners (pp. 3–21). Wiley Blackwell.
Davies, A. (2013). Native speakers and native users: Loss and gain. Cambridge University Press.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
Davies, A. (2003). Three heresies of language testing research. Language Testing, 20(4),
355–368.
Dunn, L. M., & Dunn, L. M. (1997). Peabody picture vocabulary test—III. American
Guidance Service.
Dunn, L., Morgan, C., O’Reilly, M., & Parry, S. (2004). The student assessment handbook.
Routledge Falmer.
Frederiksen, J. R., & Collins, A. (1989). A systems approach to educational testing.
Educational Researcher, 18(9), 27–32.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource
book. Routledge.
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Hill, K., & McNamara, T. (2012). Developing a comprehensive, empirically based research framework for classroom-based assessment. Language Testing, 29(3), 395–420.
McNamara, T. (2001). Language assessment as social practice: Challenges for research. Language Testing, 18(4), 333–349.
McNamara, T. (2000). Language testing. Oxford University Press.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Blackwell.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13(3),
241–256.
Nawab, A. (2012). Is it the way to teach language the way we teach language? English
language teaching in rural Pakistan. Academic Research International, 2(2), 696–705.
Poehner, M. (2008). Dynamic assessment: A Vygotskian approach to understanding and pro-
moting L2 development. Springer Science + Business Media.
Popham, W. J. (2006). All about accountability: A dose of assessment literacy. Educational Leadership, 63(6), 84–85.
Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory
Into Practice, 48(1), 4–11.
Rea-Dickins, P. (2004). Understanding teachers as agents of assessment. Language Test-
ing, 21(3), 249–258.
Shepard, L. (2000). The role of assessment in a learning culture. Educational Researcher,
29(7), 4–14.
Shohamy, E. (2001). The power of tests: A critical perspective on the uses of language tests. Longman.
Taylor, L. (2009). Developing assessment literacy. Annual Review of Applied Linguistics,
29, 21–36. doi:10.1017/S0267190509090035.
Torrance, H., & Pryor, J. (2002). Investigating formative assessment: Teaching, learning, and
assessment in the classroom. Open University Press.
Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers: Find-
ings of a European study. Language Assessment Quarterly, 11(4), 374–402.
Wall, D. (2013). Washback in language assessment. In C. A. Chapelle (Ed.), The ency-
clopedia of applied linguistics. Blackwell Publishing Ltd.
Wall, D., & Alderson, J. C. (1996). Examining washback: The Sri Lankan impact study.
In A. Cumming & R. Berwick (Eds.), Validation in language testing (pp. 194–221).
Multilingual Matters.
Xu, Y., & Brown, G. T. L. (2016). Teacher assessment literacy in practice: A reconceptualization. Teaching and Teacher Education, 58, 149–162. doi:10.1016/j.tate.2016.05.010.
Chapter 2
Inbar-Lourie (2008) argued that assessment practices and assessment literacy have been
developed in the context of two conflicting conceptions of assessment. On the
one hand, there is the traditional, cognitivist approach, which reflects the
principles of positivism. This conception is materialised in high-stakes, standar-
dized tests and requires practitioners to have knowledge and skills which are
largely psychometric. For Inbar-Lourie, this conception, and the respective
practices, form the ‘testing culture’ (2008, p. 387). On the other hand, there is
a socio-cultural conception which adopts an interpretative, constructivist
approach to knowledge and assessment. According to the latter conception,
knowledge and assessment are not value-free (see also Taylor 2013, p. 411).
They are socially constructed under the influence of more or less dominant
epistemological assumptions, educational preconceptions, and social, political,
and cultural beliefs (2008, p. 387). This socially oriented conception, which
Inbar-Lourie calls ‘assessment culture’, requires stakeholders to be aware of the
contextual considerations and the social consequences of assessment and focus
on practices that promote learning (i.e., Assessment for Learning).
Insisting on the social aspect of assessment, and broadening the account of LAL
to include all possible stakeholders, Taylor (2009) suggested an extended and
more contextualized conceptualization of the term. As she points out, ‘training
for assessment literacy entails an appropriate balance of technical know-how,
practical skills, theoretical knowledge, and understanding of principles, but all
firmly contextualized within a sound understanding of the role and function of
assessment within education and society’ (Taylor 2009, p. 27). The contribution
of Taylor’s provisional framework is that it includes considerations of context
but, most importantly, it evokes the key notion of balance. According to Taylor
(2009, p. 27), the context and the role of the stakeholder in the assessment pro-
cess will determine the balance of knowledge in specific areas and the levels of
LAL that should be achieved. Therefore, although LAL should be developed for
all stakeholders, each stakeholder should acquire the amount of knowledge that
fits their role.
Fulcher (2012) addressed the contextualization of skills, knowledge, and
principles in his investigation on teachers’ levels of LAL. Drawing on previous
accounts, but also on his empirical findings on teachers’ perceived needs, Ful-
cher (2012, p. 125) provides the following definition of assessment literacy:
The advantage of this type of conceptualization is that it can reflect the fact that
LAL cannot be acquired as a block of knowledge all at once. The continuum
approach seems to describe more accurately the development of LAL. Moreover,
Pill and Harding’s model captures the fact that stakeholders in assessment have
different needs which in turn define the levels of assessment literacy that they
should reach. For instance, policy makers could fulfil their role by reaching the
‘functional level’ in Pill and Harding’s model, whilst teachers would most likely
need to acquire ‘multidimensional’ or, at least, ‘conceptual and procedural’ literacy
in order to engage in effective assessment practices.
Although innovative in design and insightful in distinguishing assessment
literacy levels according to stakeholders’ needs, Pill and Harding’s model
presents some critical problems. As the authors seem to acknowledge (2013,
p. 383), the exact meaning and the content of the proposed levels are rather
vague (see also Harding & Kremmel, 2016). Also, historical and social dimensions of assessment appear peripheral in Pill and Harding’s model, unlike in Brindley (2001a, 2001b), Inbar-Lourie (2008), and Scarino (2013), who address contextual considerations and awareness of social consequences as fundamental for the development of LAL.
An attempt to bridge component- and levels-based conceptualizations of LAL has been presented in Taylor (2013). Taylor acknowledges that LAL
involves various stakeholders, not only teachers. She also takes into account
that the levels and areas of literacy vary with stakeholders’ roles and needs.
However, instead of matching areas of knowledge with levels of LAL (see Pill
& Harding, 2013), Taylor hypothesizes eight core dimensions of knowledge,
skills, and principles, and five degrees of literacy. These eight dimensions of
LAL are: knowledge of theory, technical skills, principles and concepts, language pedagogy, sociocultural values, local practices, personal beliefs/attitudes, and scores and decision making.
For the definition of the levels of literacy, Taylor follows Pill and Harding’s
(2013) model and assumes the possibilities of illiteracy, nominal literacy, functional literacy, conceptual and procedural literacy, and multidimensional
literacy. According to Taylor’s model, stakeholders are expected to acquire a
specific level of literacy in each key dimension depending on their context and
needs (Taylor 2013, pp. 409–410). The proposed conceptualization allows us
to capture the fact that abilities in a knowledge area can be more or less
developed depending on each stakeholder’s specific needs. Of course, there
might be objections or modifications with respect to the content of the key
dimensions or the exact levels of literacy that are necessary for each group of stakeholders.
(through workshops, discussion groups and web resources), and the adoption of
a more holistic, contextual, and ethically grounded approach to the interpreta-
tion of language proficiency test scores.
Yan, Fan, and Zhang (2017) drew on data from semi-structured interviews
in order to provide LAL profiles for language teachers, language testers, and
graduate students in language studies programs in China. As their findings
suggest, assessment practices and training needs in China are highly con-
textualized and shaped by experiential factors which are different for each
stakeholder group.
Kim, Chapman, Wilmes, Cranley, and Boals (2017) illustrated a case of collaboration between educators and test developers for the creation of formative
language assessment tools in the US educational context. Their findings
revealed the benefits of dynamic collaboration with stakeholders – particularly
with educators and parents – in the development of valid language assessment.
The effects of dynamic collaboration were also presented by Harsch, Seyferth,
and Brandt (2017). In particular, Harsch et al. (2017) presented insights from the first eighteen months of a long-term project in which teachers, coordinators, and researchers developed their assessment literacy together. The aim of
the project was to investigate how the aforementioned stakeholder groups
bring their abilities, skills, and knowledge together, and how they learn with
and from each other.
A study that examined both teachers’ and language assessment specialists’ LAL
development is that of Baker and Riches (2017). The study was carried out over a
series of workshops on language assessment in 2013, where Haitian teachers
offered feedback to assessment specialists about draft examinations. The outcome
of these workshops was a revision of national English examinations which were
then presented to the Haitian Ministry of Education and Professional Training
(MENFP). Interestingly, the study found that teachers’ and specialists’ expertise complement each other and that there are still challenges to be addressed in collaborative decision making and consensus building among these stakeholders.
Finally, Kim, Chapman, and Wilmes (2017) studied various resources created to enhance parents’ and educators’ assessment literacy and, more specifically, the ability to interpret and use score reports.
that Iranian EFL teachers performed poorly in assessment practices and had
major misunderstandings about assessment, but felt well prepared for teaching
and assessing (see also Badia, 2015).
At the European level, the Hasselgreen et al. (2004) questionnaire was replicated by Vogt and Tsagari (2014) in their study on the assessment literacy of EFL teachers in seven European countries. They recruited 853 participants via questionnaires and added a qualitative component to their research by conducting follow-up interviews with 63 of them. Findings from Vogt and Tsagari’s study confirmed teachers’ lack of assessment literacy and training (see also Tsagari & Vogt, 2017), particularly in less traditional areas of assessment, as well as a need for development in the areas of reliability, validity, and statistics, and in the ability to critically evaluate the tests they used. Additionally, Vogt and Tsagari (2014) point out that teachers, in their effort
to meet the requirements of their role, tend to resort to compensation strategies,
such as learning on the job (by observing colleagues and mentors) or testing as they
were tested (pp. 390–391). With the addition of a follow-up interview section,
Hasselgreen et al.’s (2004) questionnaire was also replicated by Kvasova and
Kavytska (2014) who conducted research on the LAL levels of Ukrainian EFL
teachers. Interestingly enough, Kvasova and Kavytska observed that Ukrainian
teachers also use the compensation strategies reported in Vogt and Tsagari (2014)
(i.e. learning on the job and assessing as they themselves were assessed).
In his investigation of assessment conceptions of Tunisian English language
teachers, Hidri (2016) found evidence of wrong and conflicting conceptions
about assessment. More recently, Berry, Munro, and Sheehan have presented
the results of a project which aimed to investigate the training needs, practices,
and beliefs of English language teachers in a wide variety of countries (see also
Berry & Sheehan, 2017; Berry, Sheehan, & Munro, 2017a; Sheehan & Munro,
2017). Drawing from semi-structured interviews, classroom observations, and
teachers’ written feedback on a LAL workshop, the researchers largely con-
firmed previous studies on teachers’ LAL levels and beliefs. More specifically, as
their data showed, English language teachers expressed a lack of knowledge in
assessment literacy, as well as a need for training in practical elements of
assessment and clear criteria in assessment. Supporting previous investigations
on the issue, Berry et al. (2017a) pointed out that teachers in their sample had a
testing-oriented conception of assessment. Interestingly enough, the majority of
the participants in the project were not confident about assessment.
Findings from a qualitative part of a large-scale study on EFL teachers’
assessment literacy levels, training, and needs were presented in Tsagari and
Vogt (2017). Using data from semi-structured interviews with primary and
secondary state school EFL teachers in Greece, Cyprus, and Germany, Tsagari
and Vogt investigated teachers’ perceptions of their own professional prepara-
tion, as well as teachers’ perceived training needs. Findings showed that EFL
teachers in the aforementioned educational contexts have low levels of LAL.
The majority of them said that they had not learnt anything (or learnt very
little) about language testing and assessment during their pre-service training.
Teachers also held fuzzy concepts about assessment; they tended to revert to traditional assessment procedures, and their feedback procedures reflected a
deficit-oriented approach. Supporting findings from similar studies (see Vogt &
Tsagari, 2014; Kvasova & Kavytska, 2014), participants in Tsagari and Vogt’s
(2017) study followed the strategy of learning on the job, relying on mentor
colleagues and published materials. A very important finding in Tsagari and
Vogt’s research is that teachers were not able to clearly formulate their training
and professional needs.
The role of contextual factors in developing LAL, and in exploring LAL levels, was examined in Xu and Brown’s (2017) study, which used an adapted
version of the Teacher Assessment Literacy Questionnaire to explore the LAL
levels of university English language teachers in China. They also explored the
possible interaction between LAL levels and demographic characteristics, such
as age, gender, professional title, qualification, and others. Drawing on data
collected from 891 participants, Xu and Brown’s study revealed that teachers in
Chinese universities have a very basic level of LAL while demographic factors do
not have a significant impact on teachers’ assessment literacy. The only factor that
seemed to influence LAL levels was the institution in which teachers worked.
However, the authors stress that these findings should not be considered clear evidence of a lack of interplay between contextual factors and LAL, as this might be due to the methodological design employed; for example, a questionnaire originally intended for a U.S. context 30 years ago may not have been appropriate for capturing the particular contextual parameters of Chinese universities.
LAL levels in South America, more specifically in the Colombian
educational context, were also the focus of Hernández Ocampo (2017),
Restrepo and Jaramillo (2017), and Giraldo’s (2018) works. Villa Larenas (2017)
also explored LAL levels of Chilean EFL teacher trainers. More recently, Berry, Sheehan, and Munro (2017b) collected data from interviews and classroom
observations in the course of a study aiming at exploring UK teachers’ attitudes
towards assessment as well as teachers’ perceived training needs.
Focusing on the Canadian educational context, Valeo and Barkaoui’s (2017)
research explored how English as a Second Language (ESL) teachers con-
ceptualize and conduct assessment in the ESL classroom and how teachers’
conceptions influence their decisions in designing and using writing assessment
tasks (see Valeo & Barkaoui, 2017; Barkaoui & Valeo, 2017). Valeo and Barkaoui’s
findings suggest that teachers hold varying conceptualizations about how to design
and select writing tasks. Using an expansion of Fulcher’s (2012) questionnaire,
Kremmel et al. (2017) presented a case about how teacher involvement in high-
stakes test development can contribute to the development of their LAL.
Focusing on the Turkish educational context, Mede and Atay (2017) exam-
ined the LAL levels of English-language university teachers in Turkey. Data
were collected from 350 participants who completed an adapted version of
Vogt and Tsagari’s (2014) questionnaire and from follow-up group interviews
with 34 participants. As Mede and Atay’s research showed, English language
Methodological considerations
Methodological designs
The majority of research on LAL draws on quantitative and qualitative analytical methods, with an evident recent increase in the use of the latter. Mixed methods are also used very frequently by authors who attempt to combine the validity of quantitative data analysis with the illuminating and clarifying force of qualitative analytical tools (see Jeong, 2013).
The most popular instruments of quantitative approaches in LAL research are
questionnaires and surveys. In some works, authors designed, developed, and
administered original questionnaires (e.g. Brown & Bailey 2008; Hasselgreen et
al. 2004; O’Loughlin 2013). In other works, highly esteemed questionnaires
were replicated and adapted to the context and needs of particular research
projects (e.g. Jin, 2010; Kiomrs et al. 2011; Vogt & Tsagari 2014; Kvasova &
Kavytska 2014). The questionnaires used in the literature commonly consist of
closed-response items (e.g. Hasselgreen et al. 2004; Jin, 2010; Fulcher 2012;
O’Loughlin 2013), although combinations of both open- and closed-response
questions are not rare (e.g. Brown & Bailey 2008; Mazandarani & Troudi,
2017). Apart from cases in which questionnaires are the sole source of data
gathering and analysis (such as Hasselgreen et al. 2004; Fulcher 2012),
scholars often recruit subsets of questionnaire respondents for follow-up –
usually semi-structured – interviews (O’Loughlin, 2013; Jeong, 2013; Vogt
& Tsagari, 2014). In the latter case, the qualitative analysis of the informa-
tion gathered by interviews is meant to elaborate and clarify the quantitative
data offered by the questionnaire. More rarely, both quantitative and qualitative data are collected through questionnaires that include sections for elaborating on responses and for other written comments (see, for instance, Malone, 2013).
In general, scholars have made extensive use of interviews but often as a
supplement to other types of data. While in most cases these interviews were
conducted on an individual basis (see, for instance, O’Loughlin, 2013), group
interviews were also used in some methodological designs (e.g. Malone, 2013).
Compared to individual interviews, group interviews are considered to max-
imize interactions between participants. However, they always entail the risk of
failing to collect the interviewees’ actual beliefs, since participants in group
interviews might influence one another or conform their discourse to the
group’s (see Malone, 2013, pp. 334–335). These limitations might explain the
choice of some scholars to use private conversations instead of interviews as a
means to collect qualitative evidence for their research (e.g., Arkoudis &
O’Loughlin, 2004).
In the recent literature on LAL, there are few studies that drew exclusively on data gathered from interviews; Deneen and Brown (2016) is one of them. In most cases, interviews are used in addition to other data collection tools (e.g. Gu, 2014; Tsagari & Vogt, 2017).
Empirical investigations on practitioners and stakeholders have offered
important contributions to the field, but they do not monopolize the relevant
research. Significant findings and insights have also been presented by literature and textbook reviews (Davies, 2008; Allen & Negueruela-Azarola, 2010).
There are also other studies, such as position papers (Boud, 2000; Carless, 2007;
Popham, 2009; Stiggins, 2006, 2012; Scarino, 2017, among many others), and
assessment literacy course surveys and overviews (Brown & Bailey, 2008; Jin,
2010; Lam, 2015). Papers discussing processes of language assessment in practice
(see, for instance, Rea-Dickins, 2001, 2006; Gu, 2014) and the implementation
of state-wide assessment reforms (e.g. Davison, 2013; Hamp-Lyons, 2016) have
also played a significant role.
Case studies constitute the vast majority of research (see, among others,
Pill & Harding, 2013; Kvasova & Kavytska, 2014; Hidri, 2016; Gu, 2014).
Large-scale investigations on different countries are also common (Hassel-
green et al., 2004; Brown & Bailey, 2008; Vogt & Tsagari, 2014), while
comparative studies are significantly fewer (e.g., Davison, 2004; Cheng,
Rogers & Hu, 2004; East, 2015). In general, research in LAL has been
Participants
The vast majority of empirical investigations on LAL elicited data from infor-
mants who were either foreign/second language teachers or testing and assess-
ment instructors. However, at times the participants’ professional identity is not
very clearly identified – especially in large-scale investigations. Thus, many
empirical studies drew on data from participants who were teaching foreign
languages at both secondary and university level (Fulcher, 2012; Vogt & Tsa-
gari, 2014; Mazandarani & Troudi, 2017). Similarly, some investigations
recruited assessment and testing instructors as participants, who were also lan-
guages instructors (e.g. Hasselgreen et al., 2004). In his work, East (2015)
clearly distinguishes participant groups (i.e. Australian EFL teachers teaching in
secondary schools only, grouped by subject language), but this is rather an
exception compared to the overall tendency. It is very common for researchers
to use the whole classroom context in order to collect information about
actual, classroom-based assessment practices. The vast majority of these works
focus on the teacher’s role in the assessment process (e.g. Rea-Dickins, 2001,
2006; Gu, 2014), while the students’ role gains some attention only in investi-
gations that focus on teacher–student interaction (see Leung & Mohan, 2004).
Remarkably, research on the assessment literacy of other stakeholders is limited.
An exception to this is Pill and Harding (2013) who used the transcripts of
thirteen hearings of the Australian House Standing Committee on Health in
order to investigate the misconceptions of policy makers about language test-
ing. O’Loughlin’s work (2013) also explored the assessment literacy levels of
university staff in two Australian universities.
Participants in LAL research were largely self-selected volunteers. In most cases, they were contacted or recruited through professional lists, mailing lists, and social networks (e.g. Fulcher, 2012; Jeong, 2013; Kvasova & Kavytska, 2014). Cases of classroom- or course-observation analyses can be considered an exception to this tendency. In such cases, it is reasonable to assume that the participation of a classroom (or any other educational unit) in scientific investigations entails some degree of collaborative spirit at some level of the educational administration.
Since a large amount of research has been conducted on the basis of
web-collected information (e.g. Kremmel & Harding, 2017, 2019), the
Conclusions
Based on the findings and the points of agreement and divergence among the claims and research outcomes presented above, the literature on LAL does not reflect an entirely optimistic view. Authors have repeatedly observed a gap between the theoretical standards of LAL and actual language assessment practices. Other scholars have explicitly expressed doubts about whether the field of LAL has really evolved in recent research. Moreover, essential components of the LAL framework need further research, and the promotion of LAL is still under investigation.
There is no doubt that research on LAL will improve our theoretical designs, and new frameworks are likely to appear in the future. These future models will probably address the defects and problems of previous conceptualizations. However, as the overview of conceptualizations shows, some crucial aspects of LAL should be prioritized in future investigations because LAL components and practices are not definite or clearly articulated. As Inbar-Lourie (2016, 2017) observes, the field is characterised by the absence of the language trait from some of the definitions offered in the literature.
Therefore, further research should clarify the relation between assessment lit-
eracy and language assessment literacy (see Kremmel et al., 2017) as they are
commonly treated as freely interchangeable, which seems to be due to the
existing theoretical vagueness.
Furthermore, when referring to the language trait, it seems that the field does not have a clear definition of language (see also Kremmel et al., 2017). In her
studies, Scarino (2008, 2017) observes that the conception of the language
construct has changed through time. Language assessment has shifted from a
cognitivist approach to language to a more communicative approach and,
recently, to an intercultural approach. Of course, language can be approached
from different perspectives, and language teaching and assessment can focus on
different dimensions of language use, especially within multilingual contexts,
which have become a challenge for most educational contexts today (Schissel,
Leung, & Chalhoub-Deville, 2019). Future research should provide a clear
definition of language in a more holistic framework that incorporates useful
insights from all approaches to language.
Scholars also need to arrive at a clear and generally accepted definition of LAL. A major trend in the field suggests that LAL is made up of certain com-
ponents. However, both the number and the classification of these components
are debatable. Taylor (2013), for instance, hypothesizes that LAL consists of eight components, including principles and concepts, sociocultural values, local practices, and personal beliefs. An obvious problem with such a classification is that it is
not entirely clear where the line between the different components should be
drawn. On a theoretical level, it seems reasonable to distinguish between, for
instance, socio-cultural values and personal beliefs. Nevertheless, when it comes to
examinations of actual attitudes and practices, the distinction between the socio-
cultural pattern and personal behaviour does not seem that straightforward and
requires clear, supportive evidence. The point is that LAL conceptualizations
should not focus entirely on addressing theoretical requirements but should also
provide some framework that can incorporate real assessment performances.
Another challenge in the conceptualizations of LAL concerns the role of
context. More and more, authors acknowledge that contextual factors have a
significant effect on the development of LAL and the implementation of
assessment practices (e.g. Hill 2017a, 2017b; Tsagari, 2017). In the literature,
authors claim that language assessment can be affected by parameters, such as
class size and administrative requirements (e.g. Cheng et al., 2004). Although it
is undeniable that assessment practices as well as LAL development are affected
by contextual considerations, there is still a need for a clear definition of these
considerations.
In addition to the need for a more systematic and general study of context,
the literature also reveals a need for a more intensive investigation of con-
textual factors that until now have been widely overlooked. It is not until
very recently that scholars have started to investigate parameters such as the
demographics of language teachers (e.g. experienced vs novice teachers; see
Hildén & Fröjdendahl, 2018). Similarly, authors have concentrated on studying LAL in the context of teaching English, overlooking assessment practices and needs in the context of teaching other languages. The educational level factor has also been somewhat neglected in studies. Authors tend to collect data and
suggest theoretical models without distinguishing among educational levels. This seems to imply that teaching, learning, and assessment can be treated in a uniform way whether they take place at primary, secondary, or tertiary educational levels. It
is obvious, though, that each educational level has different aims and needs and
suggests a different context. Thus, future research should investigate these con-
texts separately and suggest ways to promote LAL according to the context and
needs of each educational level. On this view, the field of LAL should also
investigate the ways in which LAL could be better communicated to all levels
involved.
Another aspect of context that needs careful consideration is the professional
context of non-practitioner stakeholders, such as policy makers or administra-
tion officers. Lack of LAL might be a sign of limited professionalism for language teachers, but this is not the case for administration officers or policy makers. As a result, general conceptualizations of dimensions such as professional ethics, decision making, and attitudes are provided for non-practitioners on the basis of the practitioner’s context, as if policy makers, for instance, had the same professional ethics as teachers. While the findings of relevant
research revealed misconceptions of LAL among practitioners as well as limita-
tions in the implementation of assessment practices, the findings are usually
explained on the basis of practitioners’ training and professional development.
Thus, the practitioner’s professional perspective should not be the sole basis for promoting LAL to these groups, and non-practitioners should not be expected to adapt to practitioners’ needs and intentions.
The aforementioned considerations of suggested models and conceptualiza-
tions should be equally addressed in future efforts to promote LAL. For
instance, the importance of contextual factors should concern not only research
projects but also the training provided. Training in LAL should not be designed
and delivered according to some general conception of assessment literacy.
Instead, it should be formed with respect to the contextual parameters and the needs of the target stakeholder group. Thus, teachers of primary education
should receive different training from university teachers. Similarly, training
programmes should be designed according to the educational context of each
country. A training programme designed for English language teachers in
China cannot be transferred and applied as is in the context of French language
teaching in Greece.
In addition to carefully designed training programmes, LAL should be
promoted through other means, such as web resources (e.g. the TALE
project, http://taleproject.eu), online tutorials, seminars, and workshops.
Again, these materials should reflect the contextual considerations that apply
to each stakeholder group and each educational environment. Ideally,
practitioners and other test users should be given the chance to practice and
experience testing and assessment processes in structured educational
opportunities. Research shows that language teachers tend to adopt assess-
ment practices through experience and by observing others (see Vogt &
Tsagari, 2014). Similarly, O’Loughlin (2013) suggests that the LAL of university administrative staff could be raised if the latter could actually take the tests they use. O’Loughlin’s proposal is worth examining in practice and could even be generalized to other stakeholders, too.
Moreover, there is a lot yet to be learnt about the protagonists of assessment – students and teachers – and how they enact assessment policy mandates in their daily practices. Research should shed light on students’ perspectives on
assessment practices (see Rea-Dickins 2001; Malone 2016, 2017; Tsagari,
2013). Also, if assessment is meant to be for learning, then LAL should pro-
vide some account of what constitutes evidence of language learning (see
Rea-Dickins, 2001).
References
Allen, H. W., & Negueruela-Azarola, E. (2010). The professional development of
future professors of foreign languages: Looking back, looking forward. The Modern
Language Journal, 94(3), 377–395.
Antoniou, P., & James, M. (2014). Exploring formative assessment in primary school
classrooms: Developing a framework for actions and strategies. Educational Assessment,
Evaluation and Accountability, 26(2), 153–176.
Arkoudis, S., & O’Loughlin, K. (2004). Tensions between validity and outcomes: Tea-
cher assessment of written work of recently arrived immigrant ESL students. Language
Testing, 21(3), 284–304.
Badia, H. (2015). English language teachers’ ideology of ELT assessment literacy. Inter-
national Journal of Education & Literacy Studies, 3(4), 42–48.
Baker, B. A., & Riches, C. (2017). The development of EFL examinations in Haiti: Colla-
boration and language assessment literacy development. Language Testing, 35(4), 557–581.
Barkaoui, K., & Valeo, A. (2017). Designing L2 writing assessment tasks for the ESL class-
room: Teachers’ conceptions and practices [Conference paper]. The 39th Language Testing
Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Berry, V., & Sheehan, S. (2017). Exploring teachers’ language assessment literacy: A
social constructivist approach to understanding effective practice. In Learning and
assessment: Making the connections: Proceedings of the ALTE 6th International Conference.
http://events.cambridgeenglish.org/alte2017-test/perch/resources/alte-2017-proceedings-final.pdf.
Berry, V., Sheehan, S., & Munro, S. (2017a). What do teachers really want to know about assessment? [Conference paper]. The 51st Annual International IATEFL Conference, Glasgow, United Kingdom.
Berry, V., Sheehan, S., & Munro, S. (2017b). Mind the gap: Bringing teachers into the language literacy debate [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Boud, D. (2000). Sustainable assessment: Rethinking assessment for the learning society.
Studies in Continuing Education, 22(2), 151–167.
Brindley, G. (2001a). Language assessment and professional development. In C. Elder,
A. Brown, K. Hill, N. Iwashita & T. Lumley (Eds.), Experimenting with uncertainty:
Essays in honour of Alan Davies (pp. 126–136). Cambridge University Press.
Brindley, G. (2001b). Outcomes-based assessment in practice: Some examples and
emerging insights. Language Testing, 18(4), 393–407.
Brown, J. D., & Bailey, K. M. (2008). Language testing courses: What are they in 2007?
Language Testing, 25(3), 349–383.
Carless, D. (2007). Learning-oriented assessment: Conceptual basis and practical implications. Innovations in Education and Teaching International, 44(1), 57–66.
Cheng, L., Rogers, T., & Hu, H. (2004). ESL/EFL instructors’ classroom assessment
practices: Purposes, methods, and procedures. Language Testing, 21(3), 360–389.
Cooke, S., Barnett, C., & Rossi, O. (2017). An evidence-based approach to generating the
language assessment literacy profiles of diverse stakeholder groups [Conference paper]. The 39th
Language Testing Research Colloquium, Universidad de los Andes, Bogotá,
Colombia.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
Davison, C. (2004). The contradictory culture of teacher-based assessment: ESL teacher
assessment practices in Australian and Hong Kong secondary schools. Language Test-
ing, 21(3), 305–334.
Davison, C. (2013). Innovation in assessment: Common misconceptions and problems.
In K. Hyland & L. L. C. Wong (Eds.), Innovation and change in English language education (pp. 263–275). Routledge.
Deneen, C. C., & Brown, G. T. L. (2016). The impact of conceptions of assessment on
assessment literacy in a teacher education program. Cogent Education, 3(1),
doi:10.1080/2331186X.2016.1225380.
East, M. (2015). Coming to terms with innovative high-stakes assessment practice:
Teachers’ viewpoints on assessment reform. Language Testing, 32(1), 101–120.
Engelsen, K. S., & Smith, K. (2014). Assessment literacy. In C. Wyatt-Smith, V. Klenowski
& P. Colbert (Eds.), Designing assessment for quality learning (pp. 91–107). Springer
Netherlands.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers.
Profile: Issues in Teachers’ Professional Development, 20(1), 179–195.
Gu, P. Y. (2014). The unbearable lightness of the curriculum: What drives the assess-
ment practices of a teacher of English as Foreign Language in a Chinese secondary
school? Assessment in Education: Principles, Policy & Practice, 21(3), 286–305.
Hamp-Lyons, L. (2016). Implementing a learning-oriented approach within English lan-
guage assessment in Hong Kong schools: Practices, issues and complexities. In G. Yu &
Y. Jin (Eds.), Assessing Chinese learners of English (pp. 17–37). Palgrave Macmillan.
Harding, L., & Kremmel, B. (2016). Teacher assessment literacy and professional
development. In D. Tsagari & J. Banerjee (Eds.), Handbook of second language assess-
ment. Handbooks of Applied Linguistics (pp. 413–428). Mouton De Gruyter.
Harsch, C., Seyferth, S., & Brandt, A. (2017). Developing assessment literacy in a dynamic
collaborative project: What teachers, assessment coordinators, and assessment researchers can learn
from and with each other [Conference paper]. The 39th Language Testing Research Col-
loquium, Universidad de los Andes, Bogotá, Colombia.
Hasselgreen, A., Carlsen, C., & Helness, H. (2004). European survey of language testing and
assessment needs: Report: part one – general findings. European Association for Language
Testing and Assessment. http://www.ealta.eu.org/documents/resources/survey-rep
ort-pt1.pdf.
Hernández Ocampo, S. P. (2017). How literate in language assessment should English teachers
be? [Conference paper]. The 39th Language Testing Research Colloquium, Universidad
de los Andes, Bogotá, Colombia.
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to
secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Hildén, R., & Fröjdendahl, B. (2018). The dawn of assessment literacy – exploring the
conceptions of Finnish student teachers in foreign languages. Apples – Journal of
Applied Language Studies, 12(1), 1–24.
Hill, K. (2017a). Language teacher assessment literacy – scoping the territory. Papers in
Language Testing and Assessment, 6(1), iv–vii.
Hill, K. (2017b). Understanding classroom-based assessment practices: A precondition
for teacher assessment literacy. Papers in Language Testing and Assessment, 6(1), 1–17.
Hill, K., & McNamara, T. (2012). Developing a comprehensive, empirically based
research framework for classroom-based assessment. Language Testing, 29(3), 395–420.
Inbar-Lourie, O. (2008). Constructing a language assessment knowledge base: A focus
on language assessment courses. Language Testing, 25(3), 385–402.
Inbar-Lourie, O. (2016). Language assessment literacy. In E. Shohamy, I. Or & S. May (Eds.), Language testing and assessment (pp. 257–270). Springer International Publishing.
Inbar-Lourie, O. (2017). Language assessment literacies and the language testing communities: A mid-life identity crisis? [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Jeong, H. (2013). Defining assessment literacy: Is it different for language testers and
non-language testers? Language Testing, 30(3), 345–362.
Jin, Y. (2010). The place of language testing and assessment in the professional pre-
paration of foreign language teachers in China. Language Testing, 27(4), 555–584.
Kim, A. A., Chapman, M., & Wilmes, C. (2017). Developing materials to enhance the
assessment literacy of parents and educators of K-12 English language learners [Conference
paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes,
Bogotá, Colombia.
Kim, A. A., Chapman, M., Wilmes, C., Cranley, M. E., & Boals, T. (2017). Validation
research of preschool language assessment for dual language learners: Collaboration between
educators and test developers [Conference paper]. The 39th Language Testing Research
Colloquium, Universidad de los Andes, Bogotá, Colombia.
Kiomrs, R., Abdolmehdi, R., & Naser, R. (2011). On the interaction of test washback and teacher assessment literacy: The case of Iranian EFL secondary school teachers. English Language Teaching, 4(1), 156–160.
Kremmel, B., & Harding, L. (2017). Towards a comprehensive, empirical model of language
assessment literacy across different contexts [Conference paper]. The 39th Language Testing
Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Kremmel, B., & Harding, L. (2019). Towards a comprehensive, empirical model of
language assessment literacy across stakeholder groups: Developing the language
assessment literacy survey. Language Assessment Quarterly, 17(1), 100–120.
Kremmel, B., Eberharter, K., & Harding, L. (2017). Putting the ‘language’ into language
assessment literacy [Conference paper]. The 39th Language Testing Research Collo-
quium, Universidad de los Andes, Bogotá, Colombia.
Kvasova, O., & Kavytska, T. (2014). The assessment competence of university language
teachers: A Ukrainian perspective. Language Learning in Higher Education: Journal of the
European Confederation of Language Centres in Higher Education (CercleS), 4(1), 159–177.
Lam, R. (2015). Language assessment training in Hong Kong: Implications for language
assessment literacy. Language Testing, 32(2), 169–197.
Leung, C., & Mohan, B. (2004). Teacher formative assessment and talk in classroom
contexts: Assessment as discourse and assessment of discourse. Language Testing, 21(3),
335–359.
Malone, M. E. (2013). The essentials of assessment literacy: Contrasts between testers
and users. Language Testing, 30(3), 329–344.
Malone, M. E. (2016). Training in language assessment. In E. Shohamy, I. Or & S. May (Eds.), Language testing and assessment (pp. 225–239). Springer International Publishing.
Malone, M. E. (2017). Including student perspectives in language assessment literacy [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Mazandarani, O., & Troudi, S. (2017). Teacher evaluation: What counts as an effective teacher? In S. Hidri & C. Coombe (Eds.), Evaluation in foreign language education in the Middle East and North Africa (pp. 3–28). Springer International Publishing.
Mede, E., & Atay, D. (2017). English language teachers' assessment literacy: The Turkish context. Dil Dergisi, 168(1), 43–60.
O’Loughlin, K. (2013). Developing the assessment literacy of university proficiency test
users. Language Testing, 30(3), 363–380.
Pill, J., & Harding, L. (2013). Defining the language assessment literacy gap: Evidence
from a parliamentary inquiry. Language Testing, 30(3), 381–402.
Plake, B. S., & Impara, J. C. (1993). Teacher assessment literacy questionnaire. University of Nebraska-Lincoln.
Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory
into Practice, 48(1), 4–11.
Rea-Dickins, P. (2001). Mirror, mirror on the wall: Identifying processes of classroom assessment. Language Testing, 18(4), 429–462.
Rea-Dickins, P. (2006). Currents and eddies in the discourse of assessment: A learning-focused interpretation. International Journal of Applied Linguistics, 16(2), 163–188.
Restrepo, E., & Jaramillo, D. (2017). Preservice teachers’ language assessment literacy devel-
opment [Conference paper]. The 39th Language Testing Research Colloquium, Uni-
versidad de los Andes, Bogotá, Colombia.
Scarino, A. (2008). The role of assessment in policy-making for languages education in Australian schools: A struggle for legitimacy and diversity. Current Issues in Language Planning, 9(3), 344–362.
Scarino, A. (2013). Language assessment literacy as self-awareness: Understanding the role of interpretation in assessment and teacher learning. Language Testing, 30(3), 309–327.
Scarino, A. (2017). Developing assessment literacy of teachers of languages: A conceptual and interpretive challenge. Papers in Language Testing and Assessment, 6(1), 18–40.
Schissel, J. L., Leung, C., & Chalhoub-Deville, M. (2019). The construct of multilingualism in language testing. Language Assessment Quarterly, 16(4–5), 373–378.
Sheehan, S., & Munro, S. (2017). Assessment: attitudes, practices and needs: Project report.
British Council. https://www.teachingenglish.org.uk/sites/teacheng/files/pub_
G239_ELTRA_Sheenan%20and%20Munro_FINAL_web%20v2.pdf
Stiggins, R. (1991). Assessment literacy. Phi Delta Kappan, 72(7), 534–539.
Stiggins, R. (2006). Assessment for learning: A key to motivation and achievement.
Edge: The Latest Information for the Education Practitioner, 2(2), 1–19.
Stiggins, R. (2012). Classroom assessment competence: The foundation of good teaching. http://
images.pearsonassessments.com/images/NES_Publications/2012_04Stiggins.pdf
Taylor, L. (2009). Developing assessment literacy. Annual Review of Applied Linguistics, 29, 21–36.
Taylor, L. (2013). Communicating the theory, practice and principles of language testing to test stakeholders: Some reflections. Language Testing, 30(3), 403–412.
Tsagari, D. (2013). EFL students’ perceptions of assessment in higher education. In D.
Tsagari, S. Papadima-Sophocleous & S. Ioannou-Georgiou (Eds.), International
experiences in language testing and assessment (pp. 117–143). Peter Lang.
Tsagari, D. (2017). The importance of contextualizing language assessment literacy [Conference
paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes,
Bogotá, Colombia.
Tsagari, D., & Vogt, K. (2017). Assessment literacy of foreign language teachers around
Europe: Research, challenges and future prospects. Papers in Language Testing and
Assessment, 6(1), 41–63.
Valeo, A., & Barkaoui, K. (2017). How teachers’ conceptions mediate their L2 writing assessment
practices: Case studies of ESL teachers across three contexts [Conference paper]. The 39th Lan-
guage Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Villa Larenas, S. (2017). Language assessment literacy of EFL teacher trainers [Conference
paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes,
Bogotá, Colombia.
Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers: Find-
ings of a European study. Language Assessment Quarterly, 11(4), 374–402.
Xu, Y., & Brown, G. T. L. (2016). Teacher assessment literacy in practice: A recon-
ceptualization. Teaching and Teacher Education, 58, 149–162.
Xu, Y., & Brown, G. T. L. (2017). University English teacher assessment literacy: A
survey-test report from China. Papers in Language Testing and Assessment, 6(1), 133–158.
Yan, X., Fan, J. J., & Zhang, C. (2017). Understanding language assessment literacy profiles of different stakeholder groups in China: The importance of contextual and experiential factors [Conference paper]. The 39th Language Testing Research Colloquium, Universidad de los Andes, Bogotá, Colombia.
Yan, X., Zhang, C., & Fan, J. J. (2018). 'Assessment knowledge is important, but…': How contextual and experiential factors mediate assessment practice and training needs of language teachers. System, 74, 158–168. doi:10.1016/j.system.2018.03.003
Yastıbaş, A. E., & Takkaç, M. (2018). Understanding language assessment literacy: Devel-
oping language assessment. Journal of Language and Linguistic Studies, 14(1), 178–193.
Chapter 3

Traditional and alternative assessment

Lee McCallum
Introduction
The field of language testing (also termed language assessment in this chapter and
other work [e.g. O’Sullivan, 2011]) involves the process of designing language
tests, testing students, and using the resulting data for evaluation and decision-making purposes (Davies, Brown, Elder & Hill, 1999). Language testing enjoys a rich, complex, and often misinterpreted history, with tests simply defined as instruments that elicit certain behaviour from candidates, behaviour that is then used to make inferences about a candidate's language ability (Carroll, cited in Bachman, 1990). These inferences are often reflected in a numerical score, which is set against a benchmark level to determine entry into higher education, training, or employment opportunities, and to govern immigration, often to an English-speaking country (Shohamy, 1998, 2001a). Such inferences are facilitated by using standardized tests, where test administration, content, format, language, and scoring procedures are identical for all test takers, allowing scores across test populations to be easily compared (Popham, cited in Menken, 2008).
The history of language testing has been mapped according to different time
periods and ‘waves of scholarship’ including Spolsky (1976) and Davies’ (1978) three
stages: pre-scientific, psychometric–structural, and psycholinguistic–sociolinguistic as
well as Morrow’s (1979) time period classification of the same eras: the ‘Garden of
Eden’, the ‘Vale of Tears’, and the ‘Promised Land’. Shohamy’s (1996) distinct five-
stage categorization is partly guided by test task typologies: discrete-point, inte-
grative, communicative, performance testing, and alternative assessment, which span
more than a century of testing practices. These waves are steeped in economic,
social, and political influences that steer the direction of testing in mainstream edu-
cation. Weir, Vidakovic, and Galaczi (2013) summarize how tests have long served as gatekeeping tools to prevent mass immigration, as in the United States in the post-war years, while Spolsky (2008) highlights how the Chinese first introduced formal selection testing for elite government positions, a practice that later spread to education in Europe, with France, Italy, and the UK using tests to decide entry into higher education.
Given these uses, it is important to recognize that the last two waves of scholarship, the psychometric–structural and the psycholinguistic–sociolinguistic, play a key role in the understanding, promotion, and use of tests, and in the desire to change them (Fulcher, 2000). The field of psychometrics is viewed as the cornerstone of 'tradi-
tional’ testing with its focus on objectively measuring mental traits such as language
ability, whereas the increasingly influential psycholinguistic–sociolinguistic wave
champions the need for fairer communicative testing that is more socially aware,
ethical, and grounded in ‘re-humanizing’ the testing process (Fulcher, 2000).
This chapter acknowledges the vast history of testing, yet does not strive to
simply remap it. Instead, it follows other theoretically motivated work such as
Alderson and Banerjee (2001) by presenting an overview of the landscape of tra-
ditional testing under two broad principled sections that cover pertinent issues in
testing. The two principled sections – the reliance on statistically robust psycho-
metric scoring practices to meet the aim of selecting the highest scoring students for
entry into higher education, and beliefs that ‘Standard English’ is the testing model
to be followed – help outline traditional testing’s key tenets and shape the chapter
in a logical manner. These sections will also refer to task types and how they facil-
itate testing goals. The chapter will also examine the same two principled sections
through the lens of Critical Language Testing (CLT) to illuminate how, by situat-
ing itself in critical theory and critical applied linguistics, CLT offers alternative
views that are more concerned with prioritizing test takers than the scientific mus-
ings that traditional testing offers. CLT recognizes the power that tests wield over
test takers and aims to promote more interpretive, open scoring procedures which
call for the inclusion of local varieties of English in language testing.
In examining these tenets, the chapter focuses on providing theoretical and
empirical research evidence from the narrow context of English for Academic
Purposes (EAP), which involves the teaching and testing of academic English that
is needed for tertiary level study. It is hoped that such an analysis can contribute
to the wider literature on language assessment literacy and help illuminate task
type options to teachers and test designers who need to include a range of task
types in their assessments to capture the range of skills they need to test.
The reliance on psychometric scoring practices

Under the traditional paradigm, test scores are easily quantifiable and form the basis of key decision-making processes (Spolsky, 1976). Psychometrics, with its reliance on statistics, has played a pivotal role in language assessment since the late 1950s, coinciding with and being influenced by structural linguistics. This influence continues today, with standardized tests operating at all levels of education in different countries, including China (see Jenkins (2014) for an overview of China's 'Gaokao' high school exit exam, which determines entry to study opportunities), and in the UK, US, and Australia, where proficiency exams such as IELTS (International English Language Testing System) and TOEFL (Test of English as a Foreign Language) are required to gain entry into higher education (Weir et al., 2013).
Fulcher (2010, 2014) places this historical reliance on psychometrics within a wider view of testing as a natural science, with its roots in hard, generalizable, objective positivism and realism, whereby language ability was held to be measurable and isolable from the person producing it. Objective tests were thus supported for their purity in measuring a single construct well, for achieving high levels of validity and reliability, and for being capable of being objectively scored and administered to large populations (Spolsky, 1994). Moss, Pullin, Gee, and Haertel (2005) further outline the underlying goal of psychometrics, and thus traditional testing: to develop interpretations that are
generalizable across individuals and contexts, and to understand the limits of
those generalizations. In seeking out these generalizations, interpretations
characterize groups of individuals who score the same in the trait being
tested. This stance also highlights how test scores are interpreted, with
increasing test scores symbolizing proof of educational gains in knowledge
(Moss et al., 2005). However, this interpretation of scores remains dubious
because it disregards how learners are taught or prepared to answer ques-
tions and perform tasks on the knowledge appearing in the test (Miller &
Legg, 1993). This condenses curricula and means that while learners ‘appear’
to gain knowledge, this is somewhat artificially gained from a narrow
knowledge base that is decided by the test, which has in turn been decided
by those tasked with making selection decisions (Shohamy, 1996). This base consists largely of receptive knowledge because tasks are designed to be uniform, easy to assess, and to require a single answer through formats such as gap-fill, multiple-choice questions (MCQs), and true–false or matching exercises (Linn, 2000, cited in Moss et al., 2005). Figure 3.1 below shows a typical multiple-choice task whereby students are expected to choose a single correct answer.
The task in Figure 3.1 closely matches the tenets of the traditional paradigm in language testing: it offers a limited number of possible answers and strengthens the focus on a particular kind of knowledge by asking students to choose only one correct answer. The task achieves strong reliability in grading because answers are prescribed in the shape of an answer key. These types of task occur frequently in large-scale proficiency examinations, which serve large numbers of students and institutions, and are often marketed as tests that determine a candidate's proficiency and suitability to undertake academic study or skilled employment.

Questions 10–12
Choose the appropriate letters A, B, C or D.
Write your answers in boxes 10–12 on your answer sheet.
10. Research completed in 1982 found that in the United States soil erosion …
A farm incomes
B use of fertiliser
C over-stocking
D farm diversification
Figure 3.1: Academic Reading Multiple Choice Task. Task taken from: https://www.ielts.org/-/media/pdfs/academic_reading_sample_task_multiple_choice.ashx?la=en
A further paradigmatic tenet of this reliance on psychometrics stems from the belief that scores adhere to a normal distribution, where the majority of scores pool at the mean and other scores deviate from the mean to create a bell-shaped curve (Douglas, 2010). This normal distribution pattern is used to determine access to resources such as higher education, with scores benchmarked against a cut-off score that determines pass or fail decisions and, ultimately, the benefits available to the test taker (Spolsky, 1995). Shohamy (2001a) explains that assessment stakeholders perceive these scores as objective, legitimate, and a mark of achievement; however, several scholars have appreciated that scores can have serious consequences for test takers, with Cattell (cited in Spolsky, 1995) recognizing the serious decisions made on the basis of these scores and the testing community's responsibility to ensure that scores are reliable. This stance was also taken by Edgeworth (1888), whose view of ensuring reliability lay in rater reliability, with the concept of 'reliability' chiefly concerned with the measurement consistency of scoring practices (Carr, 2011).
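As a minimal formal sketch of these two tenets (the notation here is mine, not the chapter's): if observed scores X are assumed to follow a normal distribution and a cut-off score c is imposed, the expected pass rate is

X \sim \mathcal{N}(\mu, \sigma^{2}), \qquad \Pr(\text{pass}) = \Pr(X \geq c) = 1 - \Phi\!\left(\frac{c - \mu}{\sigma}\right),

where \Phi is the standard normal distribution function. Likewise, the concern with scoring consistency can be written in classical test theory terms as

X = T + E, \qquad \text{reliability} = \frac{\sigma_{T}^{2}}{\sigma_{T}^{2} + \sigma_{E}^{2}},

where T is the unobserved 'true' score and E is measurement error; rater reliability, Edgeworth's particular concern, treats disagreement between markers as one source of E.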
These concerns, whether socially guided or purely statistically investigated,
highlight issues that were also recognized and investigated by Thorndike (1904)
in the early 20th century, and in Cambridge and Oxford University and
UCLES (University of Cambridge Local Examinations Syndicate) circles at
much later dates (Weir et al., 2013). Thorndike (1904, cited in Spolsky, 1995) also recognized that fairness to test takers meant scoring questions to reflect their level of difficulty; under weighted scoring, candidates answering more challenging questions therefore received more marks than those answering simpler questions.
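One simple way to operationalize such weighted scoring (a hypothetical illustration rather than a scheme described in the chapter) is to score a test of n items as

S = \sum_{i=1}^{n} w_{i} x_{i},

where x_{i} \in \{0, 1\} records whether item i was answered correctly and the weight w_{i} increases with item difficulty, for example by setting w_{i} proportional to the inverse of the proportion of candidates answering item i correctly.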
In keeping with a focus on the test and its target inferences, traditional test-
ing also supports a narrow conceptualization of validity, with Messick’s (1989)
content and construct validity receiving extensive attention from testers. Con-
tent validity addresses how representative the test is as a sample of a course’s
syllabus, whereas construct validity is an evaluation of how well the test’s scores
reflect what the test claims to be measuring (Davies et al., 1999).
In EAP, content and construct validity have traditionally been governed and driven by a needs analysis of learners: as Fulcher (1999) indicates, 'content validity' is taken to mean representing the course students were taught, while 'construct validity' ensures that the test examines the skills it claims to test, as well as the skills the course aimed to develop. It is of utmost importance that the testing community, including instructors, realize that reliability and validity are not absolute qualities: the two operate on a continuum, much like the discrete-point–integrative/communicative distinction, and neither will ever be an absolute property. It is equally important to realize that the political, social, economic, and cultural terrains under which tests operate contribute to balancing this continuum (McNamara, 2001). This narrow, traditional view of validity does not consider the social consequences of the test. In combination with reliability, a narrow range of single-answer discrete test items, and a narrow scoring scale, traditional testing thus views language proficiency as a single unitary construct that can be isolated from human, social, and test administration factors. In pursuing a single measurable construct, traditional testing supports validity and reliability practices geared towards ensuring that the test document allows testers to make the types of inferences they aim to make. This approach, Shohamy (2004) argues, is still embedded in a measurement lens because it further distinguishes between those with lower and higher levels of predetermined knowledge. These practices, including the weighting of items, still rely on the design and selection of items that elicit a single correct answer, signalling that, as a unidimensional construct, language proficiency can be isolated and measured as a single trait with the right tools (Bachman, 2000).
This task tests candidates' knowledge of word class, with candidates required to change the word given in capitals to fit the passage.

For questions 17–24, read the text below. Use the word given in capitals at the end of some of the lines to form a word that fits in the gap in the same line. There is an example at the beginning (0).
Write your answers IN CAPITAL LETTERS on the separate answer sheet.
Example: 0 M E M O R A B L E
_____________________________________________________________________
National Bike Week was celebrated last week in a (0) … … …. way with a Family Fun Day in Larkside Park. (MEMORY)
The event (17) … … …. to be highly successful with over five hundred people attending. (PROOF)
Larkside Cycling Club brought along a (18) … … …. of different bikes to (VARY)
demonstrate the (19) … … …. that family members of all ages can get from group cycling. (ENJOY)
Basic cycling (20) … … …. was taught using conventional bikes. (SAFE)
There were also some rather (21) … … …. bikes on display. (USUAL)
One-wheelers, five-wheelers and even one which could carry up to six (22) … … …. were used for fun. (RIDE)
The club also gave information on how cycling can help to reduce (23) … … …. damage. (ENVIRONMENT)
They also provided (24) … … …. as to how people could substitute the bike for the car for daily journeys. (SUGGEST)
The overall message was that cycling is great family fun and an excellent alternative to driving. By the end of the day over a hundred people had signed up for membership.
Figure 3.2: First Certificate in English: Use of English Task. Adapted from: http://www.cambridgeenglish.org/exams/first/preparation/
In this respect, Fulcher notes how a scale such as the TOEFL or Common
European Framework of Reference (CEFR) can be misused because teachers
come to view the scale as a resource to prescriptively judge learners as well as to
influence curriculum development that reflects what the scale sees as signalling
a higher proficiency grade. The proficiency scales also help shape knowledge
by indicating, for example, linguistic features at each proficiency level, organization patterns, expected discourse markers, and the development of ideas, meaning the writer changes or molds their response to match these criteria (Hawkins & Filipovic, 2012). In an international context, this means learners'
writing is forced to change style from the rhetoric and discourse style of their
L1 to the discourse of the L2, and learners are often prepared for these changes
in the form of IELTS exam preparation classes or freshman composition classes
at university (Kachru, 2006).
behaviour that the test elicits to gain access. It is these beliefs that drive the call for change in the testing world and for a fairer system that equalizes and better distributes the currently stratified sharing of resources and access to benefits (Lynch, 2001). It is important to realize that in forming a response to
traditional testing, CLT strives for fair and equal testing opportunities under a
critical theory framework, yet it does not advocate eradicating traditional test-
ing. A fundamental consideration is that traditional test approaches have an
appropriate use; however, CLT’s objective is to highlight that neglect of other
approaches is undemocratic and thus calls for testing to be dialogic in nature
where all parties involved in testing have a voice (Shohamy, 2001a). It is also
important to clarify that CLT recognizes that through dialogue a harmonized
medium that balances interests can be achieved, and change can take place
(Trede & Higgs, 2010). This section of the chapter further explores how the
tenets of traditional testing are viewed under this framework.
Shohamy (2001b) equally outlines how tests originally aimed to provide access to services for all, regardless of entitlement. However, Shohamy (2001b) argues that these tests have failed to shake off their overarching selection purpose, meaning classroom teaching is forced, consciously or unconsciously, by the school's management, teachers, and would-be students, to centre on ensuring students pass the test and receive the associated pass-grade benefits. In this sense, Menken (2008) highlights how tests become the centralized language policy that dictates, from the top down, what content is taught, how and by whom it can be taught, and in what language it is best taught. This is also indicated in the work of Hamp-Lyons (1998) at the international level, with the TOEFL test dictating curriculum in schools, universities, and academies.
Shohamy (2001b) explains how CLT seeks to change the top-down practice
that has been created by traditional testing. This change places test takers at the
heart of the testing process, and seeks to give them an active role in that process,
and for power to be redistributed more fairly to reflect this new balance. Sho-
hamy (2001b) discusses how CLT invites stakeholders – including teachers and
test takers – to debate and confront the roles tests play in shaping instruction,
access to education, and the creation of ‘new’ knowledge. These views are also
stated by Darling-Hammond (1994) who sees a need for testing to move from a
sorting tool to a developmental aid that supports learners and has greater appre-
ciation for individuals’ unique knowledge that is often disregarded in traditional
testing in favour of dominant knowledge that those in power have deemed
important and therefore testable (Shohamy, 2001b). Shohamy (1998) highlights
how critical perspectives view the testing process as a non-neutral practice that is
ideologically laden with the values of those in power, while Messick (1989) and
Alderson and Banerjee (2001) similarly note that tests contain values that are
psychologically, socially, economically, and politically guided, with testing deci-
sions reflecting all of these values through the physical test document. Noam,
cited in Shohamy (1998), also clarifies how these factors merge to shape learners’
beliefs about knowledge, learning, and success, with learners believing that suc-
cess equals mastering test knowledge.
The stance of critical theory is further influenced by constructivism whereby
there is a need to peel away the surface and uncover power. Constructivists
believe that those in power decide what knowledge is valuable and whether it
will be tested. These issues of the powerful deciding knowledge are explained by
Foucault, cited in Benesch (2001), as being ever present. In testing, the power
imbalance between stakeholders such as teachers, test designers, and test users
such as institutions and test takers, is a ‘self-sustaining’ system where test designers
have total non-negotiable control over the knowledge input (Shohamy, 2001a).
A fundamental concept in understanding the power relations that exist in testing
is Bourdieu’s (1991) symbolic power, which specifies that power relations con-
tinue to exist and are maintained because the party granting the power believes
that the power exists; it is willing to give the other party power, and to allow the
other party to exercise its dominance.
It is important, however, that these tasks are robustly designed and developed to ensure fairness, illustrate face validity, and guard against grades that are too subjective to be justified.
A sample alternative task is presented below in Figure 3.3:
Overview of task
In groups, learners design a brochure for the Hong Kong Tourism Board describing 4
attractions in Hong Kong which would appeal to young people of their own age.
Task guidelines for learners
(a) Task fulfilment: would your selected sites appeal to young people?
(b) Accuracy of language and information provided: Is the brochure written in good
English? Is the information provided accurate?
(c) Attractiveness of final written submission
Figure 3.3: Sample collaborative writing task (Adapted from the Curriculum and Development
Institute (2005) and Douglas (2010)).
The sample task in Figure 3.3 represents a possible task suitable for a pre-university EAP course that focuses on a specific genre; it also raises awareness of audience and integrates interpersonal and affective skills such as communication and critical thinking. The task also requires prolonged engagement with these skills and offers the opportunity to respond and react to teacher feedback.
Brown and Hudson (1998) suggest that these tasks can become fairer and allow
learners a louder voice by negotiating assessment criteria and giving learners a
say in the important elements of the task. In this respect, other collaborative
tasks that can form the basis for assessment may include jigsaw reading and
writing tasks (e.g. Esnawy, 2016) as well as tasks that allow students to compare
their experiences of individual, pair, and group work (e.g. Bhowmik, Hillman,
& Roy, 2018). Unlike the previous two tasks, this task allows negotiation of
meaning and allows students to produce freer examples of written language, as
opposed to the restricted output in Figure 3.2, and the focus on recognition of
the correct answer in Figure 3.1.
Their socially constructed nature means they are labelled with connotations of
powerful native norms seen as ‘legitimate’, and non-native norms seen as ‘ille-
gitimate’, or in the words of Quirk (1990), ‘quackery’. Hamid and Baldauf
(2013) note that non-native norms are also seen as 'deficit forms' or, in some cases, 'interlanguage', and are not recognized as legitimate under traditional Second Language Acquisition (SLA) theories. However, Groves (2010) refutes suggestions that these norms are 'interlanguage' by reminding us that the interlanguage concept arose from application to individual learners, not whole social communities, and, alongside Kirkpatrick and Deterding (2011), he
points out that non-native norms, such as the practice of placing the topic at
the front of the sentence, are far-reaching and could be spread and shared across
more than one geographical area. Kirkpatrick and Deterding (2011) and Kim
(2006) cement this valid point by arguing that since more non-native speakers
shape the language than native speakers, their use of the language should be
considered in testing language use and ability.
Conclusion
This chapter has outlined the key theoretical tenets of traditional and critical language testing and provided examples of EAP-relevant assessment tasks. Within this broad discussion, a number of historical and contemporary assessment terms and trajectories were set out to engage with the key understandings of language assessment we need as practitioners. The chapter presented and discussed tasks that typify these different understandings, and it seeks to encourage these discussions to continue at both global and local levels. At the global level, there is a need to examine differences across international tests to identify common and divergent task features, and how these relate back to the skills we perceive as fundamental to study in higher education. At the local level, there is also a need to examine the tasks that play a role in shaping local assessment practices and how these tasks align with EAP curriculum goals (e.g. Rauf & McCallum, in press).
References
Alderson, J. C., & Banerjee, J. (2001). Language testing and assessment (Part 1). Language
Teaching, 34, 213–236.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford University
Press.
Bachman, L. F. (2000). Modern language testing at the turn of the century: Assuring
that what we count counts. Language Testing, 17(1), 1–42.
Benesch, S. (2001). Critical English for academic purposes: Theory, politics and practice.
Lawrence Erlbaum Associates.
Bhowmik, S. K., Hillman, B., & Roy, S. (2018). Peer collaborative writing in the EAP
classroom: Insights from a Canadian postsecondary context. TESOL Journal,
doi:10.1002/tesj.393.
Bourdieu, P. (1991). Language and symbolic power. Polity.
Bridgeman, B., & Carlson, S. (1983). A survey of academic writing tasks required of
graduate and undergraduate foreign students. TOEFL Research Report 15. Educational
Testing Service.
Brown, J. D., & Hudson, T. D. (1998). The alternatives in language assessment. TESOL
Quarterly, 32, 653–675.
Cambridge University. First certificate in English: Use of English paper: Part 3 task. https://www.gettinenglish.com/wp-content/uploads/2014/07/cambridge-english-first-handbook-2015.pdf
Canagarajah, A. S. (2006). Changing communicative needs, revised assessment objectives: Testing English as an international language. Language Assessment Quarterly, 3(3), 229–242.
Canagarajah, A. S. (2016). TESOL as a professional community: A half-century of
pedagogy, research and theory. TESOL Quarterly, 50(1), 7–41.
Carr, N. T. (2011). Designing and analysing language tests. Oxford University Press.
Carroll, J. B. (1968). The psychology of language testing. In A. Davies (Ed.), Language
testing symposium: A psycholinguistic approach (pp. 46–69). Oxford University Press.
Cattell, J. M. (1890). Mental tests and measurements. Mind, 15, 373–381.
Chomsky, N. (1965). Aspects of the theory of syntax. MIT Press.
Curriculum and Development Institute (2005). Task-based assessment for English language
learning at secondary level. Education and Manpower Bureau. https://cd1.edb.hkedcity.
net/cd/eng/TBA_Eng_Sec/pdf/part2_Task1.pdf
Darling-Hammond, L. (1994). Performance-based assessment and educational equity.
Harvard Educational Review, 64, 5–30.
Davies, A. (1978). Language testing: Survey article, Part 1. Language Teaching and Linguistics Abstracts, 11(3), 145–159.
Davies, A. (2003). Three heresies of language testing research. Language Testing, 20(4), 355–368.
Davies, A. (2013). Native speakers and native users: Loss and gain. Cambridge University Press.
Davies, A. (2014). 50 years of language assessment. In A. J. Kunnan (Ed.), The companion to language assessment: Abilities, contexts and learners (pp. 3–21). Wiley Blackwell.
Davies, A., Brown, A., Elder, C., & Hill, K. (1999). Dictionary of language testing. Cambridge University Press.
Douglas, D. (2010). Understanding language testing. Hodder Education.
Edgeworth, F. Y. (1888). The statistics of examinations. Journal of the Royal Statistical
Society, LI, 599–635.
Esnawy, S. (2016). EFL/EAP reading and research essay writing using jigsaw. Procedia –
Social and Behavioral Sciences, 232, 98–101.
Foucault, M. (1980). Power/knowledge: Selected interviews and other writings: 1972–1977.
Pantheon.
Freeborn, D. (2006). From Old English to Standard English: A course book in language var-
iation across time (3rd ed). Palgrave Macmillan.
Freire, P. (1996). Pedagogy of the oppressed. Penguin.
Fulcher, G. (1999). Assessment in English for Academic Purposes: Putting content
validity in its place. Applied Linguistics, 20(2), 221–236.
Fulcher, G. (2000). The communicative legacy in language testing. System, 28, 483–497.
Fulcher, G. (2010). Practical language testing. Hodder Education/Routledge.
Fulcher, G. (2014). Philosophy and language testing. In A. J. Kunnan (Ed.), The companion to language assessment: Evaluation, methodology, and interdisciplinary themes (pp. 1434–1451). Wiley Blackwell.
Groves, J. (2010). Error or feature? The issue of interlanguage and deviations in non-
native varieties of English. HKBU Papers in Applied Language Studies, 14, 108–129.
Hamid, O. M. (2014). World Englishes in international proficiency tests. World Eng-
lishes, 33(2), 263–277.
Hamid, O. M., & Baldauf, R. B. (2013). Second language errors and features of world
Englishes. World Englishes, 32(4), 476–494.
Hamp-Lyons, L. (1998). Ethical test preparation practice: The case of the TOEFL.
TESOL Quarterly, 32, 329–337.
Hawkins, J. A., & Filipovic, L. (2012). Criterial features in L2 English: Specifying the refer-
ence levels of the common European framework. Cambridge University Press.
Hickey, R. (Ed.). (2015). Standards of English: Codified varieties around the world. Cambridge University Press.
Huhta, A. (2007). Diagnostic and formative assessment. In B. Spolsky & F. M. Hult (Eds.), The handbook of educational linguistics (pp. 469–482). Wiley-Blackwell.
International English Language Testing System (IELTS). (2017). IELTS Academic reading
sample task. https://www.ielts.org/-/media/pdfs/academic_reading_sample_task_m
ultiple_choice.ashx?la=en
International Language Testing Association (ILTA). (2000). ILTA Code of Ethics. http://
www.iltaonline.com/page/CodeofEthics
Jenkins, J. (2006). Current perspectives on teaching world Englishes and English as a lingua franca. TESOL Quarterly, 40(1), 157–181.
Jenkins, J. (2014). English as a lingua franca in the international university: The politics of aca-
demic English language policy. Routledge.
Kachru, B. B. (Ed.). (1982). The other tongue: English across cultures. University of Illinois Press.
Kachru, B. B. (1986). The alchemy of English: The spread, functions and models of non-native Englishes. Pergamon Press.
Kachru, Y. (2006). Culture and argumentative writing in World Englishes. In K. Bolton &
B. B. Kachru (Eds), World Englishes: Critical concepts in linguistics. (Vol. V) (pp. 19–39).
Routledge.
Kim, H. J. (2006). World Englishes in language testing: A call for research. English
Today, 22(4), 32–39.
Kirkpatrick, A. (2006). Which model of English: native-speaker, nativised or lingua
franca? In R. Rubdy & M. Saraceni (Eds.), English in the world: Global rules, global roles
(pp. 71–83). Continuum.
Kirkpatrick, A., & Deterding, D. (2011). World Englishes. In J. Simpson (Ed.), The
Routledge handbook of applied linguistics (pp. 373–388). Routledge.
Lado, R. (1961). Language testing. McGraw-Hill.
Leung, C. (2007). Dynamic assessment: Assessment for and as teaching? Language
Assessment Quarterly, 4(3), 257–278.
Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4–16.
Lynch, B. K. (2001). Rethinking assessment from a critical perspective. Language Testing,
18(4), 351–372.
Lynch, B., & Shaw, P. (2005). Portfolios, power and ethics. TESOL Quarterly, 39(2),
263–297.
Mauranen, A., Llantada, C. P., & Swales, J. M. (2010). Academic Englishes: A standardized knowledge? In A. Kirkpatrick (Ed.), The Routledge handbook of world Englishes (pp. 634–653). Routledge.
McNamara, T. (2001). Language assessment as social practice: Challenges for research.
Language Testing, 18(4), 333–349.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Language
Learning Monograph Series. Blackwell Publishing.
Menken, K. (2008). High-stakes tests as de facto language education policies. In E. Shohamy & N. H. Hornberger (Eds.), Encyclopedia of language and education (Vol. 7, pp. 401–413). Springer.
Messick, S. (1989). Validity. In R. L. Linn. (Ed.), Educational measurement (pp. 13–103).
Macmillan.
Miller, D. M., & Legg, S. M. (1993). Alternative assessment in a high-stakes environ-
ment. Educational Measurement: Issues and Practice, 12(2), 9–15.
Morrow, K. (1979). Communicative language testing: Revolution or evolution? In C. K. Brumfit & K. Johnson (Eds.), The communicative approach to language teaching (pp. 143–159). Oxford University Press.
Moss, P. A., Pullin, D., Gee, J. P., & Haertel, E. H. (2005). The idea of testing: Psychometric and sociocultural perspectives. Measurement: Interdisciplinary Research and Perspectives, 3, 63–83.
Noam, G. (1996). Assessment at a crossroads: Conversation. Harvard Educational Review,
66, 631–657.
O’Sullivan, B. (2011). Language testing. In J. Simpson (Ed.), The Routledge handbook of
applied linguistics (pp. 259–274). Routledge.
Omoniyi, T. (2010). Writing in English(es). In A. Kirkpatrick (Ed.), The Routledge
handbook of world Englishes (pp. 471–490). Routledge.
Pennycook, A. (1994). The cultural politics of English as an international language.
Routledge.
Popham, W. J. (1999). Why standardized test scores don't measure educational quality. Educational Leadership, 56(6), 8–15.
Quirk, R. (1990). Language varieties and standard language. English Today, 21, 3–10.
Raddaoui, R., & Troudi, S. (2013). Three elements of critical pedagogy in ELT: An
overview. In P. Davidson, M. Al-Hamly, C. Coombe, S. Troudi, & C. Gunn (Eds.),
Achieving Excellence Through Life Skills Education: Proceedings of the 18th TESOL Arabia
Conference, (pp. 73–82). TESOL Arabia Publications.
Rauf, M., & McCallum, L. (in press). Language assessment literacy: Task analysis in
Saudi universities. In L. McCallum & C. Coombe (Eds.), The assessment of L2 written
English across the MENA Region: A synthesis of practice. Palgrave Macmillan.
Seargeant, P. (2012). Exploring world Englishes: Language in a global context. Routledge.
Shohamy, E. (1998). Critical language testing and beyond. Studies in Educational Evaluation, 24(4), 331–345.
Shohamy, E. (2001b). Democratic assessment as an alternative. Language Testing, 18(4), 373–391.
Shohamy, E. (2004). Assessment in multicultural societies: Applying democratic principles and practices to language testing. In B. Norton & K. Toohey (Eds.), Critical pedagogies and language learning (pp. 72–93). Cambridge University Press.
Chapter 4

Ontogenetic and phylogenetic perspectives

Mojtaba Mohammadi and Reza Vahdani Sanavi
Introduction
With the tenets of a sociocultural perspective widely recognized in the field of
language education, scholars have attempted to expand these tenets into different
aspects of learning and teaching. Assessment was not excluded: 'assessment of learning' was critically debated and the concept of 'assessment for learning' was introduced. Out of the need to document learners' outcomes, and to establish standards and benchmarks to measure their knowledge of language learning, came the concept of language assessment literacy (LAL). In its short lifespan, beginning in 1991 in general education and in 2001 in language education, the definition of LAL has been expanded and reconceptualized, and a number of different models have been proposed. In the light of the prominence LAL has recently gained, and given the depth and breadth of the concept, this chapter traces its genesis, examining the development of 'literacy' into 'literacies', and of 'assessment literacy' into 'language assessment literacy'. It also explores theoretical frameworks and proposed models. In conclusion, the concept of LAL is problematized and future directions for the field are suggested.
to any assessment knowing what they are assessing, why they are doing so,
how best to assess the achievement of interest, how to generate sound
samples of performance, what can go wrong, and how to prevent those
problems before they occur.
(p. 240)
The need for assessment to move from the periphery to centre stage was keenly
felt. The reason might be that testing has turned out to be ‘a big business’
(Spolsky, 2008, p. 297), both commercially and non-commercially, and 'the
societal role that language tests play, the power that they hold, and their central
functions in education, politics and society’ (Shohamy & Or, 2013, p. x). After
Stiggins, other scholars started to conceptualize assessment literacy, such as
Falsgraf (2006), who defined it as ‘… the ability to understand, analyze and
apply information on student performance to improve instruction’ (p. 6).
Inbar-Lourie (2013) viewed it as ‘the knowledge base required for performing
assessment tasks’ (p. 2924).
The concept of ‘language assessment literacy’ entered the field of language
assessment at the beginning of the 21st century when Brindley (2001) stated
that, unlike in general education, language teaching programs lack a sizable
bulk of research in ‘teacher’s assessment practices, levels, and training, and
professional development needs’ (p. 126). Without mentioning the stake-
holders, some scholars defined language assessment literacy. For Inbar-Lourie
(2008a), it is defined as ‘having the capacity to ask and answer critical questions
about the purpose for assessment, about the fitness of the tool being used,
about testing conditions, and about what is going to happen on the basis of
the test results’ (p. 389). In O’Loughlin’s (2013) view, it encompasses ‘the
acquisition of a range of skills related to test production, test-score inter-
pretation and use, and test evaluation in conjunction with the development of
a critical understanding about the roles and functions of assessment within
society’ (p. 363). Vogt and Tsagari (2014) saw LAL as ‘the ability to design,
develop and critically evaluate tests and other assessment procedures, as well
as the ability to monitor, evaluate, grade and score assessments on the basis of
theoretical knowledge’ (p. 377). Another definition, from quite a general
perspective in terms of stakeholders, is Pill and Harding’s (2013), which states
that LAL ‘may be understood as indicating a repertoire of competences that
enable an individual to understand, evaluate and, in some cases, create lan-
guage tests and analyse test data’ (p. 382).
Malone (2013) placed the responsibility on teachers' shoulders and remarked
that LAL ‘refers to language instructors’ familiarity with testing definitions and
the application of this knowledge to classroom practices in general and specifi-
cally to issues related to assessing language’ (p. 329). Fulcher (2012) offers a
definition of LAL with a wider scope, which embraces different assessment
competencies. He argued that LAL ultimately involves the ability to:
understand why practices have arisen as they have, and to evaluate the role
and impact of testing on society, institutions, and individuals.
(p. 125)
The above conceptualizations of LAL have one thing in common: they pinpoint micro- and/or macro-level components of language assessment practice. The micro-level components concern classroom assessment, such as designing, developing, grading, and analysing tests, together with the related theoretical issues. The macro-level components concern assessment practices viewed critically: the purposes of assessment, its societal roles and consequences, and the placing of assessment within wider historical, social, political, and philosophical frameworks.
LAL competencies
In line with the growing attention to language assessment literacy, there were
attempts to demystify the concept and elaborate on what exactly is meant by
attaining this kind of literacy. The earliest attempt to describe ‘assessment literacy’,
though not yet so named, in education was the list of seven standards proposed by the
American Federation of Teachers, the National Council on Measurement in
Education, and the National Education Association in 1990, which are summar-
ized by Stabler-Havener (2018, p. 3). It includes skills such as choosing appropriate assessment methods; developing assessment methods; administering, scoring, and interpreting assessment results; using results in educational decision-making; developing valid grading procedures; communicating assessment results; and recognizing unethical or otherwise inappropriate assessment methods and uses of assessment information.
The other major proposed competencies of LAL are presented in four dif-
ferent models. The first one is Brindley’s (2001) professional development
program model which introduced five competencies for a language teacher. He
criticized the standards presented by the American Federation of Teachers, the
National Council on Measurement in Education, and the National Education
Association as not being ‘flexible enough to allow teachers to acquire familiarity
with those aspects of assessment that are relevant to their needs’ and not having
perspective, he expanded its model to two more tiers of ‘principles’ and ‘con-
text’. The concept of ‘principles’ includes the issues that guide teachers to have
the best possible practice, i.e., getting acquainted with principles, concepts,
processes of assessment, and ethics and codes of practice. ‘Context’ looks at a
wider landscape of LAL by placing practice and principles within a historical,
political, social, and philosophical framework for the stakeholders to be able to
figure out the origin and justifications of these practices and principles, and the impact(s) adopting any one of them may have on society, organizations, and individuals. The contribution of the model seems to be twofold. One is that its hierarchical nature presents a sequence for course providers and trainers, which can be very helpful in preparing coursebooks or planning sessions. The other, which can also be taken as a consequence of this hierarchy of materials and issues, is that the levels are not essential for all stakeholders: depending on the level at which they work, a given tier may be mandatory or optional. Another point to mention is the necessity for teachers to put the theories into practice.
More recent attempts to introduce a componential framework of language assessment literacy are Pill and Harding (2013) and Taylor (2013), which take a different view of the concept by accounting for stakeholders in LAL. Setting aside Brindley (2001) and Fulcher (2012), who, in one way or another, consider a variety of stakeholders with respect to levels of literacy development, the studies by Pill and Harding and by Taylor highlight the agencies that can benefit differently, and to various degrees, from LAL competencies. Pill and Harding adapted the idea from Bybee (1997); it was later expanded by Kaiser and Willander (2005) in the fields of scientific and mathematical literacy in education. Unlike the previous models, which are mostly modular, Pill and Harding proposed a continuum of stages for LAL, ranging from illiteracy, which is complete ignorance of assessment concepts and methods, to multidimensional literacy, which includes knowledge about the philosophical, historical, and social background of assessment. Kaiser and Willander (2005) summarized these five stages as illiteracy, nominal literacy, functional literacy, procedural and conceptual literacy, and multidimensional literacy.
We think that the major demerit here is the lack of a clear definition of
the levels each stakeholder is expected to attain, as Harding and Kremmel
(2016) also note. Taylor’s (2013) spider web model provides the solution.
In her LAL profile model there are eight competencies, all but one of which appear in the previously mentioned models: knowledge of theory, technical skills, principles and concepts, language pedagogy, sociocultural values, local practices, personal beliefs/attitudes, and scores and decision making. This is the first model that has paid
attention to the personal beliefs/attitudes of the stakeholders in language
assessment. In tandem with Scarino (2013) and Giraldo (2018), we believe
that some particularities of language assessment practice come from the
bottom, from teacher–assessors’ interpretations, judgements, and decisions
in assessment, which result from their developing capabilities. Taylor's model maps these eight components onto a five-stage continuum adapted from Pill and Harding (2013) (0 = illiteracy to 4 = multidimensional literacy), and proposes which components, and how much of each, should be achieved by each stakeholder group: test writers, classroom teachers, university administrators, and professional language testers.
Figure 4.1 Differential AL/LAL profiles for four constituencies. (a) Profile for test wri-
ters. (b) Profile for classroom teachers. (c) Profile for university adminis-
trators. (d) Profile for professional language testers.
Adapted from Taylor (2013), p. 410
According to the model, a test writer, for example, is expected to have more
developed literacy in knowledge of theory, technical skills, and principles and
concepts, and less in personal beliefs/attitudes, local practices, and language
pedagogy than a classroom teacher. In spite of the differentiation of the stake-
holders, and the related levels of mastery in each dimension of LAL competency
here, Taylor (2013) has failed to define these dimensions; hence, anyone can
have his or her own definition of them (Harding & Kremmel, 2016).
In a recent study, Yan, Zhang, and Fan (2018) investigated how contextual
and experiential factors mediate teachers’ LAL development. They proposed a
two-layered mediation model for language teachers' LAL competencies that
includes contextual factors (such as educational and assessment policies, assess-
ment practice for different stakeholders, and the resources and constraints of the
local instructional context), and experiential factors (such as assessment develop-
ment, i.e., item writing skills, and development of the assessment intuition, i.e.,
item analysis and score use). They also found that ‘the impact of contextual
factors is mediated through experiential factors. That is, while the assessment
context creates opportunities and motivation for assessment practice, it is the
accumulation of assessment experiences that foster and strengthen assessment
knowledge and skills’ (p. 166). As in previous studies (e.g., Giraldo, 2018; Scarino, 2013; Taylor, 2013), the enhancement of language teachers’ LAL begins
with their own assessment experiences, interpretations, and self-awareness.
LAL conceptualization
The literature on assessment literacy, both in general and in language educa-
tion, has put forward a number of definitions for key concepts. In language
education, from the earliest model up to the most recent one, few have
included macro-level assessment (i.e. consideration of the social, cultural, and
political context) when evaluating competencies. The majority of the com-
petency evaluations are for micro-level assessment (i.e. classroom assessment).
There is a need to give fairly balanced weight to mid-level organizations and agencies: institutions, associations, and communities. Along with a
need for a more comprehensive theoretical definition, LAL could also benefit
from being better defined operationally. Several studies have designed a
questionnaire and used interviews (e.g., Crusan, Plakans, & Gebril, 2016;
Fulcher, 2012; Hasselgreen, Carlsen, & Helness, 2004; Vogt & Tsagari, 2014);
LAL stakeholders
Taylor’s (2013) model paid attention to a number of stakeholders in language
assessment; however, other studies were either limited to looking at teachers or
remained silent regarding the existence of any stakeholders. Any re-conceptualiza-
tions need to be comprehensive in scope in order to cover all stakeholders including:
teachers, test writers, university administrators, and professional language testers.
Other stakeholders can also play a crucial role in the development, analysis, and
interpretation of a test. If parents, for example, are assessment-literate and are familiar
with basic testing principles and practices, teachers and school administrators can count on their support in school and classroom activities.
LAL resources
To enhance LAL among teachers and other stakeholders, teacher education programs require adequate resources. Davies (2008),
in his analysis of the assessment textbooks published over 46 years, revealed that
language testing experts have separated themselves from the field of educational testing, which has resulted in ‘students [being] over-protected from exposure to
empirical encounters with real language learners’ (p. 341). In the resources
meant to equip novice teachers with understanding and practical skills in the
classroom environment, there seems to be a need to have a trade-off between
covering theoretical concepts and practical, classroom-based assessment issues.
Fulcher (2012) also investigated the needs of language teachers and concluded that they require certain resources to fulfil their role as
language testers:
A text that is not light on theory, but explains concepts clearly, especially
where statistics are introduced.
A practical ‘how-to’ guidance, although not prescriptive in nature.
A balance between classroom and large-scale testing, with illustrations and
practical examples drawn from a range of sources and countries.
Activities that can be reasonably undertaken given the constraints and
resources that teachers normally face. (p. 124)
Textbooks are not the only available resources to help LAL stakeholders
enhance their awareness of and adherence to language assessment literacy norms.
As Malone (2013) noted, in recent years textbooks can be supplemented in assorted ways.
Conclusion
In the early 21st century, teaching, learning, and assessment are no longer three stranded islands in the ocean of education in general, or of language education in particular. They are parts of a trilogy, with a series of
theories and practices in one large compendium, with various characters and
roles, and different scripts, but carefully stage-managed to deliver one voice and a single message: teaching is for the sake of learning, and assessing is for the
sake of learning. Assessment is no longer an ignored field in this compendium.
As Coombe, Troudi, and Al-Hamly (2012) estimated, assessment makes up 30 to 50 percent of language teachers’ daily activities. Hence, assessment
literacy deserves more attention. In its short lifespan, LAL has undergone a
series of (re)conceptualizations with a number of models having been proposed,
yet it continues to mature. There remain several areas for future growth.
Only when language assessment literacy has explored these growth areas, can
we think of the concept as mature and developed, and contributing to an
increased quality of language education.
References
Bailey, K. M., & Brown, J. D. (1996). Language testing courses: What are they? In A.
Cumming & R. Berwick (Eds.), Validation in language testing (pp. 236–256). Multi-
lingual Matters.
Brindley, G. (2001). Language assessment and professional development. In C. Elder, A. Brown, K. Hill, N. Iwashita, T. Lumley, T. McNamara & K. O’Loughlin (Eds.), Experimenting with uncertainty: Essays in honor of Alan Davies (pp. 126–136). Cambridge University Press.
Brown, J. D., & Bailey, K. M. (2008). Language testing courses: What are they in 2007? Language Testing, 25(3), 349–383.
Bybee, R. W. (1997). Achieving scientific literacy: From purposes to practices. Heinemann.
Cambridge University (2018). Certificate in teaching English to speakers of other languages
(CELTA): Syllabus and assessment guidelines. https://www.cambridgeenglish.org/Ima
ges/21816-celta-syllbus.pdf
Cambridge University (2019). Diploma in teaching English to speakers of other languages
(DELTA): Syllabus specifications. https://www.cambridgeenglish.org/Images/
22096-delta-syllabus.pdf
Coombe, C., Troudi, S., & Al-Hamly, M. (2012). Foreign and second language tea-
cher assessment literacy: Issues, challenges, and recommendations. In C. Coombe,
P. Davidson, B. O’Sullivan & S. Stoynoff (Eds.), The Cambridge guide to second lan-
guage assessment (pp. 20–29). Cambridge University Press.
Crusan, D., Plakans, L., & Gebril, A. (2016). Writing assessment literacy: Surveying second
language teachers’ knowledge, beliefs, and practices. Assessing Writing, 28, 43–56.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
Falsgraf, C. (2006). Why a national assessment summit? New visions in action. National
Assessment Summit. Summit conducted in Alexandria, Va. https://files.eric.ed.gov/
fulltext/ED527580.pdf
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers.
Profile: Issues in Teachers’ Professional Development, 20(1), 179–195.
Goody, J., & Watt, J. (1963). The consequences of literacy. Comparative Studies in Society
and History, 5(3), 304–345.
Harding, L., & Kremmel, B. (2016). Language assessment literacy and professional
development. In D. Tsagari & J. Banerjee (Eds.), Handbook of second language assessment
(pp. 413–428). Mouton de Gruyter.
Hasselgreen, A. (2008). Literacy in classroom assessment (CA): What does this involve? Paper presented at the 5th Annual Conference of the European Association for Language Testing and Assessment, Athens, Greece. http://www.eaulta.eu.org/con
ferences/2008/docs/sunday/panel/Literacy%20in%20classroom%20assessment.pdf
Hasselgreen, A., Carlsen, C., & Helness, H. (2004). European survey of language testing and
assessment needs: Report: Part 1 general findings. European Association for Language
Testing and Assessment. http://www.ealta.eu.org/documents/resources/survey-rep
ort-pt1.pdf
Hillerich, R. L. (1976). Toward an assessable definition of literacy. The English Journal,
65(2), 50–55.
Hyland, K., & Hamp-Lyons, L. (2002). EAP: Issues and directions. Journal of English for
Academic Purposes, 1(1), 1–12.
Inbar-Lourie, O. (2008a). Constructing an assessment knowledge base: A focus on
language assessment courses. Language Testing, 25(3), 385–402.
Inbar-Lourie, O. (2013). Language assessment literacy. In C. Chapelle (Ed.), The ency-
clopedia of applied linguistics (pp. 1–9). Blackwell Publishing Ltd.
Kaiser, G., & Willander, T. (2005). Development of mathematical literacy: Results of an
empirical study. Teaching Mathematics and its Applications, 24(2–3), 48–60.
Koh, K., & DePass, C. (2019). Developing teachers’ assessment literacy: Multiple per-
spectives in action. In K. Koh, C. DePass, & S. Steel (Eds.), Developing teachers’
assessment literacy: A tapestry of ideas and inquiries (pp. 1–6). Brill Sense.
Kumaravadivelu, B. (2006). Understanding language teaching: From method to postmethod.
Lawrence Erlbaum Associates.
Lam, R. (2014). Language assessment training in Hong Kong: Implications for language
assessment literacy. Language Testing, 32(2), 169–197.
Malone, M. (2013). The essentials of assessment literacy: Contrasts between testers and
users. Language Testing, 30(3), 329–344.
Mendoza, A. A. L., & Arandia, R. B. (2009). Language testing in Colombia: A call for more
teacher education and teacher training in language assessment. Profile, 11(2), 55–70.
O’Loughlin, K. (2013). Developing the assessment literacy of university proficiency test
users. Language Testing, 30(3), 363–380.
Pill, J., & Harding, L. (2013). Defining the language assessment literacy gap: Evidence
from a parliamentary enquiry. Language Testing, 30(3), 381–402.
Scarino, A. (2013). Language assessment literacy as self-awareness: Understanding the
role of interpretation in assessment and in teacher learning. Language Testing, 30(3),
309–327.
Shohamy, E., & Or, I. G. (2013). Introduction to volume 7. In E. Shohamy, I. G. Or, &
S. May (Eds.), Encyclopedia of language and education (3rd ed., Vol. 7) (pp. ix–xviii).
Springer Science and Business Media.
Spolsky, B. (2008). Language testing at 25: Maturity and responsibility? Language Testing,
25(3), 297–305.
Stabler-Havener, M. L. (2018). Defining, conceptualizing, problematizing, and assessing
language teacher assessment literacy. Teachers College, Columbia University Working
Papers in Applied Linguistics & TESOL, 18(1), 1–22.
Stiggins, R. J. (1991). Assessment literacy. Phi Delta Kappan, 72(7), 534–539.
Stiggins, R. J. (1995). Assessment literacy for the 21st century. Phi Delta Kappan, 77(3), 238–245.
Taylor, L. (2013). Communicating the theory, practice and principles of language testing
to test stakeholders: Some reflections. Language Testing, 30(3), 403–412.
Trinity College London (2016). Certificate in teaching English to speakers of other languages
(CertTESOL): Syllabus. https://www.trinitycollege.com/resource/?id=5407
Trinity College London (2017). Licentiate diploma in teaching English to speakers of other
languages (LTCL Diploma TESOL): Validation requirements, syllabus and bibliography for
validated and prospective course providers: Syllabus. https://www.trinitycollege.com/
resource/?id=1776
UNESCO (2004). The plurality of literacy and its implications for policies and programs: Edu-
cation sector position paper. UNESCO. http://unesdoc.unesco.org/images/0013/
001362/136246e.pdf
Vogt, K., & Tsagari, D. (2014). Assessment literacy of foreign language teachers:
Findings of a European study. Language Assessment Quarterly, 11(4), 374–402.
Yan, J. (2010). The place of language testing and assessment in the professional preparation of foreign language teachers in China. Language Testing, 27(4), 555–584.
Yan, X., Zhang, C., & Fan, J. J. (2018). Assessment knowledge is important, but …:
How contextual and experiential factors mediate assessment practice and training
needs of language teachers. System, 74, 158–168.
Part 2
Introduction
Providing effective feedback and expansive feedforward is a complex process that requires teachers to give meaningful advice on students’ work to take their learning forward. Feedback, as described by Brown, Bull, and Pen-
dlebury (2013), is the best-tested principle and practice in the classroom. It can
be meaningful and relevant when it is directed towards encouraging students to
improve their own learning. In addition, feedback becomes effective when it is
focused on students and provides them the opportunity to act in response.
Closing the gap between where the learner is and where the learner is going is what gives feedback its power (Hillocks, 1986; Hattie, 2009). As Brown and
Knight (1994) put it, good feedback is achieved when students are suitably
guided towards developing deep understanding to affect learning.
Feedback has been introduced and used intensively in English as a Foreign
Language (EFL) and English as a Second Language (ESL) classrooms as a
method of encouraging students to collaborate with others, contribute to
learner autonomy, and develop a sense of ownership (Berg, 1999; Carson &
Nelson, 1996; Tsui & Ng, 2000; Paulus, 1999).
Some cultures are particularly sensitive to receiving feedback and feedforward; therefore, knowledge of a specific classroom culture can be helpful.
Hattie and Timperley (2007) asserted that learning context is very important in
providing feedback. Noels (2001) added that feedback which acknowledges students’ needs and preferences can help students appreciate its positive effect and can lessen negative affective reactions to their work. Furthermore, Hyland (2003) advised that teachers should imple-
ment feedback by explaining to students that they are part of the process, and
responsible for their own development. Peacock’s study (as cited in Al-wossabi, 2019) found that when teachers employ different types and techniques of feedback matched to students’ preferences and styles, it supports and advances their learning. This point is of great importance because feedback not
only helps students and teachers achieve effective learning and teaching, it also
My teaching context
The Sultanate of Oman and its Ministry of Education have high hopes that young Omanis, equipped with a good education, can cope with the demands of the times. To support all Omani students’ learning, the Education Council of
Oman (2017) put emphasis on its Education Vision 2040. Education authorities
mandated that development of the education system and curricula should pro-
vide students with positive reinforcement and support in classrooms across dif-
ferent levels. Highlighting the support needed in the classroom, one important
move is to monitor students’ progress and lead them to the next phase of learn-
ing. This is where feedback and feedforward mechanisms play their role in the
development of students’ learning. However, the lack of communication between teacher and student and among students posed a threat to achieving effective teaching and learning. Interestingly, Al-Issa (2016), an Omani scholar, stressed that the teacher’s role is important in raising
students’ awareness about taking individual responsibility and in advancing their
own learning. When feedback is timely and purposefully given, it will help students advance their learning.
In 2016, I initially conducted an exploratory study that addressed the
importance of the integration and implementation of feedback and feedforward
in the context of Omani students’ learning experiences. Student writing is one aspect of teaching that has influenced my desire to go the extra mile.
When I taught one of these writing classes, I noticed that there were no explicit men-
tions of feedback sessions on students’ writing in the course profile and port-
folios. Although feedback can be conducted indirectly, it is important that
students and teachers share and understand the feedback mechanism to improve
teaching and advance learning. According to Black and Wiliam (1998), all
activities undertaken by teachers and students in assessing themselves and their
work can help develop teaching and learning and can increase engagement in
the classroom. At the Faculty of Language Studies, it is essential to adopt and implement a clear feedback mechanism that can foster good learning and
teaching. I realized that providing feedback and effective feedforward within a
learning and teaching context is essential to the development of students’
potential. In the case of EFL instruction, particularly in the writing classroom, my students made both global and developmental errors, and I believe that if these errors are used effectively as a basis for feedback, they can help students improve their work.
As an EFL teacher, I always engage my students in the teaching and learning
process in order to offer them ownership of their written work. It is true that
there are contradictions between teacher and student perceptions in relation to
the different forms of feedback used in the classroom. This led me to suggest an assessment protocol posing these questions in context: ‘Where am I going?’ ‘How am I going?’ and ‘Where to next?’ (Hattie & Timperley, 2007). With
these questions in mind, it is my desire and teaching goal to guide students and
motivate them to explore opportunities for learning. In my EFL classroom, my
students’ voices are heard, and I always remind them that their mistakes in
writing are part of their development and can help them develop awareness in
becoming self-directed learners.
‘We need to ensure that all writing teachers receive training on how to
implement the suggested feedback mechanism’.
‘Student-level representatives should be invited to increase students’
awareness on feedback structure’.
‘Teachers must be given the chance to also explore other feedback types
that will work in EFL context’.
‘The feedback model should also be initially implemented in Level 1
writing course to determine its effectiveness and limitations’.
‘As members of the faculty, we must ensure that assessment literacy be
the focus of our seminar–workshops so we can initiate a timely feedback
mechanism, thus supporting assessment literacy across disciplines’.
The model below (see Figure 5.1) was developed and modified based on Hattie
and Timperley’s (2007) three major feedback questions. The design of this
feedback instrument integrated the results of focus group discussions I con-
ducted with my colleagues who are involved in the development of the writing
course. Program leaders, course coordinators, and student level representatives
met and discussed the suggestions and feedback from the students.
In the feed up stage, learning outcomes, class activities, assessments, and
marking criteria were explained explicitly to students. This helped them
understand the nature of the course and teachers’ expectations of their output.
The importance of learning outcomes was highlighted in the beginning of the
course, so that students can prepare for tests and other in-class requirements and
activities. Rubrics and other scoring guides were also discussed and made clear
for the students during the first week of the class. In fact, students were
involved in the development of the rubrics used in evaluating their work.
During the feedback stage, information was given to students regarding their
own progress and achievement. Here, students were asked to submit written work each week, depending on the class activity. This was followed by revision of their work, integrating the feedback and comments from their teachers and peers. Revision involved two steps: focused teacher feedback and suggestions drawing on corrective feedback strategies. To understand
students’ learning experiences, and to determine what they can do and cannot do
in a particular task, students were asked to map their own learning using ‘can do’ statements. These statements are rooted in the course learning outcomes and indicate students’ level of learning in the classroom. The results of
this feedback mechanism helped teachers improve teaching strategies and adjust
teaching tasks within students’ grasp. To add meaningful context to this mechan-
ism, reflective sessions were also conducted where students got involved in focused
group discussions, online discussions, and teacher consultations.
Lastly, feedforward is a discussion phase between teacher and students on
what they have achieved and how they can improve their own learning in the
future. In addition, we discussed how they could move on to the next level of knowledge so as to do better in the next phase of learning.
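Viewed as a protocol, the mechanism pairs each stage with one of Hattie and Timperley’s (2007) guiding questions. The sketch below (Python, a schematic summary only; the activity lists condense this chapter’s description rather than reproduce a formal specification) makes the mapping explicit.

```python
# Schematic summary of the three-stage feedback mechanism described above,
# keyed by Hattie and Timperley's (2007) questions. Activities paraphrase
# the chapter's account and are illustrative, not an official protocol.

feedback_model = {
    "feed up": {
        "question": "Where am I going?",
        "activities": ["explain learning outcomes and class activities",
                       "discuss rubrics and marking criteria",
                       "involve students in rubric development"],
    },
    "feedback": {
        "question": "How am I going?",
        "activities": ["weekly written submissions",
                       "teacher and peer comments, then revision",
                       "self-mapping with 'can do' statements"],
    },
    "feedforward": {
        "question": "Where to next?",
        "activities": ["discuss what has been achieved",
                       "plan the next phase of learning"],
    },
}

for stage, info in feedback_model.items():
    print(f"{stage}: {info['question']} ({len(info['activities'])} activities)")
```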
Aside from Moodle, I also explored other practical ways to transform feed-
back using technology. I created a WhatsApp group in my tutorial class with weekly chat sessions that I called ‘Help & Receive Help’ (HRH). Stu-
dents in a group could send a message asking for help with improving and
correcting spelling mistakes, the use of correct words, and sentence con-
struction. My students were given a theme or topic every week, so that they
were guided in their participation. Words and sentence types were sent in the
WhatsApp group, then students offered help through sharing or providing
correct spelling, grammar, or sentence construction. There were some chal-
lenges with this, but I believe the practicality of this technology was an
opportunity that helped students improve their writing. Thanks to technological advancement, the implementation of feedback is now widely supported online, where teachers and students share thoughts and ideas through electronic mail, bulletin board systems, and online discussion boards (Braine, 2001; Chen, 2016; Ware, 2004).
Here is some unedited student feedback regarding the use of technology in the classroom (i.e. Moodle, an online discussion board, and WhatsApp).
References
Al-Issa, A. S. (2016). Meeting students’ expectations in an Arab ICLHE/EMI context:
Implications for ELT education policy and practice. International Journal of Applied
Linguistics and English Literature, 6(1), 209–226.
Al-wossabi, S. A. N. (2019). Corrective feedback in the Saudi EFL writing context: A
new perspective. Theory and Practice in Language Studies, 9(3), 325–331.
Asari, Y. (2019). EFL teachers’ L1 backgrounds, beliefs, and the characteristics of their
corrective feedback. Journal of Asia TEFL, 16(1), 250.
Beaumont, C., O’Doherty, M., & Shannon, L. (2011). Reconceptualising assessment feed-
back: a key to improving student learning? Studies in Higher Education, 36(6), 671–687.
Berg, E. C., (1999). The effects of peer trained response on ESL students’ revision types
and writing quality. Journal of Second Language Writing, 8, 215–241.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Edu-
cation: Principles, Policy & Practice, 5(1), 7–74.
Borich, G. D. (2010). Effective teaching methods (8th ed.). Pearson Education Inc.
Braine, G. (2001). A study of English as a foreign language (EFL) writers on a local-area network (LAN) and in traditional classes. Computers and Composition, 18, 275–292.
Brown, G. A., Bull, J., & Pendlebury, M. (2013). Assessing student learning in higher education. Routledge.
Brown, J. D. (2019). Assessment feedback. The Journal of Asia TEFL. 16(1), 334–344.
Brown, S. & Knight, P. (1994). Assessing learners in higher education. Kogan Page.
Budimlic, D. (2012). Written feedback in English: Teachers’ practices and cognition
[Unpublished master’s thesis, Norges teknisk-naturvitenskapelige universitet,
Fakultet for samfunnsvitenskap og teknologiledelse, Program for lærerutdanning].
Carson, J. G., & Nelson, G. L. (1996) Chinese students’ perception of ESL peer
response group interaction. Journal of Second Language Writing, 5(1),1–19.
Chen, T. (2016). Technology-supported peer feedback in ESL/EFL writing classes: A
research synthesis. Computer Assisted Language Learning, 29(2), 365–397.
Education Council of Oman (2017). Philosophy of education in the Sultanate of Oman.
Ellis, R. (2005). Principles of instructed language learning. System, 33(2), 209–224.
Ellis, R. (2009). Corrective feedback and teacher development. L2 Journal, 1(1), 1–18.
Evans, N. W., Hartshorn, J., & Allen Tuioti, E. (2010). Written corrective feedback: Practitioners’ perspectives. International Journal of English Studies, 10(2), 47–77.
Ferris, D. (1999). The case for grammar correction in L2 writing classes: A response to
Truscott (1996). Journal of Second Language Writing, 8, 1–10.
Hattie, J. (2009). The black box of tertiary assessment: An impending revolution. In L.
H. Meyer, S. Davidson, H. Anderson, R. Fletcher, P. M. Johnston, & M. Rees
(Eds.), Tertiary assessment & higher education student outcomes: Policy, practice & research
(pp. 259–275). Ako Aotearoa.
Hattie, J., & Timperley, H. (2007). The power of feedback. Review of Educational
Research, 77(1), 81–112.
Hillocks Jr, G. (1986). Research on written composition: New directions for teaching. National
Council of Teachers of English.
Hyland, F. (2003). Focusing on form: Student engagement with teacher feedback.
System, 31, 217–230.
Irons, A. (2007). Enhancing learning through formative assessment and feedback. Routledge.
Krashen, S. (1982). Principles and practice in second language acquisition. Pergamon.
Lantolf, J., & Thorne, S. L. (2007). Sociocultural theory and second language learning. In
B. van Patten & J. Williams (Eds.), Theories in second language acquisition (pp. 201–224).
Lee, I. (2014). Revisiting teacher feedback in EFL writing from sociocultural perspec-
tives. TESOL Quarterly, 48(1), 201–213.
Lightbown, P. M., & Spada, N. (1999). Instruction, first language influence, and
developmental readiness in second language acquisition. The Modern Language Journal,
83, 1–22.
Lyster, R. (2011). Content-based second language teaching. In E. Hinkel, Handbook of
research in second language teaching and learning (Vol. 2). (pp. 611–630) Routledge.
Lyster, R., Lightbown, P. M., & Spada, N. (1999). A response to Truscott’s ‘What’s
wrong with oral grammar correction’. The Canadian Modern Language Review, 55,
457–467.
McCord, M. B. (2012). Exploring effective feedback techniques in the ESL classroom. Language Arts Journal of Michigan, 27(2), 11.
Noels, K. (2001). Learning Spanish as a second language: Learners’ orientations and perceptions of their teachers’ communication style. Language Learning, 51, 107–144.
Paulus, T. (1999). The effect of peer and teacher feedback on student writing. Journal of
Second Language Writing, 8, 265–289.
Peacock, M. (2001) Match or mismatch? Learning styles and teaching styles in EFL.
International Journal of Applied Linguistics, 11, 1–20.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18(2), 119–144.
Savignon, S. J. (2005). Communicative language teaching: Strategies and goals. In E. Hinkel (Ed.), Handbook of research in second language teaching and learning (pp. 635–651). Lawrence Erlbaum Associates.
Swaffar, J., Romano, S., & Arens, K. (1998). Language learning online: Theory and practice in the ESL and L2 computer classroom. Labyrinth Publications.
Torrance, H., & Pryor, J. (2002). Investigating formative assessment: Teaching, learning and assessment in the classroom. Open University Press.
Tsui, A. B., & Ng, M. (2000). Do secondary L2 writers benefit from peer comments? Journal of Second Language Writing, 9(2), 147–170.
Ware, P. D. (2004). Confidence and competition online: ESL student perspectives on web-based discussions in the classroom. Computers and Composition, 21(4), 451–468.
Warschauer, M. (2002). Networking into academic discourse. Journal of English for Academic Purposes, 1(1), 45–58.
Chapter 6
Introduction
In searching for efficient methods of developing student teachers’ language assessment literacy (LAL), we have concluded that checklists can be a good option. Thus, this chapter is dedicated to exploring the potential of
checklists to be exploited as a pedagogical tool in pre-service foreign lan-
guage (FL) teacher preparation. This research has been done as part of a
long-term project aimed at designing and further reshaping a course on
assessment for master’s degree students studying English and French as their
second language. The afore-mentioned course is taught iteratively, but the
current study presents the results obtained after teaching it in 2018–2019 to
experimental groups at three universities in Kharkiv, Ukraine. The objec-
tive of the course is to develop pre-service teachers’ competency in
selecting, adapting, designing, and administering tasks for assessing bache-
lor’s degree students’ reading, listening, speaking, and writing skills, and
assessing their oral and written performance respectively. Owing to their
advantages, checklists have been a helpful instrument for teaching. The
chapter focuses on their functions, types, structure, and implementation,
aligned with the specific aim mentioned above. Some sample checklists are
presented.
This study commenced with a review of checklists available in the area of
language assessment. A qualitative study revealed that most checklists appear
to be evaluative and designed specifically for experienced test developers, in
that they abound with metalanguage and require profound knowledge of
procedure. This was followed by the adaptation stage at which we tried to
enhance the clarity of rubrics in items by simplifying them and fine-tuning
them to the specific focus of the study. This necessitated a more profound
understanding of the nature of checklists, namely the principles and
requirements of their development, structure, and types. For this we
addressed not only checklists and theory on their design relevant to lan-
guage assessment, but also other areas, with the intention of exploring their
potential for teaching purposes by applying an interdisciplinary approach.
Theoretical background
At present there are few areas of life where checklists cannot be applied. They are extensively used in the household as shopping lists, in aviation for ensuring pre-flight safety, in medicine for diagnostic purposes, in the IT industry for software product quality assurance, etc. The scope of their application is so
varied that enumerating all cases could be endless.
First introduced by Osborn (1953) as a simple tool in the form of a series of
comprehensive questions to encourage creative thinking in approaching complex
design tasks, checklists were meant to be used individually or in groups. The
questions relating to the point of focus should be answered one at a time to help
explore all possible ways to handle a problem.
In the Multilingual glossary of language testing terms (ALTE, 1998) we find the
following definition: ‘A checklist is a list of questions or points to be
answered or covered. Often used in language testing as a tool of observation
or analysis’ (p. 137). As can be seen, this definition is fairly laconic and cannot
serve as a guideline for designing checklists. In order to have a better understanding of the phenomenon, let us turn to some other definitions from online dictionaries returned by a Google search.
‘A checklist is a list of all the things that you need to do, information that you
want to find out, or things that you need to take somewhere, which you make in
order to ensure that you do not forget anything; a list of things, names, etc. to be
checked off or referred to for verifying, comparing, ordering, etc.’ (Collins).
‘A checklist is a comprehensive list of important or relevant actions, or steps to
be taken in a specific order’ (WebFinanceInc).
‘A checklist is a list of items required, things to be done, or points to be
considered, used as a reminder’ (Lexico).
‘A checklist is a type of informational job aid used to reduce failure by com-
pensating for potential limits of human memory and attention. It helps to
ensure consistency and completeness in carrying out a task’ (Educalingo).
It is common practice to use checklists for peer-, self-, and rater
assessment of speaking or writing. As Green (2013) remarks, they help focus on
important aspects of oral performance whilst being straightforward to use,
although he claims that using checklists deprives the process of its real-life
communicative value by concentrating only on the presence/absence of certain
elements. He also finds it possible to develop rating scales out of checklists.
Khalifa & Salamoura (2011) and O’Sullivan, Weir, and Saville (2002) pointed
out the potential of checklists to facilitate a comparison between different
speaking test formats. Cambridge self-evaluation checklists of different profi-
ciency levels are designed to help learners proofread and edit their pieces of
writing (Cambridge Assessment). Wu (2014) offers cognitive processing
checklists to test takers to self-report on their reading.
Checklists can also be used for providing immediate diagnostic feedback to
testees (Green, 2013).
In FL teaching a checklist is an instrument that helps FL practitioners evaluate
language teaching materials. Quantitative and qualitative teacher-made software
checklists are used to evaluate English language learning websites as additional
tools (Fuentes & Risueno Martinez, 2018, p. 27), and checklists are used to
evaluate strengths and weaknesses in an English language textbook (AbdelWa-
hab, 2013).
For assessing pupils’ reading and writing in a native language, Fiderer (1999)
designed a series of checklists. Though an assessment tool, these checklists could
be perfect guidelines for teaching purposes.
In classroom instruction checklists are considered pedagogical tools with vast
functionality: observation of students in the learning process, evaluation of instruction, assessment, and memory aids/mnemonic prompts (Dudden
Rowlands, 2007; Strickland & Strickland, 2000).
Observation checklists are lists of things to look at when observing either
individuals or a group of learners doing some activity at or after class (Strickland
& Strickland, 2000). They are employed in a search for remedies for teaching
some aspects or teaching specific groups of learners (Alberta Education, 2008;
Dudden Rowlands, 2007). They help to quickly gather information about how
well learners perform, about their strengths or weaknesses, and about their
learning styles. Normally they are written in a yes/no format and contain spe-
cific criteria. They can include spaces below or in the far-right column for brief
comments, which provide additional information not captured in checklists
(Alberta Education, 2008).
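To make the described format concrete, here is a minimal sketch (Python, an illustration only) of a yes/no observation checklist with the optional comments column; the criteria and learner name are invented examples rather than items from the studies cited above.

```python
# A minimal yes/no observation checklist with a comments column, as
# described above. Criteria and the learner's name are invented examples.

from dataclasses import dataclass, field

@dataclass
class ChecklistItem:
    criterion: str
    checked: bool = False   # the yes/no judgement
    comment: str = ""       # brief note in the far-right column, if any

@dataclass
class ObservationChecklist:
    learner: str
    items: list = field(default_factory=list)

    def unmet(self) -> list:
        """Criteria the observer did not check off."""
        return [i.criterion for i in self.items if not i.checked]

checklist = ObservationChecklist(
    learner="Student A",
    items=[
        ChecklistItem("Takes turns appropriately in pair work", checked=True),
        ChecklistItem("Uses target vocabulary in speech", checked=False,
                      comment="Relied on L1 paraphrase"),
    ],
)
print(checklist.unmet())   # -> ['Uses target vocabulary in speech']
```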
Teachers rely on evaluation checklists when assessing outcomes to collect data about the efficacy of their teaching methods and about the accuracy, appropriacy, and completeness of tasks done by their students. They help teachers clarify what is indi-
cative of a successful performance (Fuentes & Risueno Martinez, 2018, p. 27).
Checklists meant as an achievement assessment tool should cover a specific area
taught in accordance with the curriculum.
Entry/exit checklists (Wilson, 2013) relate more to the stage of their application: they are used either at the beginning or at the end of the research or development process to evaluate a product’s degree of readiness for submission.
Research checklists (Wilson, 2013) have a very specific area of application.
They are meant for researchers to use for the evaluation and/or review of their
own or others’ research.
Gawande (2009) differentiated two types of checklists that depend on how
experienced the respondent is: a) DO-CONFIRM checklists, which are relied on
by experienced people after doing something to make sure all the necessary steps have been taken; b) READ-DO checklists, which are used by the less experienced or inexperienced, who do what an item says before proceeding to the next item.
All the types mentioned above are meant to be mnemonic devices (for example, Dudden Rowlands, 2007; Scriven, 2000; Wilson, 2013), but that is not their only common feature. We found these classifications rather confusing due to their ambivalent nature: in fact, all the procedures described above require both observation and evaluation based on standards/criteria that are arranged in a specific order, though following that order is not crucial.
As we have seen, checklists have a number of advantages, but they also have a number of limitations. Summing up the theoretical findings on checklists, and based on our own experience, we recognize the following advantages:
Checklists:
Are fairly easy to develop and use (also in AbdelWahab, 2013; Scriven, 2000).
Can be detailed, though short and meaningful.
Allow for exercising a high degree of control by the teacher.
Help the teacher to evaluate, not only a particular student, but also other
students, as well as the results of students’ interaction.
Can be aligned with particular tasks (also in Dudden Rowlands, 2007;
Wilson, 2013).
Allow students to self-monitor their own progress (also in Dudden
Rowlands, 2007).
Among their limitations:
Designing checklists can be time-consuming if done for the first time and not based on a justified theoretical framework.
Using printed versions of checklists requires extra material resources.
There is a necessity for double checking as respondents can tick the boxes
without reading the items (also in Scriven, 2000).
In any case, checklists should be used only as an additional tool for teaching
and assessing, supplementing other time-proven and well-reputed tools. They
need to be validated to be effective (for example, AbdelWahab, 2013;
Gawande, 2009; Scriven, 2000).
Gawande (2009) insists on keeping them short in order to fit the limit of
working memory, with simply worded items, and without unfamiliar language.
Judging by these general characteristics, checklists can undoubtedly contribute
to achieving our overall goal of developing student teachers’ language assessment
literacy. However, their efficient implementation in the language classroom required determining, in depth, the characteristics specific to the context of classroom learning. This issue seems neglected by researchers, despite the evident value of the tool under consideration: there are very few well-grounded theoretical works, and limited practical evidence, dedicated to the use of checklists by university teachers and students. For that reason, we revisited studies from varied disciplines.
On the basis of the generalized characteristics of checklists presented above,
we compiled those specific to our teaching context (see Characteristics of checklists in Appendix A). These characteristics may serve as requirements for designing checklists to develop pre-service LAL.
Research problem
One of the objectives of the study was to determine the degree of importance of each aspect of classroom assessment in order to develop a checklist covering it. Experimental teaching was used in the checklists’ development process, testing the practical significance of each checklist in order to decide which points should be included in the final checklists and to validate them in further experiments. First, we are going to illustrate the preparatory steps that should be
taken when adjusting to a particular local teaching context. In fact, to start with,
we need to describe the context. As mentioned above, the objective of the
course is to develop master’s degree students’ language assessment literacy (com-
petency). We consider LAL to be part of their assessment literacy, an integrated
competency, which is the cumulative result of learning different disciplines. By
LAL we mean master’s degree students’ ability to perform assessment and evalua-
tion of bachelor’s degree students’ communicative competency, which includes:
planning classroom assessments, preparing assessment materials (selecting or adapt-
ing tasks from respected sources meeting the standards, designing tasks or their
complexes), administering assessments, rating oral and written performance,
marking and scoring papers, reporting the results to students and other stake-
holders, and providing feedback on the obtained results. The course is not meant
to instruct students on test design, calibrating items, validating tasks, or anything
involving repetitive piloting or the mathematical statistics typical of large-scale,
high-stakes tests. Moreover, students are supposed to learn about traditional and
alternative methods of classroom assessment.
include items concerned with elements that were key for our students. In some
cases, we failed to find any checklists at all, such as for text adaptation or the self-evaluation of a formal letter (see Self-evaluation: Formal letter checklist in Appendix
B for the specifics of the concrete task).
Rationale
Working with ready-made checklists, we finally decided to assign students to respond only to those items that applied to our specific situation, and to add issues
that we considered important, but that had not been addressed in existing checklists
for item writers, test designers, interlocutors, raters, and markers. For example, the
questions related to audio text recordings or the equipment used to play them were
eliminated from the Listening task checklist as the students were not engaged in audio
and video recording production. They responded to items addressing the use of the
internet, but not all of those related to equipment, as they did not have a choice of
equipment, apart from their mobile phones. The question ‘Is the language of the
rubric/item/option accurate?’ needed specifying as the students did not understand
what they were expected to concentrate on, and just ticked the box when evalu-
ating their own and the other students’ tasks. We found that the grammar mistakes the students most typically made were subject–verb agreement errors, omission of the indefinite article where needed, and incorrect word order in questions.
This is why we introduced three subcategories for this item:
The language of the item is grammatically accurate when:
a) The verb is used in the third person singular (with -s) in the Present
Simple.
b) There is an auxiliary verb after the question word.
c) There is a/an before singular countable nouns.
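To show how such subcategories work as separate yes/no rows, here is a small sketch (Python; the row wording and the sample item are our paraphrase for illustration, not the chapter’s actual checklist).

```python
# Illustrative yes/no rows for the grammar-accuracy subcategories above.
# The sample item and tick pattern are invented for demonstration.

grammar_accuracy_rows = [
    "(a) Verb takes third person singular -s in the Present Simple",
    "(b) An auxiliary verb follows the question word",
    "(c) a/an precedes singular countable nouns",
]

def review_item(item_text: str, answers: list) -> bool:
    """An item passes this section only if every row is ticked."""
    print(f"Reviewing item: {item_text}")
    for row, ok in zip(grammar_accuracy_rows, answers):
        print(f"[{'x' if ok else ' '}] {row}")
    return all(answers)

# 'Where she goes?' fails row (b): no auxiliary after the question word.
review_item("Where she goes?", [True, False, True])
```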
Checklists should cover all necessary steps that have to be carried out when selecting, adapting, and designing a task, but at the same time should be practical and realistic for our students. Bearing this in mind, we thoroughly considered
items to be deleted in order to minimize the amount of reading by students (there
was a risk they would overlook some points when bored, tired, or confused), and
to maximize their success with using checklists. Another option was to break
down some longer checklists into separate ones. For instance, we did not offer the use of the Speaking production assessment checklist at the initial stage of learning, and
students filled in Speaking production assessment checklists (for grammar/range of voca-
bulary/coherence/cohesion/pronunciation/fluency) one at a time while studying the
same oral performance sample. Our rationale was that it is still a challenge for
students to assess all necessary aspects simultaneously in one oral performance.
Therefore, they were trained to assess one aspect at a time using a corresponding
checklist. The list of all the checklists designed in line with the course is presented
in Appendix C.
Method
Quantitative and qualitative methods were used for collecting and analyzing
the data related to the efficacy of using checklists in the learning process. This
chapter discusses the results based on the application of the qualitative method.
The applied research methods include systematization of the fundamentals of checklist design and implementation; critical analysis of the checklists and of the products the students made with their help; simulation of the development of quizzes, tasks, and scales; and the piloting and application processes at the universities. The practical methods comprise qualitative analysis of the items in the checklists, namely their inclusiveness and clarity.
The data presented in this study were collected through direct observation and through qualitative analysis of the tasks produced by students using the tailored checklists. The sample of participants included more than 150 students at three universities in Kharkiv, Ukraine. The participants were those students who performed the roles of item writers, raters, and markers, and those who contributed as assessees.
The sample using the elaborated checklists consisted of students averaging 21 years of age, with practically no teaching experience.
We also decided to add the option ‘Imperatives’ (Find and circle…; Check off
the presence of the KEY; Ask a question to encourage the student to speak), as our
checklists are meant to be guidelines and this format fits this context.
In some checklists there is either a space or an extra column for comments,
explanations, and/or references. For all task evaluation checklists to be filled
in by the student’s groupmates, we provided a space for their comments in
case they didn’t check something off or found it inappropriately done. More
complex checklists are normally divided into sections to classify phenomena
under analysis. We found this practice useful as it was a way to attract stu-
dents’ attention to particular things and keep them concentrated on these
subcategories (e.g., Item: Stem, options).
Implications
Having experimented with checklists and obtained more than satisfactory
results, we are now in a position to claim that this pedagogical tool is suitable
for use in developing pre-service LAL due to its qualities of being straight-
forward, gradable, and flexible. We have determined the practical value of
checklists, though we have not yet explored their full potential. The pre-
sented general characteristics, and the description of the structure, functions,
and ways of using checklists may serve as a foundation for further researches
in this area. Despite the limited scope of this study, our findings can provide
broader implications for university teachers and teacher trainers. These indi-
viduals can further develop and experiment with checklists reflecting their
particular instructional context and the intended outcomes. Checklists are a
practical tool for teachers to track their students’ competency. As we see it,
developing LAL with the help of a checklist is an integral part of the assess-
ment literacy development of preservice teachers. This is an opportunity for
them to develop their professional competency, and to create partnerships
with teachers and younger students. Moreover, students can learn to design
checklists themselves and will thus acquire this tool as their learning strategy.
The results specified in checklists will facilitate their reflection on learning. It
is crucial for future teachers to get real, pre-service teaching practice while
they still have an opportunity to make mistakes, to learn, and to get remedial
help on the spot, which will prevent future problems.
Conclusion
The present chapter was dedicated to the study of the theoretical grounds and
practical application of checklists for developing student teachers’ language
assessment literacy. Theoretical findings and ready-made checklists were revis-
ited with the view of adapting them to the targeted learning context and our
specific goal. We outlined the general characteristics of checklists and fine-
tuned them. This chapter reports the findings of the experimental teaching of
students doing their master’s degrees. The results obtained make us believe that
the use of standardized checklists helps make the learning process more effi-
cient, accurate, and reliable. Checklists provide a practical guide for tasks and
rating scales design, and they indicate what constitutes each process based on
target learners’ characteristics. These standards can be further refined and
applied for other particular tasks. The chapter presents a sample of customized
checklists to enable university teachers and other experts to evaluate them. The
use of checklists is expected to make the teaching and evaluation process more
efficient, accurate, and reliable. Future studies will focus on the reliability and
validity of the elaborated checklists and present statistical evidence of their
efficiency.
Structure
Checklists have a minimum of two and a maximum of six columns. Some have spaces for comments. The number of columns depends on the number of people using the checklist. Thus, checklists with two columns are to be used by
individual students only or by the teacher. Checklists with three columns are to be
completed by the student and by the teacher evaluating their product or the
appropriacy/accuracy of the observation, with the teacher’s comments below the
table. Checklists with four columns are to be filled in by the student and three other
groupmates, evaluating the product of the student with the student’s comments
included below the table. Checklists with five columns are to be used by the stu-
dent and three groupmates, and checklists with six columns are to be completed by
the student, three groupmates, and the teacher. The patterns vary depending on the
function of the checklist. If meant to guide, then they are completed by only the
student themselves. If used for evaluation and quality assurance, then they are filled
in by other people and are followed by their commentaries.
Content
The range and number of items should be well-grounded.
Items in checklists should be based on learning objectives and reflect stages
of task development. They should be standards specific for the context,
Functions
From a student’s perspective:
To guide the processes of task or input selection, task or input adaptation, item
writing, rating scales design and assessment of oral and written performance.
To remind the student of the range of elements to include and order of steps to take
to demonstrate the completeness of the task.
To evaluate the elaborated products (peer and self-evaluation).
To prevent errors in elaborated tasks and written performance.
To encourage reflection on the results (both successes and failures).
From a teacher’s perspective:
To guide students through the selection, adaptation, and design processes, and to check compliance with those processes.
To control the accuracy and completeness of steps taken by students.
To standardize students’ behaviour.
To remind students of the range and order of things to do.
To identify whether key steps have been taken.
To identify the presence or absence of conceptual skills.
To prevent errors.
To evaluate the results.
To know if students need assistance or further instruction.
To simplify complex tasks.
Types
Operational checklists are used to standardize assessment procedures.
Observational checklists are used by students to assess oral and written
performance.
Evaluative checklists are used to evaluate elaborated products for quality assurance.
Application
Checklists are used after the lectures are given, when students have been
familiarized with the terminology and basics of assessment procedures.
They are used throughout the course at the planning, development, and administration stages.
References
AbdelWahab, M. M. (2013). Developing an English language textbook evaluative
checklist. Journal of Research & Method in Education (IOSR-JRME), 1(3), 55–70.
Alberta Education. (2008). Assessment in mathematics: Assessment strategies and tools: Obser-
vation checklist. Alberta Education. http://www.learnalberta.ca/content/mewa/html/a
ssessment/observation.html
Association of Language Testers in Europe (ALTE). (1998). Studies in language testing: Multilingual glossary of language testing terms (Vol. 6). Cambridge University Press.
Association of Language Testers in Europe (ALTE). (2001). Resources – free guides and reference materials. Association of Language Testers in Europe. https://www.alte.org/Materials
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and devel-
oping useful language tests. Oxford University Press.
Cambridge Assessment. (n.d.). Checklist to improve your writing – Level C1. Cambridge
University. https://www.cambridgeenglish.org/Images/286979-improve-your-
english-checklist-c1.pdf
Collins. (2020). Checklist. Collins English dictionary. Collins. https://www.collinsdictiona
ry.com/dictionary/english/checklist
Council of Europe. (2011). Manual for language test development and examining. Council of
Europe. https://rm.coe.int/manual-for-language-test-development-and-examining-
for-use-with-the-ce/1680667a2b
Dudden Rowlands, K. (2007). Check it out! Using checklists to support student learning. The English Journal, 96(6), 61–66.
Educalingo. (2020). Checklist. Educalingo: The dictionary for curious people. Educalingo.
https://educalingo.com/en/dic-en/checklist
Fiderer, A. (1999). 40 rubrics & checklists: To assess reading and writing. Scholastic Inc.
Fuentes, E. M., & Risueno Martinez, J. J. (2018). Design of a checklist for evaluating
language learning websites. Porta Linguarum, 30, 23–41.
Fulcher, G. (2015). Re-examining language testing: A philosophical and social inquiry.
Routledge.
Gawande, A. (2009). The checklist manifesto: How to get things right. Metropolitan Books.
Green, A. (2013). Exploring language assessment and testing: Language in action. Routledge.
Khalifa, H., & Salamoura, A. (2011). Criterion-related validity. In L. Taylor (Ed.), Stu-
dies in language testing: Examining speaking: Research and practice in assessing second lan-
guage speaking. (Vol. 30) (pp. 259–292). Cambridge University Press.
Lexico. (2020). Checklist. Lexico: The English dictionary. Lexico. https://en.oxforddictiona
ries.com/definition/checklist
Mukundan, J., & Nimehchisalem, V. (2012). Evaluative criteria of an English language
textbook evaluation checklist. Journal of Language Teaching and Research, 3(6), 1128–1134.
O’Sullivan, B., Weir, C. J., & Saville, N. (2002). Using observation checklists to validate
speaking-test tasks. Language Testing, 19(1), 33–56.
Osborn, A. (1953). Applied imagination: Principles and procedures of creative problem solving.
Charles Scribner’s Sons.
Scriven, M. (2000). The logic and methodology of checklists. https://web.archive.org/web/20100331200521/http://www.wmich.edu/evalctr/checklists/papers/logic%26methodology_dec07.pdf
Scriven, M. (2007). Key evaluation checklist. https://wmich.edu/sites/default/files/attachments/u350/2014/key%20evaluation%20checklist.pdf
Strickland, K., & Strickland, J. (2000). Making assessment elementary. Heinemann.
Tsagari, D., Vogt, K., Froehlich, V., Csépes, I., Fekete, A., Green, A., Hamp-Lyons, L.,
Sifakis, N., & Kordia, S. (2018). Handbook of assessment for language teachers. The Eur-
opean Commission.
WebFinanceInc. (2020). Checklist. In Business dictionary. WebFinanceInc. http://www.
businessdictionary.com/definition/checklist.html
Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Palgrave
Macmillan.
Wilson, C. (2013). Credible checklists and quality questionnaires: A user-centered design method.
Elsevier.
Wu, R. Y. (2014). Studies in language testing: Validating second language reading
examinations: Establishing the validity of the GEPT through alignment with the Common
European Framework of Reference. (Vol. 41). Cambridge University Press.
Chapter 7
Introduction
The use of the English language in the Arab world and the United Arab Emirates
(UAE) has grown. This growth has had an impact on academic achievements
and studies. Hence, one of the requirements to be enrolled at a university in the
UAE is to have English language skills that meet specific standards. Students at all
levels must develop their English as they progress through their school years until
they reach the university level. Given the rapid changes in education, testing
language skills has become an important means of measuring students' language ability.
Several tests have been used as tools for exiting high school and proceeding to higher
education, or for exiting into a desired major at university. Research has been
conducted on the effectiveness of these tests as a tool of language measurement
not only within the UAE, but also worldwide (Freimuth, 2014; Gitsaki et al.,
2014; Raven, 2011).
The International English Language Testing System (IELTS) test measures
students’ skills in reading, writing, listening, and speaking. This test may impact
students’ language usage and their performance of university or school tasks
(Ata, 2015). This test has been used as an entrance tool for higher education
institutions in the UAE, as well as for exiting foundation studies into specific
majors of interest (Freimuth, 2014). Despite all the data on the effectiveness of
the IELTS test, this effectiveness still differs from one culture to another and from
one student to another. This is because various conditions affect the imple-
mentation of the test, the test takers, and the band classification result. The
amount of test preparation by students is another factor that affects test results
(Gitsaki et al., 2014). Students at a particular level or from similar cultural
backgrounds might have common errors when taking the test, and these can be
revealed either during preparation courses or by the band result (Ata, 2015;
Hughes, 2003). The IELTS test score has influenced the study plans of those
joining colleges and universities within EFL communities. As research has
stated, there is a correlation between test scores, students’ coursework, and how
Figure 7.1 A chart showing the correlation between the theories in the theoretical framework (chart labels: Social-Constructivism; Theory of Language)
Method
Findings
As the students progressed through the levels in their education, it seems they
further understood the narrative history genre, which was represented through
the increased number of topic words to narrate events. The students used several
words to narrate their experiences to the reader and to represent the events’
sequence and time, and to reflect upon their own teaching performance. For
instance, student A, in her level 1 reflection, used few topic words to indicate
the time, such as 'in the third week'. Meanwhile, she increased her use of
sequential narrating words when writing her level 7 reflection, such as ‘then’,
‘firstly’, ‘after twenty minutes’, and ‘in addition’. Students A, B, and C also
demonstrated an understanding of the purpose of writing reflective journals,
which included their opinions on their teaching, and which identified issues
along with providing recommendations (see Table 7.2). I assume that the stu-
dents developed an understanding of adding sequential topic terms to the narra-
tive as they proceeded to learn further aspects that improve the genre context.
In addition, in most of the reflections, students used an informal tone with some
formality on occasion. They employed the topic vocabulary and expressions to affect
the tone of the text. Some of these informal expressions were ‘I was delighted’, ‘I felt
happy’, ‘playing hangman game’, ‘relation with the student’ etc. On the other hand,
the students used words to increase the formality of the context such as: ‘rule viola-
tions’, ‘instructor’, ‘my mentor’, ‘special need student’, ‘school facilities’, ‘meetings’
etc. There was also frequent use of first person pronouns, which raised the level of
informality, as the writers were narrating the history of their practicum experiences.
Grammatical cohesion
With reference to the literature about the importance of the grammatical
cohesion in reflections in particular, and the narrative history genre in general,
followed by ‘suddenly’ to indicate a sudden change in the event; she also used
'after that' to follow the sequence of what she narrated. Similarly, she introduced a
contrast into the event using 'however' to indicate that the event changed into
the opposite. In contrast, student A tended to use coordinating linking words like
‘and’ and ‘but’ more frequently in her reflection, and she used other conjunc-
tions less frequently than other students. In spite of this, her reflection pieces still
had a basic type of cohesion, since the conjunctions used were either sub-
ordinating or coordinating. This development in the use of con-
junctions showed that the students gained more language input, which led to
further understanding of the uses and functions of conjunctions within the genre.
The existence of different types of references in the focus group’s reflections
showed the achievement of further cohesion. First, the analysis of the texts
revealed that the students used personal pronouns to refer to other participants
in reflections and to avoid repetition. This was clear in their use of ana-
phoric referencing with the third person pronouns 'he', 'she', 'they' or 'it' (see
Table 7.3). To elaborate, student A in her level 4 reflection used the third
person pronoun ‘she’ to refer to her MST (Mentor School Teacher) in the
previous sentence. Student C used the pronoun ‘it’ to refer to the activity the
student wanted to choose. Similarly, student B used ‘it’ to refer to the noun
‘language’ in her level 6 reflection. Second, since students A, B, and C were
narrating their own experience, they used the first person pronoun ‘I’ to refer
to themselves as the main character in their own reflections (see Table 7.3).
Third, the students used possessive pronouns for the purpose of possessive
reference, such as the third person possessive ‘their’. For example, student A
used 'their' in reflection 7 to refer to the class that she would be teaching. Also,
student B used ‘their’ to refer to the students mentioned in reflection 4, which
was also used in student C’s reflection 2 for the same purpose (see Table 7.3).
Furthermore, the focus group used the object pronoun ‘her’ to refer to a sin-
gular character in the story, as in student C’s reflection 3, where it was used as a
reference to a girl in the class. The plural form ‘them’ was used more often to
refer to characters in the narrative history genre, and thus referred to the ‘stu-
dents' in almost all the reflections. Additionally, student A used the reflexive
pronoun 'themselves' to refer to the 'literacy class' in reflection 7 as well as
using 'myself' to refer to herself in the same reflection. While student C used
'themselves' to refer to 'one by one student' and 'their friends', student B didn't
make use of reflexive pronouns in her texts. The same applies to the use of
the possessive ('s, s') to relate things to the nouns in the writing, which was
mostly found in student A's reflections.
In addition, the students mostly used the determiners 'this', 'that', 'those',
and ‘these’ for referencing purposes, which in most occasions preceded the
noun (Figure 7.3). These determiners were mostly used by all students to refer
to nouns such as ‘class’, ‘lesson’, ‘strategy’, ‘student’, and ‘teaching practice’ etc.
Only student C used ‘those’ when she quoted a sentence from a book where it
was used to refer to ‘teachers who did not have high quality relationships’.
‘That’ was also mentioned in student C’s reflection 1 to point to ‘warning for
the second time’, and it was also used by student B in reflection 5 to refer to
the sentence ‘eyes on me’. Student A used ‘that’ to point to ‘a brilliant lesson
plan’ and ‘preparing a great lesson’ in reflection 3. This shows that the students
understood that the reflective genre is related to their personal experience in
practicum and employed the determiners to refer to the participants in the text.
Another feature of grammatical cohesion was the use of ‘article reference’.
Students used the definite article ‘the’ at several places in the text to refer back
to something that was introduced previously. Figure 7.4 shows that the stu-
dents used ‘the’ mostly when talking about ‘class’, ‘time’, ‘students’, ‘classroom’,
'teacher', etc., all of which had been introduced previously in their reflections and
could therefore take the definite article 'the' since the reader was familiar with
the concept. This demonstrates the students' ability to relate ideas and shows
links that developed as they progressed.
Lexical cohesion
Regarding the lexical cohesion aspects found in the reflections, several devices
were noticed. The first key device was 'word repetition', used to hold the
reader's attention throughout the text, which was also mentioned as part
of key topic vocabulary use. Repeating vocabulary as in Table 7.2 helped to tie
together the ideas in the reflections, such as the repetition of the phrases and
vocabulary like ‘students’ learning’, and ‘teacher’ (see Figure 7.5). Student A
repeated ‘students’ learning’ which was the focus of the teacher during that lesson.
Similarly, student B, in her eighth reflection, discussed the importance of the
‘questioning technique’ in the classroom. Thus, she repeated words such as
‘questions’, ‘questioning techniques’, ‘answers’ and ‘guessing’, to keep the
attention on the same topic being discussed.
A similar example can be found in what student C discussed in her first
reflection – that she will observe her Mentor Teacher and consider her tasks
and responsibilities. Therefore, 'role and responsibilities' was repeated twice to
emphasize the concept.

Figure 7.4 Excerpts showing the use of 'the' for referencing purposes

Figure 7.5 Excerpt from student A's reflection 6 showing the repetition of lexical terms

Also, 'teachers' and 'students' were repeated frequently in the
reflection to lead the narrative (see Figure 7.6). Student A repeated words such
as ‘misbehaviour’, ‘violently’, ‘lesson plan’, and ‘attention’ as well, to emphasize
the importance of a lesson plan in controlling behaviour. This shows that stu-
dent A had a previous ability to tie the ideas together and link points through
‘word repetition’, while the other two improved this as they progressed.
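As an illustrative aside only (this sketch is not the chapter's analytic procedure), a repetition tally of the kind described above can be expressed in a few lines of Python; the sample reflection text and stopword list below are invented:

    # Minimal sketch of tallying lexical repetition in a reflection.
    # The sample text and stopword list are invented for illustration.
    from collections import Counter

    reflection = (
        "The teacher focused on students' learning. During the lesson the "
        "teacher checked students' learning again and praised the students."
    )

    # Normalize: lower-case each word and strip surrounding punctuation.
    words = [w.strip(".,'\"").lower() for w in reflection.split()]
    stopwords = {"the", "on", "and", "during", "again"}
    counts = Counter(w for w in words if w and w not in stopwords)

    # Words occurring more than once are the repetition cues discussed above.
    for word, n in counts.most_common():
        if n > 1:
            print(f"{word}: {n}")  # students: 3, teacher: 2, learning: 2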
Linked to this point, the students used another feature of lexical cohesion:
the synonym. Although repetition exceeded the number of synonyms in the
texts, some synonyms still appeared in each student's reflections. Student C
used 'students' rather than saying 'boys
and girls' (see Table 7.6). She also used synonyms of 'teacher' in the same
reflection, like 'tutor' and 'instructor'. Another use of synonyms was in stu-
dent A's reflection 1 when she wrote 'happy – delighted' in the same text, and
she used synonyms in reflection level 3, such as ‘misbehaviour problems –
misbehaviour actions’, ‘confident – self-assured’ and ‘kids – students – children’
which showed a variation in using different terms that meant the same (see
Table 7.4). Student B used synonyms in reflection 5 such as ‘involve – engage’
and 'kids – students' (see Table 7.5). However, she made an error at the end of
reflection 5 when using the synonyms ‘in my opinion’ and ‘I think’ as one
whole phrase at the beginning of the sentence ‘so in my opinion I think my
…’. Considering these data and IELTS preparation course materials, the use of
synonyms shows the students’ attempts to use a range of vocabulary which
could be due to increased language knowledge.
Students appeared to use antonyms to show contrasting ideas under
similar topics (see Table 7.4, Table 7.5, & Table 7.6). For instance, student A,
in reflection 3, mentioned the following antonyms: ‘kids – adults’ and ‘nega-
tively – effectively’ to indicate classroom management strategies and their
effectiveness. As shown in Table 7.4, student A used ‘higher ability students –
lower ability students’ to show the differences between the levels, ‘together –
one student’ to demonstrate the differences in interaction patterns in class,
‘damaged – fixed’ and ‘the circuit was broken – the circuit must be connected’
to show how she explained and helped the students in the exploration to dis-
cover differences. There was more effective use of antonyms in student A’s
reflection 8, which added a different type of link between the ideas discussed
when she taught a science lesson. Conversely, both students B and C used
fewer antonyms than student A. Student B used antonyms in the first and last
reflections such as ‘confident – low self-esteem’, ‘correct – wrong’ and ‘critical
answer – guessing', and student C used three antonyms in reflection 5, the
highest number among her reflections, but in the last two reflections she did not
use any.
This analysis also shows the use of collocations as another device to increase
the text cohesion and make the written piece more predictable. All the students
tended to use similar collocations related to the topic vocabulary. These are
demonstrated above in Table 7.4, Table 7.5, and Table 7.6. They were ‘lan-
guage skills’, ‘teaching strategies’, ‘paying attention’, ‘class time’, ‘checking
understanding', and a 'teacher assistant'. The use of collocations in
the reflections appeared to increase depending on the topic discussed and as the
students progressed. To elaborate, the increase occurred when the students
reflected on their teaching experiences rather than when they were observed
for a task to be completed as part of the practicum requirement. For example,
student A in reflections 4, 1, 7, and 8 used more collocations as she talked
about her own experience in teaching and working in schools, while in
reflections 2 and 3 she narrated observation experiences and was instructed to
write about topics such as ‘observing the school mentor’ and ‘the importance of
Table 7.4 Examples of synonyms, antonyms and collocations in student A's reflections

Student A, Reflection 1
  Synonyms: student – kids; delighted – happy
  Antonyms: –
  Collocations: built a good relationship; played a game; special need students; inappropriate behaviour; every week; teaching practice; in my opinion
Student A, Reflection 2
  Collocations: lesson preparation
Discussion
To answer the stated research questions, the qualitative data will now be dis-
cussed. For this purpose, students' reflections from different education levels,
IELTS reports, preparation course materials, and student interviews are
examined. Two related propositions are considered: that students with prior
preparation for the IELTS test would perform better in writing reflections, and
that differences in their IELTS bands would correlate with their achievement in
writing reflections, as judged when grading the students' work.
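As a further illustrative aside, the band–score relationship named in the second proposition could, with a larger sample, be quantified with a simple correlation coefficient. The figures below are invented placeholders rather than study data, and the sketch assumes Python 3.10+ for statistics.correlation:

    # Hypothetical sketch: do IELTS bands track graded reflection scores?
    # The numbers are invented placeholders, not data from this study.
    from statistics import correlation  # available in Python 3.10+

    ielts_bands = [6.5, 5.5, 5.5, 6.0, 7.0]    # illustrative overall bands
    reflection_scores = [85, 70, 72, 78, 88]   # illustrative reflection marks

    print(f"Pearson r = {correlation(ielts_bands, reflection_scores):.2f}")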
The data revealed different results related to the correlation of IELTS bands
and students' writing of reflections. The first result is that the IELTS
band cannot accurately reflect every student's actual language ability (Ata,
2015; Freimuth, 2014). For example, some students were recognized to be at a
higher level and already performed strongly in writing reflections, as in the case
of student A. The reason is that student A developed an understanding of
writing the genre of reflections categorized as a narrative history genre (Dere-
wianka, 2000; McGuire et al., 2009). In addition, the students developed a
sense of writing the narrative history genre, which may be due to the devel-
opment of writing skills and the increase in language level as they progressed
(Harmer, 2001). One reason for this is the efficient use of topic voca-
bulary and genre register, which added clarity to the content being discussed
(Quirke et al., 2009). This cannot guarantee the students’ successful under-
standing of the genre, but rather it can represent the understanding of writing
reflections due to course requirements, or being trained to write such texts
(Lightbown & Spada, 2013; Wingate & Tribble, 2012). Therefore, the stu-
dents with high language ability can be expected to have a wide vocabulary,
which can be employed to support the text register and enhance the clarity of
the genre (Bawarshi & Reiff, 2010). This also leads to representing the social
context and the purpose of writing reflective journals through the presenta-
tion of terms that identify the narration, as well as providing critical opinions
to mirror the teaching experiences and recommend further improvements in
learning and teaching (Bawarshi & Reiff, 2010; Lukin et al., 2011). More-
over, based on the interview answers and the previous analysis, students A and
B already had very good language skills, which didn’t relate to the IELTS test
band. This was related to to having strong foundational knowledge of English
from instruction in schools, or when entering university. Some students with
good language skills considered the requirement to take the IELTS test to be
part of completing their studies (El Massah & Fadly, 2017; Freimuth, 2014)
rather than as a way to improve their English, which shows a weakness in the
IELTS test as a testing tool.
In contrast, student B’s language ability was lower than the received IELTS
band when she joined the education program. Hence, that low language ability
was shown in her writing of reflective journals through less coherence, and less
grammatical and lexical cohesion. Perhaps the reason for having an IELTS band
that is higher than the student’s actual language skills could be due to the stu-
dent being trained to answer the questions and practiced in how to deal with
IELTS test types (Hughes, 2003; Mahlberg, 2006; Panahi & Mohammaditabar,
2015). However, writing reflective journals proved that free writing strategies
can show students’ actual abilities in understanding and presenting writing
aspects. Writing reflective journals can also affect written genre comprehension
and the ability to present the purpose of the text; these abilities can also be the
result of having a wide range of grammatical resources and vocabulary
(Cameron, 2001; Leech et al., 2001).
The second result identified is that IELTS bands can still indicate a change in
some students’ language performance, even if that change is considered to be a
slight difference from the actual English language ability (Gitsaki et al., 2014;
Raven, 2011). The representation of the IELTS band can be shown either as
the student’s recent performance or a development in English language skills.
The findings showed that student C’s writing performance matched the IELTS
band 5.5 and student B's language development; however, it did not match
student A’s performance (Brown, 2007; Spencer, 2009). Perhaps this is because
when student C first joined the university’s education programme, her lan-
guage level fell within the range of an IELTS band, which was indicated by her
understanding not only of the IELTS test’s writing genres, but also of reflec-
tions (Asador et al., 2016; Wilson, 2016). However, with student B’s case, and
due to the development of language that occurred at the end of her education
years, the IELTS band described the language learning level, which matches the
analysis of the reflections done in the previous section.
Therefore, a further result can be arrived at: The change in student B’s level
was due to taking preparation courses to improve her language, as she
stated in the interviews (Hughes, 2003; Panahi & Mohammaditabar,
2015). This led to the student raising her understanding of writing reflections,
learning how to produce a cohesive, coherent text in the genre through using a text register,
developing her use of punctuation, and providing a line of connected thoughts
using the grammatical cohesion (Bawarshi & Reiff, 2010; Harmer, 2004; Lukin
et al., 2011). In reflective writing, the student was able to present opinions on
her own teaching, think critically about issues and identify solutions, as well as
judge the classroom experience (McGuire et al., 2009). Student C shared the
same opinion about taking the IELTS test and preparation courses, and the
evaluation of her reflections showed a similar match with
her second score.
Limitations
The study has limitations due to the size of the focus group, as only three stu-
dents were selected for the research. Therefore, the results cannot represent all
English language learners taking the IELTS test, and cannot represent all cases
of students writing reflections in higher education. Also, the research only
involved female students, and therefore cannot be generalized to male learners.
Another limitation is that, because of the time at which the research was
conducted, it was difficult to collect the research instruments from the students, as some
had already graduated and others had lost some documents. For example, stu-
dent A did not have the first IELTS band, and student C had not yet reached
level eight and therefore had not completed the eighth reflection. Finally, the number
of students taking the preparation courses was low, which is not enough to measure
the courses' effectiveness or to gauge other students' attitudes toward them.
Recommendations
Two main types of recommendations are offered to build on this study: one is
related to the improvement of writing reflective journals, and the other is
related to further research arising from this study. Based on the
findings, it is recommended that language learning should not be separated from
college assignments, especially when an assignment requires the use of language to write a
report, reflection, or essay. Another aspect is that EFL learners require further
consolidation of language even if they are at a college level, and more emphasis
Conclusion
This study was implemented to explore the correlation of the IELTS band and
preparation courses with students' writing of reflective journals in one of the UAE's
higher education institutions. The research found that the IELTS band cannot
be understood as a final judgment, as there are different factors affecting it.
Students’ level in English, previous language knowledge, study input, and
preparation courses can all affect the IELTS test as a tool for language testing
(Ata, 2015; Quirke & Zagallo, 2009). Yet, the study revealed that preparation
courses had an impact on reflective writing as they added further input and
consolidation of the language for two participants.
The students’ language and writing showed improvement as the students
progressed through the levels. They built further confidence towards writing
reflective journals, which was observed through their reflection samples (Bam-
berg, 2011; Hughes, 2003). Their ability to critically identify issues, anticipate
solutions, critically judge the experience, and provide opinions and recom-
mendations for further enhancement developed effectively (Hyland,
2007). That change was notable at the end of the study levels and after the
second IELTS band test, which could be due to their education studies or
preparation courses.
The students’ development in English writing and other skills was identified
through the students’ responses to questions in the interviews. The IELTS
reports and materials from the preparation courses provided a clearer idea of
the amount of change in the language. The students who took the preparation
courses were provided with opportunities to practice and improve their writing
and language skills while preparing for the IELTS test. Lessons in the prepara-
tion courses were designed to cover broader skills of writing and text organi-
zation than just those required for IELTS writing tests (Panahi &
Mohammaditabar, 2015; Slavin, 2014; Wilson, 2016). However, the student
who didn’t take the course also developed her language abilities, which can be
related to the input given through her college study. In spite of this, the IELTS
score was considered highly important simply because it is a requirement; as the
students stated in the interviews, there were other factors leading to the change
in their language and writing of reflections.
The amount of knowledge gained through education courses and the
assessment of reflections also helped to shape the students’ understanding of
reflective journals. The students learned to narrow their focus to narrating the
practicum experience and they identified strengths and weaknesses (Spencer,
2009). The findings also show that the course criteria for language assessment
may not match the ones for IELTS, which presents another factor to support
that IELTS requirements may differ from academic writing at university. To
conclude, students’ second language learning performance and ability cannot be
judged by only a language test such as the IELTS. The students face challenges
in learning language and understanding different genres, which they can over-
come through additional practice and input given by instructors.
References
Asador, M., Marandi, S., Vaezi, S., & Desmet, P. (2016). Podcasting in a virtual English for
academic purposes course: learner motivation. Interactive Learning Environments, 24(4),
875–896.
Ata, A. W. (2015). Knowledge, education, and attitudes of international students to
IELTS: a case of Australia. Journal of International Students, 5(4), 488–500.
Bamberg, M. (2011). Narrative discourse. In J. C. Meister, T. Kindt, & W. Schernus (Eds.), Narratology beyond literary criticism: Mediality, disciplinarity (pp. 213–237). Walter de Gruyter.
Bawarshi, A., & Reiff, M. (2010). Genre: An introduction to history, theory, research and
pedagogy. Parlor Press.
Berk, L. (2009). Child development. Pearson.
Brewster, J., Ellis, G., & Girard, D. (2002). The primary English teacher’s guide. Penguin
English.
Brown, P. (2007). Reflective teaching, reflective learning. MA thesis, University of Birmingham.
Buri, C. (2012). Determinants in the choice of comprehensible input in science classes.
Journal of International Education Research, 8(1), 1–18.
Burton, J. (2009). Reflective writing-getting to the heart of teaching and learning. In J.
Burton, P. Quirke, C. Reichmann, & J. Peyton (Eds.), Reflective writing: a way to life-
long teacher learning (pp. 1–11). TESL-EJ Publications.
Cameron, L. (2001). Teaching languages to young learners. Cambridge University Press.
Creswell, J. (2012). Research design: qualitative, quantitative, and mixed methods approaches.
Sage Publications.
Derewianka, B. (2000). Exploring how texts work. Primary English Teaching Association.
El Massah, S., & Fadly, D. (2017). Predictors of academic performance for finance stu-
dents: Women at higher education in the UAE. The International Journal of Educational
Management, 31(7), 854–864.
Emmitt, M., Pollock, J., & Komesaroff, R. (2003). Language and learning: An introduction
for teaching. Oxford University Press.
Freeman, D., & Freeman, Y. (2001). Between worlds: Access to second language acquisition.
Heinemann.
Freimuth, H. (2014). Cultural bias in university entrance examinations in the UAE. The
Emirates Occasional Papers, 85, 1–81.
Gitsaki, C., Robby, M. A., & Bourini, A. (2014). Preparing Emirati students to meet
the English language requirements for higher education: a pilot study. Education,
Business and Society: Contemporary Middle Eastern Issues, 7(3), 167–184.
Harmer, J. (2004). How to teach writing. Pearson.
Harmer, J. (2001). The practice of English language teaching. Longman.
Hyland, K. (2007). Genre pedagogy: Language, literacy and L2 writing instruction.
Journal of Second Language Writing, 16, 147–164.
Hughes, A. (2003). Testing for language teachers. Cambridge University Press.
Johnson, K. (1999). Understanding language teaching: Reasoning in action. Heinle ELT.
Johnston, N., Partridge, H., & Hughes, H. (2014). Understanding the information lit-
eracy experiences of EFL (English as a foreign language) students. Reference Services
Review, 43(4), 552–568.
Krekeler, C. (2013). Languages for specific academic purposes or languages for general
academic purposes? A critical reappraisal of a key issue for language provision in
higher education. Language Learning in Higher Education, 3(1), 43–60.
Kyriacou, C. (2007). Essential teaching skills. Nelson Thornes.
Leech, G., Cruickshank, B., & Ivanic, R. (2001). An A-Z of English grammar & usage.
Pearson.
Lightbown, P., & Spada, N. (2013). How languages are learned. Oxford University Press.
Lukin, A., Moore, A., Herke, M., Wegener, R., & Wu, C. (2011). Halliday’s model of
register revisited and explored. Linguistics and the Human Sciences, 4(2), 187–213.
Mahlberg, M. (2006). Lexical cohesion: corpus linguistic theory and its application in
English language teaching. International Journal of Corpus Linguistics, 11(3), 363–383.
McCarthy, M., & Carter, R. (1994). Language as discourse perspectives for language teaching.
Longman Group.
McGuire, L., Lay, K., & Peters, J. (2009). Pedagogy of reflective writing in professional
education. Journal of the Scholarship of Teaching and Learning, 9(1), 93–107.
Mills, J. (2014). Action research: a guide for the teacher researcher. Pearson.
Moore, T., & Morton, J. (2005). Dimensions of difference: A comparison of university
writing and IELTS writing. Journal of English for Academic Purposes, 4(1), 43–66.
Panahi, R., & Mohammaditabar, M. (2015). The strengths and weaknesses of Iranian
IELTS candidates in academic writing task 2. Theory and Practice in Language Studies, 5(5),
957–967.
Qin, W., & Uccelli, P. (2016). Same language, different functions: a cross-genre analysis
of Chinese EFL learners’ writing performance. Journal of Second Language Writing, 33,
2–17.
Quirke, P., & Zagallo, E. (2009). Moving towards truly reflective writing. In J. Burton,
P. Quirke, C. Reichmann, & J. Peyton (Eds.), Reflective writing: a way to lifelong teacher
learning (pp. 12–30). TESL-EJ Publications.
Raven, J. (2011). Emiratizing the education sector in the UAE: contextualization and
challenges. Education, Business and Society: Contemporary Middle Eastern Issues, 4(2),
134–141.
Chapter 8
Introduction
Language assessment is an important element of language education because it
feeds language teaching and can be used for different purposes. On the one hand,
assessment gives feedback about the effectiveness of instruction, the progress of
language learners in the target language, whether the expected learning outcomes
are achieved, what students learn or do not learn, and whether the syllabus, the
teaching method, and the materials are useful for the ongoing language learning
(Bachmann, 2005; Brennan, 2015; Hidri, 2018). On the other hand, assessment
helps teachers to make results-dependent decisions in order to improve their
instruction, to foster a better language learning context, as well as to evaluate
themselves in respect to their teaching (Rea-Dickins, 2004). Moreover, language
assessment can motivate both teachers and students – for teachers, they can find
out how effective their teaching is, and for students, they can detect their strengths
and weaknesses regarding their language development (Heaton, 2011). As a con-
sequence, they can be motivated to enhance their teaching/learning more.
Language teachers need to be competent in the target language they teach,
to know what and how to teach, and to know how to assess. Among these
three qualifications, language assessment can be considered an essential part of
professional competence because it is a teacher’s guide to planning their
teaching to support learning, and to regulating their pedagogical decisions.
Accordingly, it can be deduced that language teachers have two roles – a tea-
cher and an assessor (Scarino, 2013; Wach, 2012). In order to carry out efficient
and suitable assessment practices, language teachers need to be assessment lit-
erate. Language assessment literacy (LAL) means possessing the required
knowledge of assessment and the ability to perform assessment practices. If
teachers are assessment literate, then they can respond to the needs of their
educational context more effectively. Therefore, LAL is a fundamental aspect of
their teaching competence. However, assessing effectively and taking
advantage of assessment results appropriately is not as easy as it seems. Since
language teachers are not born with the required competence or ability related
to assessment (Jin, 2010), they need to be trained and equipped with the
Review of literature
Recently, there has been much more emphasis on language assessment due to
changing notions and approaches in language education. Earlier, the concepts of
teaching and assessment were considered separate, and assessment was performed
independently of teaching, especially in the form of testing (Viengsang, 2016).
But as a result of sociocultural theories of learning which have also influenced the
domain of language assessment, the testing notion has been transformed into the
concept of assessment (Berger, 2012; Hidri, 2016; Inbar-Lourie, 2017;
O’Loughlin, 2006), which incorporates not only measuring language proficiency
of learners as in testing, but also monitoring and improving their progress in the
target language (Csépes, 2014). Hence, the importance of language assessment to
motivate and reinforce language learning by tracking the development of learners
has been realized. That is, tests or assessment tools can be used to improve
learning apart from just measuring or testing language knowledge (Heaton,
2011). So, assessment for learning has become more important than assessment of
learning. In line with these considerations, researchers have focused more on
how teachers can be effective assessors according to their teaching contexts in
order to support language learning, what kind of characteristics they need to have
for this, and how they can better administer assessment. For all these issues,
researchers have recently explored what it means to be language assessment lit-
erate, and whether teachers have acquired LAL necessities and have been able to
practice them in their classes successfully.
Conceptual framework
The concept of LAL originated from the term assessment literacy (AL) introduced
by Stiggins (1995) in general education. Yet, the concept of AL remained rather
general. Thus, researchers became concerned with the specific competencies of
teaching subject areas that may require different kinds of assessment. For for-
eign language teaching/learning environments, they tried to describe what
LAL refers to and what it constitutes. Also, since teachers are the main stake-
holder in foreign language education (Giraldo, 2018), the definitions of LAL
are mostly based on the assessment literacy of classroom teachers, which is also
called ‘language teacher assessment literacy’. LAL refers to the competence of
knowing what, when, and how to use assessment to gather information about
language learners’ development as well as using the results to enhance the
quality of instruction (Jeong, 2013). Nonetheless, Fulcher (2012) presented a
comprehensive definition of LAL:
Considering this definition, LAL not only means the knowledge and skills
of assessment but also deals with reasoning, impact, frameworks, and con-
textual and ethical issues related to language assessment. Similarly, Giraldo
Aristizabal (2018) indicated language teachers need to be competent both in
classroom assessment and large-scale testing to help their students develop in
the target language. Hence, Fulcher’s definition of LAL can be considered quite
extensive in terms of its meaning. However, it should be noted that the term
foreign language assessment literacy (FLAL) is preferred in the current study in
order to emphasize foreign language learning context; that is, the English as a
foreign language (EFL) context.
As for the constituents of LAL, researchers have reported their own frameworks
corresponding to their own LAL perspectives. For instance, Davies (2008) asserted
LAL is made up of ‘knowledge’ (defining language measurement and framework),
‘skills’ (designing, administering, and analyzing) and ‘principles’ (using properly
and ethically). In the same vein, Inbar-Lourie (2008) demonstrated the compo-
nents of LAL as ‘what’ is to be measured as language construct, ‘how’ assessment is
carried out, and ‘why’ a certain assessment practice is conducted. Fulcher (2012)
also suggested a three-layered model of LAL in which there are ‘practices’ (con-
structing, applying, evaluating), ‘principles’ (fundamentals, ethics, concerns) and
‘contexts’ (origins, frameworks, impact). In addition, Hill (2017) focused on the
elements of classroom teacher language assessment literacy and argued that LAL
consists of ‘practice’ (knowledge and skills), ‘concepts’ (own understanding and
beliefs), and ‘context’ (impact of teaching environment). Likewise, Inbar-Lourie
(2013) and Stabler-Havener (2018) emphasized the importance of teaching con-
texts with respect to LAL components and they pointed out that knowledge and
skills of assessment should be employed suitable to a teacher’s own context. Fur-
thermore, Giraldo Aristizabal (2018) and Hidri (2016) highlighted that beliefs,
conceptions, and previous experiences about language assessment affect the way
LAL is constructed. In conclusion, the proposed constituents of LAL in the
literature have a lot in common such as knowledge, skills, practices, concepts,
contexts, and principles.
When it comes to the characteristics of language assessment literate teachers,
several researchers found similar dispositions (Gotch & French, 2014; Huang &
He, 2016; Khajideh & Amir, 2015; Rogier, 2014; Ukrayinska, 2018). For
to assess (Berry et al., 2017; Vogt & Tsagari, 2014). On the other hand, LAL
training given through teacher education programs was found to be fruitful.
For instance, Hilden and Frojdendahl (2018) demonstrated EFL teachers gained
certain assessment abilities and developed more learner-centred conceptions for
assessment by means of their teacher education training programs.
In spite of the benefits of training, especially with respect to knowledge base,
some other studies focused on the points missed by such training, because EFL
teachers came across difficulties in assessment despite their training, and as a
result, they found teacher education programs could not develop the LAL of
their future teachers as expected (Djoub, 2017; Fulcher, 2012; Gebril, 2017;
Klinger, 2016; Lam, 2014). For example, Turk (2018) indicated pre-service
training related to language assessment knowledge was good, but teachers were
not able to implement effective assessment procedures. Also, Hatipoglu (2015) and
Lam (2014) reported such training was mostly based on testing-oriented issues
while highlighting the importance of testing in language learning/teaching.
Considering the gaps in such training, most studies determined that
LAL training could not provide EFL teachers with practical assessment skills
(Hatipoglu, 2010; Lam, 2014; Sariyildiz, 2018; Sheehan & Munro, 2017; Yan
et al., 2018) and hence, EFL teachers could not perform appropriate assessment
strategies in their classes. In addition, Mede and Atay (2017) noted that EFL
teachers had difficulties in designing assessment and giving proper feedback due
to the lack of training in those points. Semiz and Odabas (2016) also revealed
training should include testing language skills apart from grammar and voca-
bulary. Likewise, Mede and Atay (2017), and Turk (2018) suggested EFL tea-
chers need training, especially in formative assessment, because the training they
had at university did not improve their assessment for learning strategies.
Moreover, Tsagari and Vogt (2017), and Vogt and Tsagari (2014) pointed out
that as EFL teachers criticized, LAL training did not emphasize the local con-
ditions with respect to language assessment, and therefore, they had difficulties
in responding to the assessment needs of their local contexts.
It can be seen that there have been notable attempts to investigate the concept of
LAL in the literature. Yet, few of them have investigated how LAL is viewed
from the eyes of EFL teachers, especially the beginner ones, and what LAL
training contains, and whether it is beneficial to teachers. This issue is also
present in Turkey. Therefore, there is a need to understand how LAL is per-
ceived and experienced by in-service EFL teachers, especially by novice ones,
because they have just started in their teaching profession and their language
assessment knowledge and skills can be regarded as up to date and fresh.
The current study is designed to investigate what Turkish EFL teachers know
about the concepts of ELTE (English Language Testing and Evaluation) and
FLAL (Foreign Language Assessment Literacy), whether their training is suffi-
cient for their recent assessment practices, and how they put what they learn
into practice. In this way, the present study is believed to provide some insight
into the perceptions and experiences of novice EFL teachers with respect to
their LAL, to reflect on the assessment training of novice EFL teachers, and to
contribute to the field of LAL research due to the limited numbers of similar
studies, especially with novice teachers.
For these purposes, the following research questions were addressed:
1 What do novice EFL teachers think about the concepts of ELTE and FLAL?
2 How do novice EFL teachers evaluate their LAL training taken at
university?
3 How does novice EFL teachers’ LAL training affect their practices of language
assessment in their classes?
Methodology
The present study was designed as a phenomenological study, which is one of
the qualitative research designs. The main goal of phenomenology is to
obtain a deeper understanding of the meanings of lived experiences by
describing people's multiple perceptions of them and finding the
common shared experiences of a concept or a phenomenon (Creswell, 2007;
Fraenkel et al., 2011; Patton, 2002). Since the purpose of this study is to
reveal the perceptions and experiences of novice EFL teachers about the LAL
concept, phenomenology was the preferred approach to investigate, describe,
and interpret the LAL commonalities among novice EFL teachers to get a
better understanding of LAL.
As for the participants, 22 novice EFL teachers working at state schools teaching
different language levels across Turkey took part in the study. They graduated
from the same university and the same program (ELT) within five years. Since this
study was conducted in the Spring Semester 2018, the graduates of 2013, 2014,
2015, 2016, and 2017 participated in the research. All of them had taken the
ELTE course, taught by the same course lecturers with the same syllabus and the course-
book mentioned above. They were novice in-service EFL teachers because their
teaching experience ranged from six months to a maximum of five years. There-
fore, the sampling of the participants was based on purposeful sampling, and all the
participants took part in the study voluntarily (see Table 8.1).
Nodes (name: files, references)
  FLAL-Eng (Alumni): 3, 367
    Shortcomings: 2, 12
    Suggestions: 2, 88
    Difficulties in using: 2, 27
    Definition of ELTE: 1, 44
    Importance of ELTE: 1, 27
    Perceptions of own FLAL: 1, 44

Figure 8.1 The view of the themes and subthemes in NVivo 11 Pro Program
was also calculated through Miles and Huberman's (1994) suggested procedure, and
the agreement between the raters was found to be high (97%).
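Miles and Huberman's (1994) procedure is, at its core, a simple proportion: the number of coding decisions the two raters agree on, divided by the total of agreements plus disagreements. A minimal sketch of that calculation, using invented theme labels rather than the study's actual codings, might look like this:

    # Miles and Huberman's (1994) inter-coder agreement:
    # reliability = agreements / (agreements + disagreements).
    # The coded segments below are invented for illustration.

    def miles_huberman_agreement(rater_a, rater_b):
        """Proportion of segments that both raters coded identically."""
        if len(rater_a) != len(rater_b):
            raise ValueError("Both raters must code the same segments.")
        agreements = sum(a == b for a, b in zip(rater_a, rater_b))
        return agreements / len(rater_a)

    rater_a = ["Shortcomings", "Suggestions", "Definition of ELTE", "Suggestions"]
    rater_b = ["Shortcomings", "Suggestions", "Definition of ELTE", "Difficulties in using"]
    print(f"Agreement: {miles_huberman_agreement(rater_a, rater_b):.0%}")  # 75%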
Finally, in accordance with the determined themes, the findings were inter-
preted and discussed narratively by giving some examples from participants' quotes.
Findings
The novice EFL teachers conveyed their thoughts and experiences about English
language assessment in terms of both their ELTE training at university and their
ELTE applications in their current classrooms. The analysis of their responses
yielded two themes: ‘FLAL-ELTE Concept’ and ‘Experiences about ELTE (past
& present)’ (see Figure 8.2). The sample commented more on experiences about
ELTE (past & present) (f=252) than the FLAL-ELTE concept (f=115).
The first theme is about how the FLAL-ELTE concept is perceived by
novice EFL teachers. For ‘FLAL-ELTE Concept’, novice EFL teachers first
defined what ELTE means. They mostly concentrated on the issue of language
proficiency (language skills and areas) and the aims of language learning. Most
of them indicated language assessment means measuring language proficiency
and determining the deficiencies in the target language. Still, only a few
reported that ELTE refers to the understanding of whether course aims are
achieved, the planning and evaluation of language learning progress, and
reporting the findings. For example, one noted that ‘assessment is to measure
the proficiency level of language learners as well as evaluating their successes
and progress in that foreign language’ (Low.T.10).
Apart from this, novice EFL teachers focused on the importance of testing
and evaluation in English language classes. Most of them underscored the fact
that ELTE is important because of its significant role in testing language profi-
ciency, determining and evaluating language progress, showing whether the
aims are achieved, and getting feedback about deficient points in that language.
For instance, one of the novice EFL teachers pointed out that:
As for their perceptions of their own levels in FLAL – that is, how
qualified they felt in English language assessment – half of the participants
perceived themselves to be highly competent (n=11) whereas only two tea-
chers felt themselves inadequate; others felt they could deal with assessment at
the moderate level (n=7). Nonetheless, only one novice EFL teacher, who felt
highly competent in terms of FLAL, mentioned that teachers should have the
knowledge and skills of ELTE itself (i.e. knowing how to construct exams,
how to interpret results, and how to report them). In terms of FLAL compo-
nents, one underlined that:
The second theme revealed by the findings of novice EFL teachers’ statements
is ‘Experiences about ELTE (past & present)’. For ‘Experiences about ELTE
(past & present)’, they highlighted their ‘previous experiences about ELTE
course’ (f=198), and their ‘current experiences about using’ their knowledge
and skills in their professional lives (f=54). The first subcategory is about their
‘previous experiences about ELTE course’ they took in their ELT training
program. First, they mentioned what they acquired after they took the ELTE
course. For example, they stated they learned a lot about language assessment,
the criteria for effective testing (i.e. reliability, validity), how to design several
language testing tools to test students’ language skills, and the kinds of tests or
tasks that can be used to measure language proficiency. They also underscored
that the course was useful to prepare teacher candidates for their future teach-
ing career regarding language assessment, and their expectations and needs were
more or less met thanks to this course. Therefore, most of them found the
content of the course was sufficient and beneficial to their teaching lives. For
instance, some of them explained that:
Surely, this course [ELTE in teacher training] has helped me a lot. Simply
put, it has been helpful with respect to such topics as ‘how a language test
should be designed, what it should be covered, how the questions/tasks
should be organized, and how the evaluation should be made.
(Moderate.T.7)
On the other hand, despite being helpful, there were some aspects of the
ELTE course that were criticized by all the novice EFL teachers irrespective of
their perceived FLAL levels, and they made some suggestions for further
improvement. For example, the course was found to be very theoretical and
lacking in the practical dimension because the respondents said they could not
practice what they learned during the course until they were appointed. The
novice EFL teachers believed training about language assessment should have
included more practical exercises and implementations such as simulations to
perform tests. Besides, a few of them underlined that the class hours were
inadequate, and the last semester was too late to take the course. For instance,
some commented that:
The content of the course [ELTE course] was satisfying in terms of theory
but the practical side was not enough.
(Moderate.T.20)
The [ELTE] course was very theoretical. More practice would have been
better. For example, for each language skill, students should prepare an
assessment tool every week, and the criteria for the evaluation should be
interpreted.
(High.T.1)
Besides, some novice EFL teachers argued that they did not have any
opportunity to test language learners in the practicum; they were only eval-
uated in terms of their teaching skill and not testing skills. Thus, according to
them, teacher candidates should be made to prepare and administer some tests
in their teaching practice for the sake of gaining experience in assessment in
addition to teaching. Therefore, one of them suggested that: ‘Before teacher
trainees start [in the] teaching profession, they should be provided with
experience [practice] in terms of assessment process by making them to prepare
at least a part of a language exam during micro-macro training [practicum
courses]’ (Moderate.T.5).
Moreover, some novice EFL teachers complained they did not learn any-
thing about the evaluation aspects of testing such as scoring and feedback. For
this reason, new topics such as analyzing test results should be added. One
participant teacher recommended that: ‘[To the ELTE course content], exer-
cises of preparing various test items, techniques of evaluation and analysis, and
statistical calculations [can be added as new topics]’ (Moderate.T.2).
Furthermore, some participant teachers discussed the great degree to which
the ELTE course content concentrated on testing grammar and vocabulary and
how some attention was given to receptive skills testing. They also emphasized
how they learned to design multiple choice tests, cloze tests, and the like.
Nevertheless, the testing of productive skills (speaking and writing) was not
much emphasized in the course, nor were other types of assessment, such as
alternative assessment methods. For instance, one of them indicated that:
In order to make the ELTE course better, some novice EFL teachers suggested
in-service training as a way to cover some of the missing parts of the
course, which would help to address their weaknesses in language
assessment. For instance, they might learn from other teachers’ experiences
by sharing their testing techniques and discussing their assessment ideas.
Also, they assumed in-service training might be helpful to remind them of
previous ELTE knowledge and practices they learned as they complained
they forgot certain things about assessment throughout years. For that, one
of them stated that:
With respect to in-service training, some also reported that not all teachers had
language assessment training at their universities, and they criticized that even
those who took ELTE training might still be incompetent in language assess-
ment because not all universities treated the course with the expected care
and to the expected extent. For instance, one of them argued that:
It did work [ELTE course was helpful at work]. For example, I pay attention
to the difficulty level of test items while preparing exams. I avoid giving clues
about correct answer in true-false test items. I try to be careful about the
clarity of questions. I prepared each test item which measures only one skill. I
pay attention to the length of the gaps in the fill-in-the-blanks test items.
(High.T.8)
Likewise, one of the novice EFL teachers, who felt highly competent, focused
on their ability to prepare language tests according to students’ levels and
course aims in contrast to their colleagues who used available exams regardless
of such points. That participant reported that:
On the other hand, although they stated they could apply their testing
knowledge and did well in designing their own language tests, they mentioned
they still had some challenges. For example, they stated their students had
lower levels of English language proficiency even though their age and edu-
cation levels were high; and thus, they could not apply high-level tests to
their students. In other words, they could not respond to the needs of their
local contexts (the external factor). This is because there was nothing about
Turkish contextual issues related to language assessment in their LAL training;
they just studied imaginary situations and hence, the course was considered
somewhat idealized by the participants. One of the novice EFL teachers drew
attention to the Turkish examination system, which is based on high-stakes
testing in the form of multiple-choice questions, and how such exams have
priority in Turkey. Therefore, they had to conduct written exams like that in
their classrooms.
In addition, most of the novice EFL teachers focused on the lack of prac-
tice as the internal factor, and complained about their inability to perform
effective testing practices in their current classes because they were unable to
adapt their theoretical knowledge to their teaching contexts. This is because, as
they mentioned before, they did not have any chance to experience language
testing during their LAL training at university. For that, one of them argued:
‘I have difficulties in practice and implementation. I have theoretical knowl-
edge [about language assessment], but I have difficulty in combining such
knowledge with the teaching style and testing system at state schools [in
Turkey]’ (Moderate.T.20).
To sum up, it can be concluded that most of the novice EFL teachers had
considerable knowledge of ELTE thanks to the course and learned how to design
language tests, but they had some difficulties, especially in practice, due to lack
of experience during the course. While they defined the concept of
ELTE from a testing perspective and believed in the importance of assessment, they
mostly emphasized the notion of summative assessment both in their
definitions and in their current uses of assessment in teaching contexts.
Discussion
The present study has focused on the perceptions and experiences of novice
EFL teachers as well as their LAL training. The findings yielded two themes:
‘FLAL-ELTE Concept’ and ‘Experiences about ELTE (past & present)’ that
illustrate the responses to the research questions thematically.
Regarding the first research question (what the sample thought about ELTE
and FLAL), the responses gathered under the theme ‘FLAL-ELTE Concept’
indicated their perceptions related to these concepts. Novice EFL teachers
perceived ELTE as measuring language proficiency and determining the success
or deficiencies of students in the target language. Therefore, they mostly con-
centrated on the testing purposes of achievement and diagnosis. However, even
though they appreciated the importance of ELTE in teaching for several of its
benefits, only a few of them were aware of other assessment purposes, such as to
improve learning/teaching. So, it can be inferred that their perceptions were
based on testing rather than assessment. This result was also similar to the stu-
dies which showed EFL teachers thought of assessment as only testing (Berry et
al., 2017; Duboc, 2009; Klinger, 2016; Tsagari, 2013). As Giraldo Aristizabal
(2018) has maintained, beliefs and previous experiences affect the perceptions
of teachers, and testing perception in this study can be attributed to the content
of their ELTE course, which focused on only testing issues; therefore, their
perceptions were shaped according to the training they underwent. This was also
reflected in their practice in such a way that they used summative assessment to
determine the product of learning. As Herrera and Macias (2015) have put
forward, testing means summative assessment while ignoring other purposes, as
in this study.
As for the LAL concept itself, few of the novice EFL teachers’ responses
highlighted the dimensions such as ‘knowledge’ and ‘skills’ (Davies, 2008; Hill,
2017), ‘what’ and ‘how’ (Inbar-Lourie, 2008) and ‘practices’ (Fulcher, 2012)
along with some ‘contextual issues’ with respect to ‘local conditions’ (Hill,
2017; Stabler-Havener, 2018). However, when compared to the proposed
components of LAL in the literature, the dimension of ‘principles’ (use of tests
ethically and appropriately) (Davies, 2008; Fulcher, 2012), the element of ‘why’
(reasoning behind assessment) (Inbar-Lourie, 2008), and certain contextual
issues, such as historical and philosophical frameworks that form the origins of
assessment (Fulcher, 2012), were not much discussed. Therefore, it can be
concluded that the novice EFL teachers were not very familiar with what being assessment literate means, which is similar to the conclusions of Berry et al. (2017) and Semiz and Odabas (2016). This finding can be related to their training content, which focused only on knowledge and skills; thus, they were not introduced to the LAL concept before graduation.
When it comes to perceived levels of LAL, most of the participant novice
EFL teachers felt they were good at assessment, which indicates they had
higher perceived levels of LAL. This finding is different from some studies
which found lower levels of LAL combined with teachers not feeling prepared
to test (Buyukkarci, 2016; Fard & Tabatabei, 2018; Tsagari & Vogt, 2017; Vogt
& Tsagari, 2014). This difference may stem from their belief that their knowledge of language assessment was good; hence, they felt confident in that knowledge even though they encountered difficulties in practicing their testing skills.
Considering the second research question (how the sample evaluated their course-based LAL training, i.e. the ELTE course, at university), the
responses for the subtheme ‘Previous Experiences about ELTE’ demonstrated
their opinions and experiences of, and their suggestions for, the course. Most of them stated that they had learned a lot about language testing by the end of the course and that the knowledge they gained was useful for testing in their own teaching. Therefore, they found the course sufficient. It seems
that at the end of such LAL training, novice EFL teachers became aware of
language testing issues, and familiar with certain knowledge about and abil-
ities in language testing. This finding is similar to some studies in the literature
that showed an ELTE course provides teachers with basic training that familiarizes them with language testing issues and procedures (Hilden & Frojdendahl, 2018; Semiz & Odabas, 2016; Turk, 2018). In contrast to Gebril (2017), whose study found such training ineffective in developing attendees' LAL, this study found positive impacts. Nevertheless,
when the novice EFL teachers evaluated the ELTE course, they stressed
testing issues rather than assessment, such as learning how to design multiple-choice tests to measure grammar and vocabulary knowledge. They did not report anything about formative assessment, assessment for learning, or alternative assessment techniques; they reported only the ways summative assessment tests what has been learned at the end. They also noted that little emphasis was given to testing productive language skills during the course. Thus,
these findings can be associated with the content of the course itself and the
book used as the main resource; both were based on language testing topics, and no recent topics appeared in the syllabus. This was also illustrated
by some research studies in the literature that noted that the content of
training was exam-oriented, as in the current study (Hatipoglu, 2015; Lam,
2014). Therefore, as Mede and Atay (2017), and Turk (2018) have suggested,
training should include the topic of formative assessment in order to equip
teachers with using assessment for learning.
Apart from these opinions, novice EFL teachers criticized the course for
being very theoretical, noting that it did not improve their language testing skills; they merely became familiar with how to design tests and what kinds of test items can be used to test each language skill. Further, they could not gain experience in their teaching practice because only their teaching skills were taken into account in the practicum. Therefore, despite being strong in theory, the course lacked practice. This finding is well attested in the literature: LAL training has often neglected the development of practical language assessment skills (Hatipoglu, 2015; Lam, 2014; Sariyildiz, 2018; Sheehan & Munro, 2017; Yan et al., 2018).
Most of the participant novice EFL teachers argued there was nothing about
analyzing, interpreting, and evaluating test scores in the course; this topic was
ignored in the training phase. As Hudaya (2017) and Mede and Atay (2017) have demonstrated, training should educate teachers in how to interpret scores and, accordingly, how to give feedback to improve language learning. The teachers also stressed that class hours were inadequate, since there was no time for practice, and that taking the course in the last semester was too late. All in all, it seems the course-
based LAL training was beneficial, and the content was satisfying in terms of
theory. However, though novice EFL teachers received training, they felt they
still needed more training to be good assessors of their teaching contexts and to
improve their practical skills, as underlined by some studies in the literature
(Djoub, 2017; Fulcher, 2012; Klinger, 2016). Thus, as some novice EFL teachers
highlighted, in-service training could provide the missing course content.
As for the third research question (how the sample’s training affected their
assessment practices in teaching), the responses for the subtheme ‘Current
Experiences about ELTE’ illustrated whether the novice EFL teachers were
able to use their acquired knowledge and skills in their professional lives. Most of the novice EFL teachers, in line with the course content, focused on testing
issues: They exemplified how they paid attention to testing criteria, how they
designed language tests, and how they measured what was learned by using
testing techniques such as multiple choice, true/false statements, and fill-in-the-
blanks. Because the ELTE course content was testing-oriented, it can be
inferred that the novice EFL teachers used mostly language tests and traditional
testing techniques to measure the sum of learning, and thus, they were able to
apply what they learned in the course to their teaching lives. In contrast to
Ukrayinska’s (2018) study, which showed that teachers still had some challenges
in designing language tests even after training, the participants of the present
study were able to properly design their own language tests.
However, similar to studies in the literature showing that teachers employed mostly traditional testing tools, rather than portfolios or other types of assessment, to measure knowledge of language rather than language skills (Buyukkarci, 2014; Duboc, 2009; Mede & Atay, 2017; Oz, 2014; Semiz & Odabas, 2016; Tsagari, 2013; Wach, 2012), the novice EFL teachers in this study also used such testing procedures. This finding can be attributed to the fact that Turkey is an
exam-oriented country and the sample was most familiar with multiple-choice
exams and, as in other studies in the literature (Saka, 2016; Yan et al., 2018), they
were expected to use them. Thus, they preferred such testing tools in their classes
due to the testing policy of Turkey, as well as what they learned in the training.
In addition, though novice EFL teachers reported they had good knowledge
of designing language tests, they also stated they had some difficulty in applying
their tests, due to the local needs of their teaching context. Thus, as some stu-
dies indicated, LAL training should include more discussion of the local con-
ditions in relation to language assessment to assist future teachers (Tsagari &
Vogt, 2017; Vogt & Tsagari, 2014). Moreover, most of them highlighted that they lacked practice, owing to the fact that they could not gain experience during the course and had only just begun to practice their language testing skills. They thus faced challenges in putting their knowledge into practice as they had expected. Several studies in the literature have concluded that most in-service teachers have knowledge of assessment procedures but cannot practice them effectively, as did the present study (Giraldo Aristizabal, 2018; Hakim, 2015; Jannati, 2015; Mede & Atay, 2017; Oz & Atay, 2017; Yan et al., 2018). The need for further training has been stressed in much of the literature (Hatipoglu, 2010; Sariyildiz, 2018; Sheehan & Munro, 2017; Tsagari & Vogt, 2017; Wach, 2012); novice EFL teachers therefore need considerably more LAL training on certain topics to become assessment literate.
In sum, although the present study revealed that the ELTE course provided basic LAL training and thus equipped teachers with language testing knowledge and techniques, novice EFL teachers still need considerably more training to be good assessors in their teaching contexts and to acquire the characteristics of assessment-literate teachers who can respond to the needs of their students.
Conclusion
The current study aimed to investigate the perceptions and experiences of
novice EFL teachers about language assessment, as well as the effect of their teacher education training on their development of LAL. Overall, as LAL training, the ELTE course was found to be beneficial and helpful because knowledge of language testing and skills in designing language tests were acquired in the course and subsequently applied in professional life. Yet, the
perceptions, experiences, and practices of novice EFL teachers regarding lan-
guage assessment were limited to testing, rather than assessment itself. Further,
the course contributed to theoretical knowledge of language testing, rather
than practical knowledge. Therefore, there are still some gaps to be filled in
order to make novice EFL teachers better in terms of LAL, as demonstrated by
some studies in the literature (Hatipoglu, 2010; Sariyildiz, 2018; Sheehan &
Munro, 2017; Tsagari & Vogt, 2017). For this reason, some pedagogical
implications can be shared.
Notes
1 This paper is based on the doctoral dissertation titled ‘Exploring foreign language
assessment literacy of pre-service English language teachers’.
2 Corresponding Author: Dr. Aylin Sevimel-Sahin, ELT Department, Anadolu University,
Eskisehir, Turkey, aylinsevimel@anadolu.edu.tr
References
Bachman, L. F. (2005). Statistical analyses for language assessment. Cambridge University
Press.
Berger, A. (2012). Creating language-assessment literacy: A model for teacher education.
In J. Hüttner, B. Mehlmauer-Larcher, S. Reichl, & B. Schiftner (Eds.), Theory and
practice in EFL teacher education: Bridging the gap (pp. 57–82). Short Run Press Ltd.
Berry, V., Sheehan, S., & Munro, S. (2017, May 3–5). Exploring teachers’ language
assessment literacy: A social constructivist approach to understanding effective practices [Paper
presentation]. ALTE 6th International Conference Learning and Assessment: Making
the Connections, Bologna, Italy. http://eprints.hud.ac.uk/id/eprint/33342/
Brennan, M. (2015, May 21–23). Building assessment literacy with teachers and students: New
challenges? [Paper presentation]. ACER EPCC Conference, Sydney, Australia. https://www.acer.org/files/eppc15-Brennan-Building-Assessment-Literacy-with-teachers-and-students2.pptx
Buyukkarci, K. (2014). Assessment beliefs and practices of language teachers in primary
education. International Journal of Instruction, 7(1), 107–120.
Buyukkarci, K. (2016). Identifying the areas for English language teacher development: A
study of assessment literacy. Pegem Egitim ve Ogretim Dergisi [Pegem Journal of Education
and Instruction], 6(3), 333–346.
Creswell, J. W. (2007). Qualitative inquiry and research design: Choosing among five approa-
ches (2nd edition). SAGE Publications.
Csépes, I. (2014). Language assessment literacy in English teacher training programmes
in Hungary. In J. Hovarth, & P. Medgyes (Eds.), Studies in honour of Marianne Nikolov
(pp. 399–411). Lingua Franca Csoport.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
DeLuca, C., & Klinger, D. A. (2010). Assessment literacy development: Identifying gaps
in teacher candidates’ learning. Assessment in Education: Principles, Policy & Practice, 17
(4), 419–438.
Djoub, Z. (2017). Assessment literacy: Beyond teacher practice. In R. Al-Mahrooqi, C.
Coombe, F. Al-Maamari, & V. Thakur (Eds.), Revisiting EFL assessment (pp. 9–27).
Springer International Publishing.
Duboc, A. P. M. (2009). Language assessment and the new literacy studies. Lenguaje, 37
(1), 159–178.
Fard, Z. R., & Tabatabei, O. (2018). Investigating assessment literacy of EFL teachers in
Iran. Journal of Applied Linguistics and Language Research, 5(3), 91–100.
Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2011). How to design and evaluate research
in education (8th edition). McGraw-Hill.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Jeong, H. (2013). Defining assessment literacy: Is it different for language testers and
non-language testers? Language Testing, 30(3), 345–362.
Jin, Y. (2010). The place of language testing and assessment in the professional pre-
paration of foreign language teachers in China. Language Testing, 27(4), 555–584.
Karagul, B. I., Yuksel, D., & Altay, M. (2017). Assessment and grading practices of EFL
teachers in Turkey. International Journal of Language Academy, 5(5), 168–174.
Khadijeh, B., & Amir, R. (2015). Importance of teachers’ assessment literacy. Interna-
tional Journal of English Language Education, 3(1), 139–146.
Klinger, C. J. T. (2016). EFL professors’ beliefs of assessment practices in an EFL pre-service
teacher training undergraduate program in Colombia (Publication No. 10239483) [Doctoral
dissertation, Southern Illinois University Carbondale]. ProQuest Dissertations and
Theses Global.
Lam, R. (2014). Language assessment training in Hong Kong: Implications for language
assessment literacy. Language Testing, 32(2), 169–197.
Mede, E., & Atay, D. (2017). English language teachers’ assessment literacy: The
Turkish context. Dil Dergisi-Ankara Universitesi TOMER [Language Journal-Ankara
University TOMER], 168(1), 43–60.
Miles, M. B., & Huberman, A. M. (1994). An expanded sourcebook: Qualitative data ana-
lysis (2nd edition). SAGE Publications.
Munoz, A. P., Palacio, M., & Escobar, L. (2012). Teachers’ beliefs about assessment in
an EFL context in Colombia. Profile, 14(1), 143–158.
Newfields, T. (2006). Teacher development and assessment literacy. Authentic Commu-
nication: Proceedings of the 5th Annual JALT Pan-SIG Conference, 48–73. http://hosted.
jalt.org/pansig/2006/PDF/Newfields.pdf
O’Loughlin, K. (2006). Learning about second language assessment: Insights from a
postgraduate student online subject forum. University of Sydney Papers in TESOL, 1,
71–85.
Onalan, O., & Karagul, A. E. (2018). A study on Turkish EFL teachers’ beliefs about
assessment and its different uses in teaching English. Journal of Language and Linguistic
Studies, 14(3), 190–201.
Oz, H. (2014). Turkish teachers’ practices of assessment for learning in the English as a
foreign language classroom. Journal of Language Teaching and Research, 5(4), 775–785.
Oz, S., & Atay, D. (2017). Turkish EFL instructors’ in-class language assessment literacy:
Perceptions and practices. ELT Research Journal, 6(1), 25–44.
Patton, M. Q. (2002). Qualitative research and evaluation methods (3rd edition). SAGE
Publications.
Rea-Dickins, P. (2004). Understanding teachers as agents of assessment. Language Test-
ing, 21(3), 249–258.
Rogier, D. (2014). Assessment literacy: Building a base for better teaching and learning.
English Language Teaching Forum, 3, 2–13.
Sahinkarakas, S. (2012). The role of teaching experience on teachers’ perceptions of
language assessment. Procedia – Social and Behavioral Sciences, 47, 1787–1792.
Saka, F. O. (2016). What do teachers think about testing procedures at schools? Proce-
dia – Social and Behavioral Sciences, 232, 575–582.
Sariyildiz, G. (2018). A study into language assessment literacy of pre-service English as a
foreign language teachers in Turkish context [Unpublished master’s thesis]. Hacettepe
University.
Teachers’ assessment of
academic writing
Implications for language assessment
literacy
Zulfiqar Ahmad
Introduction
In most academic settings, the course teachers are responsible for creating,
administering, and grading all the course assessment interventions, which include
but are not limited to: quizzes, in-class assignments, portfolio management, and
mid and final term examinations. The course teachers are expected to produce
and report academically reliable accounts of students’ performance as charted out
in curricular, institutional, and national policies. This multifaceted role, coupled
with pedagogic assignments, anticipates a high level of assessment literacy (AL),
which Stiggins (1995, p.240) understands as ‘knowing the difference between
sound and unsound assessment’. Gapsin understanding and executing the princi-
ples of sound assessment are liable to produce inaccurate test results that may be
vulnerable to faulty interpretations and decisions, and may adversely affect the
stakeholders’ perceptions of assessment, more specifically test takers’ perceptions
(Rahimi, Esfandiari & Amini, 2016).
In the field of language teaching, the term language assessment literacy (LAL)
has been introduced to differentiate this specialized form from its more global
variant of AL (Giraldo, 2018). LAL is based on the premise that the raters are
knowledgeable about the language they teach and test, as well as adequately
trained and skilled in the theoretical and practical underpinnings of language
testing (Davies, 2008; Fulcher, 2012; Inbar-Lourie, 2013). Following these
assumptions and Malone (2013), the operational construct for this study has been
situated in teachers’ ability to create and follow appropriate assessment rubrics as
well as grade academic writing, paraphrasing in this case, as closely to the con-
struct of the writing task as is possible.
Several studies report teachers’ lack of suitable training and skills in LAL (Lin,
2014; Popham, 2006), but most of these studies are based only on survey reports
involving different stakeholders related to LAL. One serious limitation of these
type of studies is that they base their findings and conclusions on the perceptual
understanding of the participants without actually analyzing the teachers’ real-life
assessments of any specific language skills. This research gap in LAL prompted the
researcher to use already graded examination scripts as the unit of analysis in order
to find out the appropriateness of exam rubrics, the measurement scale, and tea-
chers’ use of these rubrics and measurement scale. The researcher anticipated that
the relationship of these variables with the test scores would help not only to
identify gaps in assessment practices, but also to foreground implications for the
LAL training of teachers of academic writing in particular and English as a Foreign
Language in general.
LAL thus refers to the execution of skills and knowledge in a way that is grounded in theory and that aims to empower the teacher to have a clear cognizance of his or her role as an assessor of academic writing. This role, which may appear supra-academic in its orientation, involves an understanding of the nature, application, and implications of the what, why, when, and how of assessment. What refers to the language trait being assessed, why means the purpose of assessment, when means the learning or course stage at which a particular trait should be tested, and how includes the assessment processes of test design,
administration, and grading. Designing an academic writing test is primarily the
job of a language assessment specialist or trained writing examiners, as is done in
large scale standardized tests such as the International English Language Testing
System (IELTS) and Test of English as a Foreign Language (TOEFL). But in
most academic contexts it is the teachers of academic writing who have the
responsibility of managing all the essentials of assessment. This indicates the need
for training in test construction, assessment rubrics, and consistent measurement
of the writing tasks. Owing to contextual variations and curricular preferences, it
is hard to establish a workable construct for writing (Weigle cited in Ahmad,
2019), and even a small deviation from contextual parameters, which are situated
in institutional policies and course objectives, can adversely affect the purpose of
assessment as well as performance of the teachers as raters. It is equally important
to ascertain the timeframe for assessment or the learning stage when a particular
assessment intervention is to be used.
LAL also prioritizes the rationale for assessment, so that teachers know what they are assessing and why they are assessing it in a specific way. However, the most significant dimension of LAL seems to be its focus on the 'how' of assessment, which entails holistic yet rationalistic implementation of the skills and knowledge received through training and experience. LAL expects raters to be able to produce a reliable and valid assessment of the writing sample despite individual differences. Stiggins (cited in Herrera & Macias, 2015, p.307) bases his notion of
LAL competence on the following benchmarks: (a) identifying clear rationale for
assessment, (b) explicitly stating anticipated outcomes, (c) using appropriate assess-
ment strategies and methods, (d) designing reliable assessment items, rubrics, and
sampling, (e) eliminating rater bias, (f) reporting the results honestly, and (g)
employing assessment as a pedagogic tool.
Researchers (e.g. Lin & Su, 2015; Sultana, 2019) have identified a lack of appropriate training in LAL among English as a Foreign Language (EFL) teachers. Mai (2019, p.104), for instance, argues that 'most teachers of
English at all levels of language education still face the challenge of identi-
fying “criteria” for writing assessment scales’. Issues like this can have serious
implications for the performance of teachers of academic writing who are
responsible for designing and grading writing exams. Most teacher training
programmes do include a module on language testing and assessment, but these are not comprehensive enough to equip teachers, especially novices, to confidently undertake language assessment. There is a dearth of both
i the extent to which the assessment criteria and rubrics provide for appro-
priate analysis of students’ paraphrasing skills
ii the extent to which the test scores correlate with the assessment criteria and
rubrics
Method
This section of the chapter details the participants and research context of the
study, the characteristics of the writing samples collected for paraphrase analysis,
and the analytical procedures adopted for analysis of the data.
Analytical procedures
The first step after the paraphrase samples had been collected was to type the handwritten student writing into a Word document with all the errors intact to maintain originality and transparency. Each text was allotted a code, and word
length and exam score were recorded for later analysis. The assessment rubrics
had four performance descriptors which were graded on four-point criteria –
order, paraphrasing, ideas, and language use (grammar and mechanics). The
next step was to devise a measurement scale because the exam scripts had been
marked holistically with a rounded score for the overall performance instead of
the four-point criteria stated in the rubrics. Because the focus of the study was
to investigate how the teachers had assessed the sample paraphrases and not the
grammatical issues, the researcher decided not to analyze language problems in
view of the absence of specific marks for the language use, and to consider the
teacher-awarded scores to account for the three measurement criteria, namely:
order, paraphrasing, and ideas. However, a few interventions had to be intro-
duced to facilitate the analytical process. The researcher developed a template
which was used to segregate and analyze the sample paraphrases by the criteria
of: order, paraphrasing, and ideas. Since all paraphrases followed the order of the source text (ST), no further analysis of this criterion was done.
For paraphrasing, the rubrics were found to be vague and ambiguous as they
did not provide for a systematic scale or criteria which could be used to analyze
teachers’ assessment of paraphrasing. Therefore, the researcher had to first
establish a text length which could be considered a paraphrase. Two groups of
paraphrases were identified – paraphrases with 150 or more words were
assumed to be a reliable text-length equivalent of the ST, and paraphrases with
149 or fewer words were assumed to be either summaries of the ST or an
unreliable version of the ST. To find out if the sample texts were substantial,
superficial, patchwriting, or inaccurate, the written samples were analyzed for
these paraphrasing standards based on the difference between the original and
the plagiarized parts of the text. For analysis, plagiarism was operationalized to
be the incidence and frequency of five or more consecutive words (Shi, 2012;
Sun & Yang, 2015) from the ST or repetition of the ST words with minor
changes in word order. The last measurement criterion – ideas – was analyzed based on the count of missed ideas per text. The samples were also analyzed for correlation of the exam scores with the performance descriptors, and for whether the source was cited in compliance with academic conventions.
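The five-consecutive-word criterion lends itself to a simple n-gram check. The sketch below (Python) is one plausible operationalization, not the author's actual analysis template; the tokenization, function names, and the 150-word grouping helper are illustrative assumptions.

```python
# Minimal sketch of the plagiarism criterion described above: any run of five
# or more consecutive words shared with the source text (ST) counts as
# borrowed. This is an illustrative reconstruction, not the study's template.

def tokenize(text: str) -> list:
    """Naive whitespace tokenization; the study's actual procedure is unstated."""
    return text.lower().split()

def plagiarized_share(source_text: str, paraphrase: str, n: int = 5) -> float:
    """Percentage of paraphrase words covered by n-grams also present in the ST."""
    src = tokenize(source_text)
    src_ngrams = {tuple(src[i:i + n]) for i in range(len(src) - n + 1)}
    words = tokenize(paraphrase)
    flagged = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in src_ngrams:
            flagged[i:i + n] = [True] * n  # mark every word in the matching run
    return 100 * sum(flagged) / len(words) if words else 0.0

def length_group(paraphrase: str) -> str:
    """Apply the study's 150-word cut-off for a reliable text-length equivalent."""
    return ">=150 words" if len(tokenize(paraphrase)) >= 150 else "50-149 words"
```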
Statistical Package for the Social Sciences (SPSS) was used to obtain descriptive
statistics for the word length, exam scores, paraphrasing, ideas, and missed ideas.
Percentage scores were obtained for these variables as well as for citations given
or not, and for the four performance descriptors. Non-parametric correlation
analysis was also done to ascertain the presence of any statistically significant correlations between the different variables of the study.
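The chapter reports its statistics from SPSS; as an equivalent sketch only, the same descriptives and Spearman correlations could be produced in Python with pandas and SciPy. The column names follow the abbreviations used in the Results section, while the values are hypothetical stand-ins, not the study's data.

```python
# Equivalent, hypothetical sketch of the SPSS workflow using pandas/SciPy.
# TL = text length, TS = test score, PP = % plagiarized, MI = missed ideas;
# the values are placeholders, not the 55 examination scripts of the study.
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({
    "TL": [162, 148, 201, 95, 180, 130],
    "TS": [6.0, 5.5, 7.0, 6.0, 6.5, 5.0],
    "PP": [40.2, 55.0, 12.3, 80.1, 33.0, 60.4],
    "MI": [2, 4, 0, 6, 3, 5],
})

print(df.describe().round(3))  # M, SD, min, max per variable

# Pairwise non-parametric (Spearman) correlations
for a, b in [("TL", "PP"), ("TL", "MI"), ("PP", "MI"), ("TS", "PP")]:
    rho, p = spearmanr(df[a], df[b])
    print(f"{a} vs {b}: rs = {rho:.3f}, p = {p:.3f}")
```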
Results
The paraphrases had been examined on the four-point assessment criteria
(order, paraphrasing, ideas, and language use) which had been set for the ori-
ginal assessment at the research site. Since all the paraphrases (n=55) were
found to adhere to the order of the STs, no further analysis was conducted. As
for the paraphrasing, all the paraphrased texts were both patchwriting and
inaccurate. However, the third descriptor (i.e. ‘ideas’) was segregated between
complete and incomplete paraphrases to allow for further analysis. A little more
than half of the paraphrases failed to achieve the operationalized word length
for this study. Percentage scores reveal that 47.27% of the paraphrases were 150
or more words, whereas 52.72% of the paraphrases were found to be in the
range of 50 to 149 words. The major reason for this seems to be the number of
ideas that had been dropped by the students in their attempt to paraphrase the
ST. Only 20% of the paraphrases restated all of the ideas from the ST; 21.81% had missed three ideas, while 16.36% each had missed two and four ideas. 33.89% of the paraphrased texts were plagiarized, with the minimum being 0.75% and the maximum 87.37%.
The students’ test scores ranged from 4 to 8 out of 10. 36.36% of the para-
phrases were awarded 6 followed by 16.36% of the texts awarded 6.5 and 7
points, and 12.72% by paraphrases awarded either 5 or 5.5 points. 60.09% of
paraphrases were found to be ‘Proficient’ with the score range from 6 to 7.
Following the exam rubrics, 20.09% of the paraphrases with the score of 5.5,
6.5, 7.5, and 8, could not be identified with any of the performance descrip-
tors. 87.27% of the paraphrases did not cite the source of the ST.
SPSS was used to obtain descriptive statistics and correlation analysis for the text length, test scores, paraphrasing, and missed ideas. The results for Text Length (TL), Test Score (TS), Paraphrasing (PP), and Missed Ideas (MI) were M = 153.91 (SD = 34.576), M = 6.08 (SD = 0.744), M = 52.16 (SD = 36.156), and M = 3.15 (SD = 2.360) respectively. These figures indicate that the paraphrased texts varied considerably in their length in comparison with the ST; most of the TS range was, however, close to the mean. On the other hand, PP and MI were unevenly dispersed across the corpus of paraphrased texts, which illustrates why most of the paraphrases were not close to the word length of the ST (207 words). Spearman's rho (rs) failed to identify any statistically significant association between the variables except between TL and PP (rs = .700, p = .01), a statistically significant negative one between TL and MI (rs = -.712, p = .01), and between PP and MI (rs = -.435, p = .01).
The results for Text Length Range 1 (TLR1; 50 to 149 words per paraphrase), Test Score Range 1 (TSR1), Paraphrasing Range 1 (PPR1), and Missed Ideas Range 1 (MIR1) were M = 124.73 (SD = 22.326), M = 6.15 (SD = 0.822), M = 27.73 (SD = 15.517), and M = 4.58 (SD = 2.230) respectively. Spearman's rho failed to find any statistically significant relationship among these variables. On the other hand, the descriptive statistics for the same variables
Teachers’ assessment 169
in Text Length Range 2 (TLR2; 150 or more words) were M = 180.07 (SD = 19.007), M = 6.02 (SD = 0.675), M = 74.72 (SD = 35.425), and M = 1.86 (SD = 1.642) respectively. Spearman's rho was significantly negative between TLR2 and Missed Ideas Range 2 (MIR2), rs = -.602, p = .01. The results indicated that word length and the amount of plagiarized text did not affect the test scores in the two groups; however, texts in the shorter word range had more missing ideas than texts with 150 or more words. Similarly, the missing ideas did not seem to determine the test scores, as there was only a fraction of a difference between the mean scores of the two groups. The results for the second group also revealed that the greater the number of words, the fewer the missing ideas.
Discussion
This section of the study focuses on the discussion about the relevance of the
assessment criteria, rubrics, test scores, and teachers’ performance as assessors to
figure out the implications for LAL.
The results of the paraphrase analysis reveal serious shortcomings both in the
assessment criteria and the assessment process. These findings support Mellati and Khademi's (2018) finding that teachers' assessment literacy affects students' writing performance. The standards set as performance descriptors are both vague and
ambiguous to the extent that they do not permit uniform and reliable assessment
of the students’ paraphrases. There is no descriptor category for the score range of
5.5 and it is not clear if this score should be considered ‘Proficient’ or ‘Exemp-
lary’ in terms of performance description. The same is true of the score range for
7.5 and above. In addition, the labelling of the descriptors into ‘Standard not
met’, ‘Progressing’, ‘Proficient’, and ‘Exemplary’ may well describe linguistic
competence but not paraphrasing as an academic literacy skill. There is no such
explanation for paraphrasing in the research studies done on the subject. Fol-
lowing Shi (2012) and Sun and Yang (2015), paraphrasing is either substantial, or
patchwriting, or superficial, or inappropriate. The excerpts from students' paraphrasing in Table 9.2 illustrate the point.
The samples in the present study were examination scripts, and the paraphrases were treated as an exam activity. A study that collects samples of paraphrases from, for instance, research articles or term papers may reflect a different response, both in terms of student performance and rater assessment. The
study also did not include teachers’ and students’ perceptions. The relation-
ship between teachers’ and students’ beliefs can provide further insights into
the matters encompassing LAL.
Conclusion
Paraphrasing is an important academic literacy skill, and following Ahmad
(2019, p.279), students’ exposure to ‘the contemporary practices in the domain
of academic literacy’ helps them to ‘gain membership of their specific discourse
community’– one of the very basic aims of academic writing programmes.
Such aims cannot be realized if the language assessment system is not properly supported by background training of the assessors in LAL. Lack of competence in LAL can affect teachers' judgments and decisions, which in turn can challenge the academic veracity of students' results, course objectives, course assessment and evaluation, and broader institutional, social, and national policies. A great deal is expected of teachers in terms of course delivery and assessment. They must be supported through awareness-raising and practical training programmes in LAL, at both pre-service and in-service levels, for the benefit of learners and academia.
References
Ahmad, Z. (2017a). Academic text formation: Perceptual dichotomy between pedago-
gic and learning experiences. Journal of American Academic Research, 5(4), 39–52.
Ahmad, Z. (2017b). Empowering EFL learners through a needs-based academic writing course design. International Journal of English Language Teaching, 5(9), 59–82.
Ahmad, Z. (2019). Analyzing argumentative essay as an academic genre on assessment frameworks of IELTS and TOEFL. In S. Hidri (Ed.), English language teaching research in the Middle East and North Africa: Multiple perspectives (pp. 279–299). Palgrave Macmillan.
Bailey, K. M., & Brown, J. D. (1996). Language testing courses: What are they? In A.
Cumming, & R. Berwick (Eds.), Validation in language testing (pp. 236–256). Multi-
lingual Matters.
Campbell, C., Murphy, J. A., & Holt, J. K. (2002). Psychometric analysis of an assessment
literacy instrument: Applicability to preservice teachers. Paper presented at the Annual
Meeting of the Mid-Western Educational Research Association, Columbus, OH.
Coombe, C., Davidson, P., O’Sullivan, B., & Stoynoff, S. (Eds.), (2012). The Cambridge
guide to second language assessment. Cambridge University Press.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers. Profile: Issues in Teachers' Professional Development, 20(1), 179–195.
Herrera, L., & Macías, D. (2015). A call for language assessment literacy in the education
and development of teachers of English as a foreign language. Colombian Applied
Linguistics Journal, 17(2), 302–312.
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to
secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Howard, R. M. (1995). Plagiarism, authorships, and the academic penalty. College Eng-
lish, 57, 788–806.
Inbar-Lourie, O. (2013). Language assessment literacy. In C. A. Chapelle (Ed.), The encyclopedia of applied linguistics (pp. 2923–2931). Blackwell.
Janatifar, M., & Marandi, S. S. (2018). Iranian EFL teachers’ language assessment literacy
(LAL) under an assessing lens. Applied Research on English Language, 7(3), 307–328.
Kalajahi, S. A. R., & Abdullah, A. N. (2016). Assessing assessment literacy and practices among lecturers. Pedagogika/Pedagogy, 124(4), 232–248.
Keck, C. (2006). The use of paraphrase in summary writing: A comparison of L1 and L2
writers. Journal of Second Language Writing, 15, 261–278.
Kennedy, C., & Thorp, D. (2007). A corpus-based investigation of linguistic responses
to an IELTS academic writing task. In L. Taylor, & P. Falvey (Eds.), Studies in lan-
guage testing: IELTS collected papers – Research into speaking and writing assessment (Vol.
19, pp. 316–379). Cambridge University Press.
Lin, D. (2014). A study on Chinese middle school English teachers’ assessment literacy
[Unpublished Doctoral Dissertation]. Beijing Normal University.
Lin, D., & Su, Y. (2015). An investigation of Chinese middle school in-service English
teachers’ assessment literacy. Indonesian EFL Journal, 1(1), 1–10.
López, A., & Bernal, R. (2009). Language testing in Colombia: A call for more teacher
education and teacher training in language assessment. Profile: Issues in Teachers’ Pro-
fessional Development, 11(2), 55–70.
Mai, D. T. (2019). A review of theories and research into second language writing and
assessment criteria. VNU Journal of Foreign Studies, 35(3), 104–126.
Malone, M. E. (2013). The essentials of assessment literacy: Contrasts between testers
and users. Language Testing, 30(3), 329–344.
Mayor, B., Hewings, A., North, S., Swann, J., & Coffin, C. (2007). A linguistic analysis
of Chinese and Greek L1 scripts for IELTS academic writing task 2. In L. Taylor, &
P. Falvey (Eds.), Studies in Language Testing: IELTS collected papers – Research in speak-
ing and writing assessment (Vol. 19, pp. 250–314). Cambridge University Press.
Mellati, M., & Khademi, M. (2018). Exploring teachers’ assessment literacy: Impact on
learners’ writing achievements and implications for teacher development. Australian
Journal of Teacher Education, 43(6), 1–18.
Mertler, C. A. (2004). Secondary teachers’ assessment literacy: Does classroom experi-
ence make a difference? American Secondary Education, 33(1), 49–64.
Nunan, D. (1988). The learner centred curriculum. A study in second language teaching.
Cambridge University Press.
Ölmezer-Öztürk, E., & Aydın, B. (2018). Investigating language assessment knowledge
of EFL teachers. Hacettepe University Journal of Education, 34(3), 602–620.
Ölmezer-Öztürk, E., & Aydın, B. (2019). Voices of EFL teachers as assessors: Their
opinions and needs regarding language assessment. Eğitimde Nitel Araştırmalar Dergisi–
Journal of Qualitative Research in Education, 7(1), 373–390.
Oshima, A., & Hogue, A. (1999). Writing academic English (3rd edition). Addison-Wesley
Publishing Company.
Oshima, A., & Hogue, A. (2006). Writing academic English (4th edition). Longman.
Öz, S., & Atay, D. (2017). Turkish EFL instructors' in-class language assessment literacy: Perceptions and practices. ELT Research Journal, 6(1), 25–44.
Plake, B. S., & Impara, J. C. (1997). Teacher assessment literacy: What do teachers
know about assessment? In G. D. Phye (Ed.), Handbook of classroom assessment: Learn-
ing, achievement, and adjustment (pp. 53–68). Academic Press.
Popham, W. J. (2006). All about accountability: A dose of assessment literacy. Improving
Professional Practice, 63(6), 84–85.
Popham, W. J. (2009). Assessment literacy for teachers: Faddish or fundamental? Theory
Into Practice, 48, 4–11.
Rahimi, F., Esfandiari, M. R., & Amini, M. (2016). An overview of studies conducted
on washback, impact and validity. Studies in Literature and Language, 13(4), 6–14.
Roig, M. (1999). When college students’ attempts at paraphrasing become instances of
potential plagiarism. Psychological Reports, 84, 973–982.
Shi, L. (2004). Textual borrowing in second language writing. Written Communication, 21, 171–200.
Shi, L. (2012). Rewriting and paraphrasing source texts in second language writing. Journal of Second Language Writing, 21, 134–148.
Shi, L., Fazel, I., & Kowkabi, N. (2018). Paraphrasing to transform knowledge in
advanced graduate student writing. English for Specific Purposes, 51, 33–44.
Stiggins, R. J. (1995). Assessment literacy for the 21st century. The Phi Delta Kappan, 77
(3), 238–245.
Teachers’ assessment 175
Sultana, N. (2019). Language assessment literacy: An uncharted area for the English
language teachers in Bangladesh. Language Testing in Asia, 9(1), 1–14.
Sun, Y. C. (2012). Does text readability matter? A study of paraphrasing and plagiarism
in English as a foreign language writing context. The Asia-Pacific Education Researcher,
21, 296–306.
Sun, Y. C., & Yang, F. Y. (2015). Uncovering published authors' text-borrowing practices: Paraphrasing strategies, sources, and self-plagiarism. Journal of English for Academic Purposes, 20, 224–236.
Thomas, J., Allman, C., & Beech, M. (2004). Assessment for the diverse classroom: A
handbook for teachers. Florida Department of Education, Bureau of Exceptional Edu-
cation and Student Services. http://www.fldoe.org/ese/pdf/assess_diverse.pdf
Verma, S. (2015). Technical communication for engineers. Vikas Publishing.
Yamada, K. (2003). What prevents ESL/EFL writers from avoiding plagiarism? Analyses
of 10 North-American college websites. System, 31, 247–258.
Chapter 10
Reliability of classroom-based assessment as perceived by university managers, teachers, and students
Olga Kvasova and Vyacheslav Shovkovy
Introduction
After Ukrainian higher education joined the Bologna process in 2005, decision-makers report, all university curricula were redesigned in compliance with modules and credits. Such redesign is generally accompanied
by the development of a national quality assurance system to ascertain that the
level of education is of the required standard. Therefore, the introduction of
modules and credits has critically increased the role of the summative assess-
ment of levels attained by students at the end of each academic course and at
graduation.
The recent British Council (BC) report on the state of teaching English for
specific purposes in Ukraine reveals that the Bologna requirements to define
English language curriculum modules and credits have been implemented par-
tially, whereas the development of a meaningful quality assurance system is still
pending (Bolitho & West, 2017). One aspect of the issue is that the evidence of
students’ achievements is not based on the external (standardized) tests, which
impedes comparability of results across various institutions in the country.
Another such aspect refers to the quality of internal, institutional assess-
ment, which in actual fact substitutes for external quality assurance and is therefore setting-specific. Focusing on institutional quality assurance, the experts concluded that standards of tests and examinations were generally poor, resulting from a lack of testing and assessment expertise among those who prepare assessment materials.
In Ukraine, test preparation is solely the responsibility of instructors since no
test development units, with specially trained staff, have been included in uni-
versities as yet. Nationwide, summative tests are constructed by teachers whose
major function is to teach and implement assessment for learning; this allows us
to refer to teacher-constructed summative tests as ‘classroom-based summative
tests’. The authorship of summative tests raises concerns about the reliability of
information regarding attained language levels, which is primarily required of
assessment. Decision-making based on inaccurate information may have far-reaching consequences for education policies at the national level.
Review of literature
Summative assessment has the purpose of reporting on learning achieved by a certain time; its accuracy and objectivity are therefore paramount. In
Western tertiary education systems, summative assessments have long been
used to meet the increased demands for accountability. These assessments are
regular, systematic, rational, and formalized, and have provided plentiful rea-
sons for appraisal and critique. The mandatory character of summative assess-
ment is opposed to the continuous, informal character of formative assessment
and its emphasis on promoting better learning. As Houston & Thompson
(2017) point out ‘[f]ormative (feedback) assessment is intended to help stu-
dents with future learning, whereas summative (feedout) assessment warrants
or certifies student achievements to others, including potential employers’
(p.2). Lau (2017) challenges the artificially created dichotomy of ‘summative’
and ‘formative’ assessment wherein summative is bad and formative is good,
and advocates for the idea that formative and summative assessment need to
work in harmony without being opposed to each other. Brown (2019) goes
so far as to assert that formative assessment – assessment for learning – is a meaningful teaching framework rather than assessment whose major function is 'verifiability for its legitimacy as a tool for decision-making'. How fair are these claims? It is worthwhile considering the
purposes of assessment specified in official documents on assessment practices
in the contexts where quality assurance is well-established.
Among the four purposes of assessment classified in the code of practice
for assessment offered by the UK Quality Assurance Agency (QAA), the
pedagogy-related purpose of 'providing students with feedback to promote their learning' is given priority; this largely pertains to formative assessment. The next two purposes – measurement ('evaluation of student
skills’) and standardization (‘providing a mark or grade to establish the level
of a student’s performance’) – seem to serve both formative and summative
assessments. The certification purpose (‘communicating to the public the
level of individual achievement as reflecting the academic standard’) is
overtly pertinent to summative assessment (QAA, 2006 as cited in Norton,
2009 p.134). Despite the seeming balance amongst assessment purposes, in
Western higher education, assessment of learning predominates over assess-
ment for learning. However, more attention has been recently placed on the
complementary characteristics of formative and summative assessments
(Houston & Thompson, 2017), on the synergy of these two types of
assessment differing in form and function (Carless, 2006), as well as on the transition to alternative forms of summative assessment that are more compliant with the requirements of 21st century education (HEA, 2012).
In Ukraine, on the contrary, formative assessment has always been deep-
rooted in classroom practices and is currently being implemented through a
variety of traditional and innovative methods (Dovgopolova, 2011; Shadrina,
2014; Olendr, 2015). In their dedication to assessment for learning (Kvasova &
Kavytska, 2014), Ukrainian teaches share beliefs revealed in Muñoz et al.’s
study (2012): According to this research, teachers view assessment as a means of
improving students’ performance and teaching methods rather than a route
towards accountability and certification. Nevertheless, a teacher’s professional
duties of evaluating learners’ achievements at the end of learning a subject or
course have always been implemented in Ukrainian education. The shift of
focus to assessment for reporting has considerably increased teachers' workload and responsibilities, while the lack of hands-on recommendations on the development of these high-stakes summative tests has raised concerns among teachers, who are the immediate actors of assessment.
The difference in function and use of the two types of assessment is
explained by Harlen (2007). She argues that in the course of assessment for
learning the major goal pursued by educators is promoting students’ learning.
In this case, evidence of progress is frequently expressed in grades. Reliability of
practices and raise teacher assessment literacy. These principles are ‘Planning
and Reflection [that] lead to Improvement, when supported by Cooperation
and informed by Evidence’ (Green 2014, p. 21). In other words, in classroom
conditions where the assessment literacy of teachers is not generally very high,
it is mandatory to collaborate on all stages of test development, administration,
and analysis.
Following this line of thought, and stimulated by Coombe et al.’s (2007)
suggestion that ‘it is easier to assess reliability than validity’ (p. xxiv), we con-
ceived of a research project examining issues of the reliability–trustworthiness
of summative assessments in university conditions. Examining reliability
through sophisticated statistical analysis is hardly possible in Ukraine, where
LTA is still in its infancy; despite this, an empirical, qualitative study of relia-
bility of classroom summative assessments is quite feasible. In the following
section of this chapter, we will provide the research rationale and methods of
investigation employed in our study.
Current study
Research rationale
The ultimate goal of this research was to explore the practices of summative assessment adopted in Ukrainian universities with a view to establishing the degree of reliability–trustworthiness of assessment results. To this end, the study intended
to survey the experiences of three central groups of stakeholders – university
managers, teachers, and students – involved in organization, implementation, and
decision making based on summative assessment. We adopted a working
hypothesis that reliability of summative tests in universities may be directly
dependent on the actual level of TAL. It was also expected that the survey
would offer insights into the ways to maximize teacher training in LTA. Our
initial task, in this respect, was to determine all aspects of classroom summative
assessment that are empirically observable and assessable.
We proceeded from the assumption that reliable university summative assessment should, first, be uniform for specific groups of learners (year of study, specialty). This means it should aim to measure the level of the same skills determined in the curricula, use the same testing techniques, be collected within the same procedure (written test paper, timing), and be graded on the same criteria. Second, the process should adhere to the test development cycle, the major stages of which are planning (defining the test construct) and the collegial choice of testing techniques, followed by item/task writing, pre-testing, and test modification. The quality of the developed test should be assured by those managers or teachers whose level of assessment literacy is higher than average. Third, the test administration procedure should be strictly followed as far as the real-life educational context allows. We
refer to maintaining test transparency (informing students about what is going
to be measured on the test) and test security (preparing more than one variant
of test papers, conducting assessments within relatively close dates, ensuring
academic honesty). No less important are accuracy and timeliness of grading
test papers, documenting and reporting test results, and analysis of the evi-
dence collected. The post-administration assurance of the test quality is also
viewed as advisable, if not mandatory, in terms of promoting the develop-
ment of better, valid tests in the future. Fourth, since the summative assessment
that we explore is classroom-based or teacher-constructed, it should be com-
plemented by feedback provided to learners particularly in terms of mid-term
assessment. Feedback should be timely and effective, otherwise it is useless
(Coombe et al., 2007). Feedback may and should impact students’ determination
to learn better. Fifth, washback, or feedback from assessees on their satisfaction
with the fairness of test results, should be monitored and regulated. So far, the
studies show that the lowest level of student satisfaction refers to grades and feed-
back (Norton, 2009; HEA, 2012). Finally, the evidence collected through sum-
mative assessment should be supported by multiple measures, such as alternative
forms of assessment. This possibility puts classroom-based, context-relevant assess-
ment at an advantage over large-scale testing which is absolutely context-free.
Given internationally determined perspectives to involve alternative types of
assessment in the function of summative assessment (HEA, 2012), these forms of
assessment should also become integrated in the summative classroom-based
assessment in Ukraine.
The above reflections resulted in distinguishing the following prerequisites to reliable summative assessment:

- uniformity of test content, administration procedure, and grading criteria for specific groups of learners;
- adherence to the test development cycle, with quality assurance by more assessment-literate colleagues;
- strict administration procedures, including test transparency and test security;
- accurate and timely grading, documenting, reporting, and analysis of results;
- timely and effective feedback to learners;
- monitored washback, i.e. assessees' satisfaction with the fairness of test results; and
- support of the evidence by multiple measures, including alternative forms of assessment.
Implementation
Methods of research
The questionnaires were prepared in consultation with specialists from the
Academy of Higher Education of Ukraine; before administering the survey, we
pre-tested the questionnaires with the help of university managers (3), fellow
teachers (6), and students (15).
Questionnaire 1 was intended for university managers. It purported to
elicit information and personal perceptions of the aspects that immediately
reflected responsibilities of the organizers and supervisors of summative
assessment within their departments, as well as the person accountable for
assessment results. This questionnaire consisted of 22 questions including 20
with the option ‘own answer’, 1 open-ended, and 1 requesting rank-
ordering. Several questions concerned uniformity, test development, and
quality assurance issues. There were also questions focused on the develop-
ment and administration of mid-term tests and end-of-course tests, which are fairly high-stakes assessments for many students. The two final questions inquired about the prospects of improving the quality of summative tests through enhancing TAL.
Questionnaire 2 was designed for teachers and included 28 questions (27
with the option ‘own answer’ and 1 requesting rank-ordering). Part of the
questions coincided with those aimed at university managers (uniformity of
test papers and administration procedures, test preparation procedure)
although they were formulated from a somewhat different perspective – of
the staff responsible for maintaining uniformity of tests and administration, as
well as for test development and ensuring its quality/validity. Another part of
the questions reflected teachers’ practices in terms of feedback provision and
use of alternative assessments. Teachers were also invited to share their per-
ceptions of possible washback. Question 27 inquired about the forms of
Participants
The participants in the survey were three distinct though interconnected
groups of respondents: 1) university managers (UMs), 2) teachers (Ts), and
3) students (Sts). Eleven institutions from all regions of the country were
involved in the survey (Western, Southern, Eastern, and Central parts of
Ukraine), which made our sample fairly representative in terms of reflecting
local practices. Although we collected many more responses than initially
planned, we had to exclude a considerable number of inaccurately com-
pleted questionnaires. In the end, we processed 10 questionnaires of UMs,
50 of those collected from Ts, and 50 questionnaires completed by Sts.
Participation in the survey was anonymous (excluding the UMs) and
voluntary. To ensure full comprehension of the questions by all groups of
respondents, questionnaires were formulated in Ukrainian, with the meta-
language excluded.
Data collection
The survey was conducted in paper-and-pencil format in the autumn of 2018.
The sets of responses that arrived from each university contained responses
provided by: one university manager (head of department), 5–10 teachers
working for those departments and 5–10 students who were taught by those
teachers. Consequently, the responses collected from three groups of respon-
dents in each of the ten institutions allowed the researchers to note salient
features of the assessment practices adopted in each local context. Although it
would not be difficult to align all three groups of evidence and arrive at certain
conclusions, for ethical reasons we did not do that, thus leaving the original
scope of the study unchanged.
Table 10.3 Perceptions of test objectivity vs satisfaction with test results/grades (%)

Respondent group   Always satisfied          Not always satisfied      Never satisfied
                   objectivity / results     objectivity / results     objectivity / results
Managers           20 / 20                   70 / 70                   10 / 10
Teachers           58 / 55                   36 / 37                   – / 8
Students           58 / 40                   34 / 50                   – / 10
[Figure: percentage of teachers reporting experience of each LTA training format, including short-term courses, workshops, distant courses, traineeships (abroad and in Ukraine), staff seminars, self-study, and other formats.]
Staff seminars conducted by teachers of the same rank as the respondents were the most common source of knowledge about LTA. 64% of the respondents had participated in workshops given by visiting international experts, and another 64% had improved their TAL through self-study. Every other opportunity had been taken up by fewer than 50% of the respondents.
The training experiences of the respondents, although far from systematic, allowed the Ts to express a considered opinion about the most effective formats; their ranking of the formats by effectiveness is presented in Table 10.4.
As can be seen in Table 10.4, the respondents considered traineeship abroad the
most effective way to enhance TAL; the 2nd and 3rd preferred formats were
workshops conducted in Ukraine by experts in LTA and workshops led by
Ukrainian experts. We explain the appreciation of these formats by reference to
ongoing processes in Ukrainian education. The increased job responsibilities
related to frequent summative assessment confront teachers with numerous
questions to which national policy makers provide no clear answers. At the same
time, the past four years have witnessed a growth of opportunities for teachers to
participate in training events organized by UALTA; it is probably the high standards
set by the invited experts that heightened teachers’ expectations of international collaboration.
Staff seminars, which had been mentioned in the responses to the previous
question as the most accessible format, and conferences, which had long been
staples among scholarly conventions, were considerably downgraded by the
respondents. Whether this resulted from Ts’ discontent with the insufficient
informativeness of staff seminars, from the scarcity and insufficient quality of
research into LTA reported at conferences, or from the conference format
itself needs to be ascertained by a dedicated survey of academia.
Nonetheless, we believe we can account for the increased rating of short-term
courses in LTA; indeed, we had hypothesized it. Although only 20% of the
respondents had experience of short-term courses in LTA, Ts assumed that
intensive short-term training in LTA would be effective and ranked it 4th. On
the one hand, this reflects the recently established practice of conducting
week-long winter/summer schools for teachers in the country. On the other
hand, the idea of training in LTA meets urgent demands for implementing
reliable–trustworthy assessments in higher education. Additionally, such short-
term courses have been piloted by us and were found quite effective (Kvasova,
2016). By contrast, longer-term courses, as well as traineeship in Ukrainian
universities, distance courses, and self-study, did not meet the respondents’
expectations of effectiveness, being placed 7th–10th in the rating. We can
assume, however, that these formats were perceived as quite demanding in
terms of material and human resources (time, effort, cost).
We also examined UMs’ perceptions of the effectiveness of the ways to
enhance TAL and compared them with those discussed above. As can be seen in
the diagram, total agreement is observed in placing short-term courses on LTA in
the middle and in ranking distance courses the lowest. In two other instances,
regarding long-term courses and traineeship in Ukrainian universities, the
indices show larger variance, since managers are responsible for the smooth
flow of the instructional process in their departments, and the absence of staff
from the workplace could impede it. In all other instances, the graph reveals a
similar tendency for both groups of respondents, although Ts gave more
generous scores to their preferred formats of training events.
The interpretation of the data allowed us to arrive at conclusions regarding
all questions of this research. It appeared that the perceptions of reliability–
trustworthiness by the three groups of informants diverged considerably
wherever they could be compared.
The curve representing the major stakeholders’ (Sts) data stretches steadily along
medium indices, which points to the respondents’ indeterminate perceptions of
all aspects except the one most meaningful to them – their satisfaction with the
obtained grades. The educators’ responses agree only once, with respect to the
uniformity of summative assessment; their indices of satisfaction with the test
results, however, are quite close to each other and somewhat higher than those
of the Sts. While the Sts’ and Ts’ perceptions of objectivity coincide at 58%,
the UMs reveal a greater degree of certainty about test reliability; in fact, this
contradicts their overall rigorous stance on summative assessment.
On the whole, the UMs revealed the most critical and consistent evaluation
of all aspects of test development, administration, and analysis of test quality.
We attribute such perceptions primarily to the great responsibility of the
managerial job they perform, as well as to their general capacity to organize
and monitor summative assessments. The data also suggest that UMs have
fairly good control over summative assessment implementation, although the
degree of collaboration at some stages of test development (e.g. quality assur-
ance), in our view, needs reconsidering. Additionally, UMs should be credited
for the effort invested in TAL enhancement, in particular for organizing staff
seminars on LTA issues at the departments they head.
The data obtained from Ts enabled a more detailed view of assessment
procedures. The practices testify to assessments being developed and adminis-
tered in compliance with setting-specific requirements, although we noted that
some relevant testing principles were seriously compromised. The most
obvious reason for this lies in the lack of solid, specialized training in LTA;
however, even if teachers had a proper level of TAL, they would evidently
still need UMs’ support in creating conditions conducive to collaborative test
development and quality assurance. Nevertheless, what is clearly positive in
the assessment practices we surveyed is the agreement in the Ts’ and Sts’
perceptions of feedback efficiency and the absence of any negative impact of
assessment on further learning.
The limitations of the study relate to the subjectivity of the survey as a method
of investigation. However, the focus on the same concepts from different
perspectives allowed us to collect compatible data and enabled insights into
real-life assessment practices.
Conclusion
Reliability–trustworthiness of summative assessments in Ukrainian universities
has been considered in the study from the perspective of central stakeholders.
The information obtained from all groups of respondents shed light on the
assessment practices typical of universities from across the country. The results
confirm the existing views on TAL as a cornerstone of objective and equitable
summative assessment.
Of particular interest for the authors of this chapter are the educators’ sug-
gestions about possible ways of improving reliability of summative assessments;
both groups of educator respondents found it critical to raise TAL, identifying
the preferred formats of training in LTA – traineeship, workshops, and short-
term courses. To conclude, we received salient evidence of some progress in
building TAL in the surveyed universities and identified the most meaningful
formats of TAL enhancement; this stimulates follow-on studies as well as
practical organizational steps.
References
Alderson, J. C. (1999, May). Testing is too important to be left to testers. Plenary address to
the Third Annual Conference on Current Trends in English Language Testing,
United Arab Emirates University.
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation.
Cambridge University Press.
Bachman, L., & Palmer, A. (1996). Language testing in practice: Developing useful language
tests. Oxford University Press.
Bolitho. R., & West, R. (2017). The internationalisation of Ukrainian universities: the English
language dimension. British Council. https://www.teachingenglish.org.uk/sites/tea
cheng/files/Pub-UKRAINE-REPORT-H5-EN.pdf
Brown, G. T. L. (2019). Is assessment for learning really assessment? Frontiers in
Education, 4. doi:10.3389/feduc.2019.00064
Carless, D. (2006, September 6–9). Developing synergies between formative and summative
assessment. Paper presented at the British Educational Research Association Annual
Conference, University of Warwick. http://www.leeds.ac.uk/educol/documents/
159474.htm
Carless, D. (2007). Learning-oriented assessment: Conceptual bases and practical implications.
Innovations in Education and Teaching International, 44(1), 57–66.
Chapelle, C. A. (2013). Reliability in language assessment. In C. A. Chapelle (Ed.),
The encyclopedia of applied linguistics (pp. 4918–4923). Blackwell/Wiley.
Coombe, C., Folse, K., & Hubley, N. (2007). A Practical Guide to Assessing English
Language Learners. The University of Michigan Press.
Douglas, D. (2010). Understanding language testing. Routledge.
Dovgopolova, I. V. (2011). Vprovadzhennia testovoi metodyky v protsess navchannia u
vyshchyh navchalnyh zakladah [Integration of language testing into language learning
in higher education institutions]. Vyscshia shkola, 2(20), 41–50.
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
Fulcher, G., & Davidson, F. (2007). Language testing and assessment: An advanced resource
book. Routledge.
Giraldo, F. (2018). Language assessment literacy: Implications for language teachers.
Profile: Issues in Teachers’ Professional Development, 20(1), 179–195.
Gareis, C. R., & Grant, L. W. (2015). Teacher-made assessments: How to connect curriculum,
instruction, and student learning (1st edition). Routledge.
Green, A. (2014). Exploring language assessment and testing: Language in action. Routledge.
Green, A. (2016). Assessment literacy for language teachers. In D. Tsagari (Ed.), Class-
room-based assessment in L2 contexts (pp. 8–29). Cambridge Scholars Publishing.
Harlen, W. (2004). A systematic review of the evidence of reliability and validity of assessment by
teachers used for summative purposes. EPPI-Centre, Social Science Research Unit, Insti-
tute of Education. https://eppi.ioe.ac.uk/cms/Portals/0/PDF%20reviews%20and%
20summaries/ass_rv3.pdf?ver=2006-03-02-124720-170
Harlen, W. (2007). Designing a fair and effective assessment system. Paper presented at the
BERA Annual Conference: ARG Symposium Future Directions for Student Assess-
ment. University of Bristol, Bristol.
Hasselgreen, A., Carlsen C., & Helness, H. (2004). European survey of language testing and
assessment needs: Report: Part one – general findings. European Association for Language
Testing and Assessment. http://www.ealta.eu.org/documents/resources/survey-rep
ort-pt1.pdf
Hidri, S. (2016). Conceptions of assessment: Investigating what assessment means to
secondary and university teachers. Arab Journal of Applied Linguistics, 1(1), 19–43.
Houston, D., & Thompson, J. N. (2017). Blending formative and summative assessment
in a capstone subject: ‘It’s not your tools, it’s how you use them’. Journal of University
Teaching & Learning Practice, 14(3).
Introduction
As social creatures, people need to communicate in social settings in order
to survive and maintain relationships of friendship, enmity, comradeship,
and acquaintanceship by means of oral and/or written language. More than
340 million people speak English as a first language, and English functions as
the lingua franca around the globe (Celce-Murcia, 2013; Tarone, 2005; Ounis,
2017; Ur, 2012). Accordingly, globalization and the marketplace have necessitated
oral and written English proficiency, which is considered one of the key
graduate attributes (Koo, 2009). Given the current status of English and the
major role it has been playing in education and commerce in the Arab
countries and, more specifically, in the Arabian Gulf region since the beginning
of the 20th century, overall English proficiency is considered a pressing pre-
requisite for swift employment and a successful career (Ministry of Labour and
Social Affairs in Bahrain, 2017). Being able to successfully communicate in
English in corporate-level conversations is a demanding criterion to which the
educational ecosystem, and more specifically higher education institutions, need
to pay additional attention when designing curricula.
The significance of oral proficiency and of speaking as a productive
skill has been strongly stressed by curriculum specialists, educationists, and
researchers (Celce-Murcia, 2013; He, 2013; Hismanoglu, 2013; Liu, 2012; Lu &
Liu, 2011; Ounis, 2017; Yaikhong & Usaha, 2012). However, looking at the
regional and local status of English instruction reveals some adverse realities, as
the educational systems place high importance on reading, writing, and the
teaching of form and function, while less attention is given to speaking. As a
result, speaking is considered the most difficult skill to master due to a number
of factors stemming from this neglect, such as the fear of negative evaluation by
peers, low lexical richness, and lack of formation skills and practice (Al Asmari,
2015; Al Hosni, 2014).
This chapter aims to explore two dimensions of the assessment of the
speaking skill: 1) the current practice of assessing speaking in tertiary education
in Bahrain with reference to some pedagogical structures and to a number of
cognitive variables related to Second Language Acquisition (SLA) and language
Research questions
The present study aims to answer the following questions:
of the speaking skill has inspired many researchers in the region, as evidenced
by the number of studies in the field during the last decade (Bashir, 2014; Heng,
Abdullah, & Yusof, 2012; Hidri, 2018; Mahmoodzadeh, 2012; Mak, 2011;
Yahya, 2013). The majority of these studies acknowledged the state of
the teaching and assessment of speaking and related it to the influence of
speaking anxiety in EFL settings and to inadequate teaching methodologies and
practices (Hidri, 2017). Acknowledging the factors behind the lack of speaking
proficiency paves the path for appropriate pedagogical implications and
recommendations (Gebril & Hidri, 2019).
Turning to the regional and local contexts, Arab learners of English seem
to have high speaking anxiety in L2 settings (Alhamadi, 2014; Al Jahromi,
2012; Al-Shboul, Ahmad, Nordin, & Rahman, 2013; Rabab’ah, 2016; Taha
& Wong, 2016; Yahya, 2013). Yahya (2013) revealed that fear of negative
evaluation was the key factor triggering speaking anxiety among Palestinian
undergraduate students. Similar findings were reported by Elmenfi and Gai-
bani (2016). Rabab’ah (2016) claims that Arab learners encounter speaking
difficulties because of inadequate teaching methodologies, and lack of practice
and listening tasks, while Al Asmari (2015) attributes such difficulties to
lessened motivation and strict evaluation techniques. Locally, a review
of the literature shows that there has been no published research on the status
of speaking and FLA in the Kingdom of Bahrain. However, a number of
unpublished undergraduate graduation studies have shown that university and
school students consider the speaking skill more difficult to master than
reading and writing skills. In one of these studies, only one third of high
school students reported having oral fluency.
While more than half of the students in these studies reported having good
communication skills, the majority expressed increased apprehension when
randomly assigned to speak during in-class participation (Abbas, 2017). In Yousif’s (2016)
study, more than 30% of the respondents who were undergraduate students
reported increased anxiety when speaking in class because their L2 teachers did
not teach or assess speaking or provide them with ‘real’ speaking opportunities.
However, almost half of the respondents reported feeling at ease when com-
municating with family members and friends. Similarly, 93% of the respondents
in Salman’s (2016) study revealed that they faced difficulties in speaking and
attributed them to skill deficiency due to teaching (47%) and lack of confidence
(46%). Taleb (2017) used the Public Speaking Class Anxiety Scale (PSCAS) to
measure university students’ speaking anxiety and found a moderate level of
anxiety. Categorically, students in these
local studies attributed low proficiency in the speaking skill to a number of
variables that are attested in the current study: the lack of 1) confidence, 2)
native-like speaking opportunities, 3) speaking and communication skills cour-
ses, and 4) proper teaching and assessment practices and processes. Firstly,
speaking anxiety could be instigated by a number of factors; one of the most
fundamental could be related to students’ language competence and personal
traits such as lack of confidence, fear of peer evaluation or teacher evaluation,
shyness, etc. (Abbas, 2017; Al-Nasser, 2015; Elmenfi & Gaibani, 2016; Gan,
2012; Kayoaglu & Saglamel, 2013; McCroskey, 2016; Pathan, Aldersi &
Alsout, 2014; Tanveer, 2007). In Abbas’ (2017) study, almost 60% of university
students acknowledged having speaking anxiety, a large proportion of whom
were high-achieving students, while low-achieving ones, conversely, claimed
to have low anxiety when speaking. A similar number of students placed a high
value on the impact of having good speaking skills on their language proficiency.
Speaking anxiety in this regard can be directly related to Krashen’s Affective
Filter Hypothesis (1987) and Vygotsky’s (1986) Social Constructivist Theory
and Social Interaction Hypothesis. These theorists posit that L2 learning can
only be successful when the learner is in anxiety-free learning settings and
proximal zones of interaction. L2 learners of English often tend to feel appre-
hensive during speaking attempts or requests and tend to avoid situations in
which they are expected to communicate orally by skipping classes, minimizing
participation, and refusing to take part in oral presentations. When forced to
speak, their avoidance defensive strategy is triggered, resulting in apprehension.
Additionally, school and tertiary curricula design and education systems seem
to focus comprehensively on reading and writing skills at the expense of the
speaking skill (Celce-Murcia, 2013; Koran, 2015). This means students lack oral
proficiency and become anxious when having to communicate orally. It seems
that L2 speaking anxiety is not seriously addressed by practitioners and decision-
makers, who fail to gauge the effect of speaking anxiety on students’ academic
performance and classroom behaviour (Basic, 2011).
Method
Sample
The study sample consisted of 82 L2 university students (75% female, 25%
male) enrolled in language learning programs in public and private higher
education institutions in Bahrain (91% public university, 9% private uni-
versities). In addition to majoring in English, the majority of these students
were doing a minor in Translation (43%), French (13.5%), or American Stu-
dies (12%), while another 10% were doing a single major. The vast majority of
these students were mature, given that 90% were between 21 and 23 years old,
while the rest were older. In addition, almost 80% of the respondents were
senior, about-to-graduate 4th-year students.
Data collection
An online questionnaire of 55 items was administered to private and public
tertiary-level students enrolled in EFL programs. The questions aimed to mea-
sure the relation between a number of variables such as gender, speaking
anxiety, teaching and learning practices, and language exposure outside class-
rooms. In addition to items examining the demographic backgrounds of the
respondents, speaking anxiety was measured using five-point Likert scale items
that were adapted from the FLCAS developed by Horwitz, Horwitz, and Cope
(1986), but modified to focus on speaking. This scale identifies three levels of anxiety:
high, moderate, and little or no anxiety. Out of the 33 items that FLCAS uses,
this study used only 15 items (see Appendix A, Question items 8–33) that mea-
sure students’ oral anxiety using three dimensions: 1) fear of negative evaluation,
2) communication apprehension, and 3) test anxiety. In addition, 10 more
question items were added to measure speaking anxiety. Following these items,
the questionnaire contained 16 questions that inquired about the status of
speaking in academic curricula and instruction (see Appendix A, Question items
34–47), while five more questions investigated the extracurricular exposure to
spoken English outside classrooms (see Appendix A, Question items 49–53).
Finally, interviews with a focus group of 30 students were conducted to verify
the status of the teaching and assessment of the speaking skill, and to verify the
effects of the variables mentioned above.
Data analysis
Data from the questionnaire were coded and analyzed using descriptive statistics;
measures of central tendency and dispersion (means, standard deviations, and
percentages) were used to identify the levels of anxiety. Given that only 15 of
the 33 items were selected and modified, the original scoring thresholds were
not used. Instead, the mean scores were used to measure responses to the question
items, using the following scale: low FLCAS = mean scores between
1.00–2.00; moderate FLCAS = 2.01–3.50; high FLCAS = 3.51–5.00. A similar
scale was used to measure students’ satisfaction with the academic curricula
and teaching practices: high satisfaction = mean scores between 1.00–2.00;
moderate satisfaction = 2.01–3.50; low satisfaction = 3.51–5.00. As to measuring
students’ extracurricular exposure to spoken English (question items 49–53),
the mean scores of the responses were analyzed using the following scale: no
or limited exposure = 0.00–1.00; moderate exposure = 1.01–2.00; high exposure =
2.01–3.00. The Statistical Package for the Social Sciences (SPSS) was used to
examine the correlations and differences among variables such as the level of
speaking anxiety, gender, academic curricula and teaching practices, and the
effect of extracurricular language exposure and use on speaking anxiety.
Pearson correlations and paired sample t-tests were used to test these
relationships.
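To make the banding procedure concrete, the sketch below computes per-student FLCAS means, maps them onto the three bands described above, and runs a Pearson correlation against a second scale. It is a minimal illustration only: the data are randomly generated stand-ins, and all column names are hypothetical rather than taken from the study’s instrument.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in data: 82 respondents x 26 FLCAS items (items 8-33),
# each coded 1 (strongly disagree) to 5 (strongly agree).
rng = np.random.default_rng(seed=0)
flcas = pd.DataFrame(rng.integers(1, 6, size=(82, 26)),
                     columns=[f"item_{i}" for i in range(8, 34)])

def band_flcas(mean_score: float) -> str:
    """Map a mean FLCAS score onto the study's three bands."""
    if mean_score <= 2.00:
        return "low"
    if mean_score <= 3.50:
        return "moderate"
    return "high"

per_student_mean = flcas.mean(axis=1)      # one mean score per respondent
overall_mean = per_student_mean.mean()     # the study reports 2.95
overall_sd = flcas.stack().std()           # the study reports 1.32
print(overall_mean, overall_sd, band_flcas(overall_mean))

# Pearson correlation between anxiety and a second (hypothetical) scale.
satisfaction = rng.uniform(1, 5, size=82)
r, p = stats.pearsonr(per_student_mean, satisfaction)
print(f"r = {r:.2f}, p = {p:.3f}")
```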
Results
This section presents the findings of the survey and the interviews with the
focus group. First, based on students’ responses to the survey, the mean score of
students’ responses to the FLCAS items (items 8–33) was 2.95, with a standard
deviation of 1.32. This indicates that students have a moderate level of anxiety
when speaking English in class (see Table 11.1).
Second, the level of anxiety was examined in relation to gender using a paired
sample t-test. Findings, which are presented in Table 11.2, illustrate that no
significant differences were found between males and females in the level of
anxiety when speaking in EFL settings (sig. = 0.333).
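By way of illustration, a two-group comparison of FLCAS means by gender can be run as below. This is a hedged sketch on fabricated stand-in data, not the study’s analysis: the chapter reports a paired sample t-test, whereas the sketch uses scipy’s independent-samples ttest_ind, which is the usual choice when the male and female groups contain different students.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Hypothetical per-student FLCAS means, split by gender (62 female, 20 male).
female_means = rng.normal(loc=2.95, scale=0.6, size=62).clip(1, 5)
male_means = rng.normal(loc=2.90, scale=0.6, size=20).clip(1, 5)

# Independent-samples t-test of the difference in mean anxiety.
t, p = stats.ttest_ind(female_means, male_means, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3f}")  # p > .05 would mirror the reported non-significance
```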
Table 11.1 Mean scores of the foreign language classroom anxiety scale (FLCAS)

         N     Minimum   Maximum   Mean   SD
FLCAS    82    1.00      5.00      2.95   1.32

Third, in relation to measuring students’ satisfaction with the academic cur-
ricula and teaching practices (items 34–47), the mean score of students’
responses was 3.58, with a standard deviation of 1.11 (see Table 11.3).

Table 11.3 Mean scores of students’ satisfaction with the academic curricula and
teaching practices

                                             N     Min.   Max.   Mean   SD
Students’ satisfaction with the academic     82    1.00   5.00   3.58   1.11
curricula and teaching practices

This signifies that students have a low level of satisfaction with the curricula and
the pedagogical practices related to the teaching and assessment of speaking
as a productive skill. More than two-thirds of the students reported that
their L2 curricula do not include speaking courses and that their teachers do
not provide them with in-class speaking opportunities or tasks. 62% denied
receiving any instruction related to public speaking or giving oral presenta-
tions, while the majority (81%) demanded the inclusion of speaking courses
in their L2 programs.
When the focus group was interviewed, a number of students revealed more
details related to their dissatisfaction with the teaching and assessment of
speaking, as exhibited in the following extracts.
During these interviews, students reported heightened anxiety when speak-
ing in class. Interviewed students reported a number of factors that were major
causes of their anxiety, first and foremost the lack of speaking courses.
Figure 11.1 Students’ viewpoints regarding the teaching and assessment of speaking (1)
Figure 11.2 Students’ viewpoints regarding the teaching and assessment of speaking (2)
Discussion
A closer look at the results reveals that although L2 university students have
moderate levels of speaking anxiety, they are dissatisfied with their academic
curricula and with their teachers’ teaching practices related to speaking. Hence,
it is imperative that the educational system makes changes to allow for the
adequate teaching and assessment of speaking by introducing speaking
courses and providing formative and summative oral, interactive, and collabora-
tive learning tasks and activities. Practitioners and curriculum specialists should
be called upon to undertake drastic changes in the academic programs at the
school and tertiary levels to render speaking a core skill to be taught, learnt, and
assessed. According to the National Research Council (1996), assessment and
learning ‘are two sides of the same coin’ (p.5). Consequently, assessment as
learning emanates from the idea that learning involves the students in an active
and interactive process of cognitive restructuring (Earl & Katz, 2006).
Ur (2012) argues that it is often L2 learners’ principal objective to be able to
communicate orally and fluently in formal and informal interaction, and hence
L2 teachers need to enable them to achieve such an objective. A number of
recommendations in this regard have been suggested by numerous educa-
tionists. Hamzah and Ting (2010) reported that teaching speaking in groups
enhances motivation and lessens speaking anxiety and fear of peer criticism
among individuals. In addition, diagnostic tests need to be undertaken to pin-
point anxious students and provide them with assistance (Woodrow, 2006).
What is more, the teaching of phonology, and more particularly pronunciation,
needs to be introduced in the early school cycles in order to develop learners’
accent, stress, rhythm, and intonation in the early stages of learning the target
language (Shively, 2008) and equip them with the oral skills needed to bridge
the gap between academic school levels and tertiary education and workplace
requirements (Lindsay & Knight, 2006). Hence, interactive classroom activities
need to be implemented for the production of a consistent and meaningful
output by means of introducing the practice of real-life speaking in classroom
settings to reduce speaking apprehension and help students identify the areas in
which they need enhancement to augment their oral fluency (Harmer, 2010;
Koran, 2015). Koran argues that a good teacher is one who assesses students’
speaking skill by means of observations, quizzes, or exams designed to evaluate
their oral proficiency. To perfect students’ speaking competence, teachers have
to provide constructive feedback, facilitate in-class discussions and debates, and
provide students with listening material (Harden & Crosby, 2000).
First and foremost, a holistic reform of the academic curricula needs to be
implemented promptly in order to ensure that speaking is incorporated as an
important segment of any academic L2 program and that it is embedded and
assessed as a key intended learning outcome. Longitudinal future studies that
address the pedagogical practices in the pre-tertiary educational cycles and that
measure the status of the teaching and assessment of the speaking skill are
required in order for educationists and decision-makers to be able to rectify the
long-term neglect of speaking and incorporate it in the L2 curricula.
Appendix A
Year 2
Year 3
Year 4
Not applicable
Other: (please specify) _________________
5 If you are a university student majoring in English, what is your minor?*
Translation
French
American Studies
No minors
Other: (please specify) _________________
6 How would you rate your speaking skill?*
Excellent
Good
Fair
Somewhat poor
Very poor
7 How would you rate your English language skills in general?*
Excellent
Good
Fair
Somewhat poor
Very poor
A. Speaking competence:
Kindly read the following statements and provide your opinion with reference
to your speaking competence.
Statement SD D N A SA
8. I never feel quite sure of myself when I am
speaking in my English class.
9. I don’t worry about making mistakes when
speaking in my language class.
10. I tremble when I know that I’m going to be
called on to speak in class.
11. It frightens me when I don’t understand
what the teacher is saying during class.
12. During English class, I find myself thinking
about things that have nothing to do with the
course.
13. I keep thinking that the other students are
better than me in English.
14. I am usually at ease during oral tests.
15. I start to panic when I have to speak with-
out preparation in my language class.
16. I don’t understand why some people get so
worried about giving oral presentations.
17. While speaking in class, I can get so nervous
I forget things I know.
18. It embarrasses me to volunteer answers in
my language class.
19. I do not get nervous speaking in English
with native speakers of English.
20. I get upset when I don’t understand why I
got bad marks in my oral test.
21. Even if I am well prepared for the oral tasks,
I feel anxious about it.
22. I often feel like not going to my language
class when there is an oral activity.
23. I feel confident when I speak in the English
class.
24. I am afraid that my language teacher is
ready to correct every mistake I make when I
speak.
25. I am afraid that students in my English class
are ready to correct every mistake I make when
I speak.
26. I can feel my heart pounding when I’m
going to be called on to participate in the
English class.
27. I always feel that the other students speak
English better than I do.
28. I feel self-conscious about speaking English
in front of other students.
29. I get confused when I am speaking in my
English class.
30. I feel overwhelmed by the number of rules
you have to learn to speak good English.
31. I am afraid that the other students will laugh
at me when I speak in English.
32. I would probably feel comfortable speaking
around native speakers of English.
33. I get nervous when the teacher asks ques-
tions which I haven’t prepared for in advance.
Statement SD D N A SA
34. My program curriculum includes a general
speaking course.
35. My program curriculum includes a public
speaking course.
36. Our language instructors encourage us to
speak in class.
37. I have given oral presentations during the
course of my study.
38. I have given more than three oral presenta-
tions during the course of my study
39. I have been taught how to give good oral
presentations.
40. Assessment in language courses includes
speaking.
41. Language courses allow for in-class speaking
activities.
42. Our instructors engage us in in-class debates
and discussions.
43. Our language department provides us with
opportunities to practice speaking.
44. Course activities and assignments require
the use of media.
45. Our L2 instructors are fluent in English.
46. Our L2 instructors teach us in English.
47. Speaking courses need to be introduced into
the program.
48. What are the courses which promote speaking and/or giving oral pre-
sentations? (You can select more than one):
a) Major courses
b) Minor courses
c) Language courses
d) Literature courses
e) Linguistics courses
f) Other: (please specify) ____________________________
Recommendations
54. Would you like to recommend ways to enhance students’ speaking skill?
a) Yes
b) No
55. Kindly use the space below to provide us with your recommendations or
comments, if any.
References
Abbas, M. (2017). English classroom speaking anxiety among English major students
(Unpublished undergraduate thesis). University of Bahrain.
Abudrees, T. (2017). The differences in applying the aspects of connected speech
between first-year and fourth-year non-native speaking students at the University of
Bahrain (Unpublished undergraduate thesis). University of Bahrain.
Al Asmari, A. (2015). Communicative language teaching in EFL university context:
Challenges for teachers. Journal of Language Teaching and Research, 6(5), 976–984.
Al Hosni, S. (2014). Speaking difficulties encountered by young EFL learners. Interna-
tional Journal of Studies in English Language and Literature (IJSELL), 2(6), 22–30.
Al Jahromi, D. (2012). A study of the use of discussion boards in L2 writing instruction
at the University of Bahrain (Unpublished doctoral thesis). University of Sheffield.
Alhamadi, N. (2014). English speaking learning barriers in Saudi Arabia: A case study of
Tibah University. AWEJ, 5(2), 38–53.
Al-Nasser, A. S. (2015). Problems of English language acquisition in Saudi Arabia: An
exploratory-cum-remedial study. Theory and Practice in Language Studies, 5(8), 1612–1619.
Al-Qahtani, M. F. (2013). Relationship between English language, learning strategies, atti-
tudes, motivation, and students’ academic achievement. Educ. Med. Journal, 5, 19–29.
AlRashid, N. E. (2017). The effectiveness of teaching English speaking and writing in
Bahraini government secondary schools (Unpublished undergraduate thesis). Uni-
versity of Bahrain.
Al-Shboul, M. M., Ahmad, I. S., Nordin, M. S., & Rahman, Z. A. (2013). Foreign
language reading anxiety in a Jordanian EFL context: A qualitative study. English
Language Teaching, 6(6), 1–19.
Arnaiz, P., & Guillen, F. (2012). Self-concept in University-level FL Learners. The
International Journal of the Humanities: Annual Review, 9(4), 81–92.
Bashir, S. (2014). A study of second language-speaking anxiety among ESL intermediate
Pakistani learners. International Journal of English and Education, 3(3), 216–229.
Basic, L. (2011). Speaking anxiety: An obstacle to second language learning? (Unpub-
lished doctoral thesis). University of Gävle.
Bygate, M. (1987). Speaking. Oxford University Press.
Celce-Murcia, M. (2013). Teaching English in the context of world Englishes. In M.
Celce-Murcia, D. M. Brinton, & M. A. Snow (Eds.), Teaching English as a second or
foreign language (4th edition, pp. 2–14). National Geographic Learning/Cengage
Learning.
Coates, J. (2014). Women, men and language: A sociolinguistic account of gender differences in
language. Taylor and Francis.
Dörnyei, Z. (2005). The psychology of the language learner: Individual differences in second
language acquisition. Routledge.
Earl, L., & Katz, S. (2006). Rethinking classroom assessment with a purpose in mind. Western
and Northern Canadian protocol for collaboration in education. Manitoba Education, Citi-
zenship, and Youth. https://digitalcollection.gov.mb.ca/awweb/pdfopener?smd=1&
did=12503&md=1
Elmenfi, F., & Gaibani, A. (2016). The role of social evaluation in influencing public
speaking anxiety of English language learners at Omar Al-Mukhtar University. Arab
World English Journal, 7(3), 496–505.
Ezzi, N. A. (2012). Foreign language anxiety and the young learners: Challenges ahead:
Rethinking English language teaching. In TESOL Arabia conference proceedings: Pro-
ceedings of the 17th TESOL Arabia Conference (Vol. 16, pp. 56–62). TESOL Arabia
Publications.
Fakhri, M. (2012). The relationship between gender and Iranian EFL learners’ foreign
language classroom anxiety. International Journal of Academic Research in Business and
Social Sciences, 2(6), 147–156.
Gan, Z. (2012). Understanding L2 speaking problems: Implications for ESL curriculum
development in a teacher training institution in Hong Kong. Australian Journal of
Teacher Education, 37(1), 43–59.
Gebril, A., & Hidri, S. (2019). Language assessment in the Middle East and North Africa
[special issue: The status of English language research in the Middle East and North
Africa: An introduction]. Arab Journal of Applied Linguistics, 4(2), i–vi.
Gregersen, T. S. (2003). To err is human: A reminder to teachers of language-anxious
students. Foreign Language Annals, 36(1), 25–32.
Hamzah, M. H., & Ting, L. Y. (2010). Teaching speaking skills through group work activities
(A case study at form 2ES1 SMK Damai Jaya Johor). https://core.ac.uk/download/files/
392/11785638.pdf
Hanifa, R. (2018). Factors generating anxiety when learning EFL speaking skills. Studies
in English Language and Education, 5(2), 230–239.
Harden, R. M. & Crosby, J. (2000). The good teacher is more than a lecturer – the
twelve roles of the teacher. Medical Teacher, 22(4), 334–347.
Harmer, J. (2010). How to teach English. Pearson Longman.
He, D. (2013). What makes learners anxious while speaking English: A comparative
study of the perceptions held by university students and teachers in China. Educational
Studies, 39(3), 338–350.
Heng, C. S., Abdullah, A. N., & Yusof, N. B. (2012). Investigating the construct of
anxiety in relation to speaking skills among ESL tertiary learners. 3L: The Southeast
Asian Journal of English Language Studies, 18(3), 155–166.
Hidri, S. (2017). Introduction: State-of-the-art of assessing second language abilities. In
S. Hidri (Ed.), Revisiting the assessment of second language abilities: From theory to practice
(pp. 1–19). Springer.
Hidri, S. (2018). Assessing spoken language ability: A many-facet Rasch analysis. In S.
Hidri (Ed.), Revisiting the assessment of second language abilities: From theory to practice (pp.
23–48). Springer.
Hismanoglu, M. (2013). Does English language teacher education curriculum promote
CEFR awareness of prospective EFL teachers? Procedia – Social and Behavioral Sciences,
93, 938–945.
Horwitz, E. K., Horwitz, M. B., & Cope, J. A. (1986). Foreign language classroom
anxiety. The Modern Language Journal, 70(2), 125–132.
Kayaoğlu, M. N., & Sağlamel, H. (2013). Students’ perceptions of language anxiety in
speaking classes. Tarih Kültür ve Sanat Araştırmaları Dergisi, 2(2), 142–160.
Kingen, S. (2000). Teaching language arts in middle schools: Connecting and communicating.
Lawrence Erlbaum Associates.
Koo, Y. L. (2009). Mobilising learners through English as lingua franca (ELF): Providing
access to culturally diverse international learners in higher education. Research Journal
of International Studies, 3(9), 45–63.
Koran, S. (2015). Analyzing EFL teachers’ initial job motivation and factors affecting
their motivation in Fezalar Educational Institution in Iraq. Advances in Language and
Literary Studies, 6(1), 72–80.
Krashen, S. (1987). Second language acquisition. Oxford University Press.
Lindsay, C., & Knight P. (2006). Learning and teaching English: A course for teachers.
Oxford University Press.
Liu, H. J. (2012). Understanding EFL undergraduate anxiety in relation to motivation,
autonomy, and language proficiency. Electronic Journal of Foreign Language Teaching, 9
(1), 123–139.
Lu, Z., & Liu, M. (2011). Foreign language anxiety and strategy use: A study with
Chinese undergraduate EFL learners. Journal of Language Teaching and Research, 2(6),
1298–1305.
Mahmoodzadeh, M. (2012). Investigating foreign language speaking anxiety within the
EFL learners’ interlanguage system: The Case of Iranian learners. Journal of Language
Teaching and Research, 3(3), 466–476.
Mak, B. (2011). An exploration of speaking-in-class anxiety with Chinese ESL learners.
System, 39, 202–214.
McCroskey, J. (1978). Validity of the PRCA as an index of oral communication
apprehension. Communication Monographs, 45(3), 192–203.
McCroskey, J. C. (2016). Introduction to rhetorical communication: A Western rhetorical
perspective. Routledge.
Ministry of Labour and Social Affairs in Bahrain. (2017). Workplace requirements for better
and faster employment. Paper presented at Media, Tourism, and Fine Arts Stakeholders’
Forum. University of Bahrain, Bahrain.
Mohammadi, M., & Mousalou, R. (2013). Emotional intelligence, linguistic intelli-
gence, and their relevance to speaking anxiety of EFL learners. Journal of Academic and
Applied Studies, 2(6), 11–22.
National Research Council. (1996). National science education standards. National Acad-
emy Press.
O’Sullivan, B. (2006). Modelling performance in oral language tests: Language testing and eva-
luation. Peter Lang.
Ounis, A. (2017). The assessment of speaking skills at the tertiary level. International
Journal of English Linguistics, 7(4), 95–112.
Park, G. P., & French, B. F. (2013). Gender differences in the foreign language class-
room anxiety scale. System, 41, 462–471.
Pathan, M., Aldersi, Z., & Alsout, E. (2014). Speaking in their language: An overview of
major difficulties faced by the Libyan EFL learners in speaking skill. International
Journal of English Language & Translation Studies, 2(3), 96–105.
Introduction
Test washback has been defined as the effects of tests on teaching and learning;
consequently, any introduction of a new test should plan for positive washback
(Wall, 2013). Assessment tasks, whether summative or formative, should
therefore be designed in a way that engages students in the necessary knowl-
edge, skills, and abilities (KSAs) needed to perform effectively in the real world
beyond the confines of the classroom. Arguably, such a focus is particularly true
for high-stakes proficiency tests, such as those used for school leaving or uni-
versity entrance, where governments and education departments – especially
those within the European Union – have been obliged to take the Common
European Framework of Reference for Languages (CEFR) into account. As a
result, new educational initiatives have been plentiful as policy makers attempt
to incorporate competence-based language education (Lim, 2014).
The main purpose of such initiatives is both to promote learning and bring
about a shift in language pedagogy – from knowledge-based to more com-
municative practices – and to validly interpret what has been learned. Despite
the increasing pressure on teachers to be instigators of such changes, for many,
a lack of language assessment literacy (LAL) prevents them from successfully
fulfilling this role (Fulcher, 2012; Hidri, 2014, 2018, 2019). Indeed, a lack of
LAL amongst teachers has been widely reported, which is arguably particularly
true in the case of standardized tests (Tsagari & Vogt, 2017). In Tsagari and
Vogt’s study, teachers did not feel that they had the correct training to help
students prepare for tests, and at most, test preparation took the form of
administering past papers without critically evaluating them.
Ultimately, it is teachers who will need to prepare their students for any
standardized test through the provision of support for learning outcomes,
classroom assessments to measure and track students’ progress, and other feed-
back. Students need to be given the tools to reflect on their learning, under-
stand their strengths and weaknesses, and develop learner autonomy in order to
develop life-long learning strategies. Teachers act as mediators between the
language class and the test. As such, it is arguable that the first step in instigating
reforms would be a fully comprehensible description of any new test and how
it relates to the present curriculum, together with supporting construct validity
evidence. Tests with good construct validity promote positive washback, and
the move from classroom activities to test tasks should be fluid (Messick, 1996).
A new test must therefore be based on a clear definition of language profi-
ciency and have a strong relationship to the curriculum, and this information
must be provided to teachers. Teachers not only need to understand the cur-
riculum standards and test constructs but be able to relate this knowledge to
their professional practices if they are to bring about the desired washback effect
on student learning.
The present study is situated in the context of one such initiative in Spain,
where education reform laws have been introduced together with a new
communicative, competence-based curriculum. This new curriculum comes
largely in response to the growing demand in Europe for the implementation
of CEFR-related, competence-based curriculums, and as an attempt to
improve the poor results of Spanish students (European Commission, 2012),
and follows years of academic criticism of the previous system. The main cri-
ticism has been that no oral component has, until now, been included in the
exam.1 Furthermore, it has been extensively reported that teachers do indeed
teach to the test, and that consequently, a narrow form of the curriculum is
regularly taught in the classroom, with listening and speaking being largely
ignored (e.g., Amengual Pizarro, 2009; García Laborda & Fernández Álvarez,
2011). Yet, listening is an essential component of communicative competence
(adults spend nearly 50% of their time listening) and plays a key role in suc-
cessful language acquisition (Wagner, 2014), thus contributing to academic
success. This is especially true in the context of university entrance, where
universities are increasingly offering courses taught in English.
The situation in Spain is therefore ripe for change; in order to achieve a
positive impact on teaching and learning, any new assessment should not only
clearly evaluate the competencies outlined in the new curriculum, but also
provide evidence that this is the case. While tests have been shown to bring
about changes in educational systems in many different contexts (Cheng, Sun,
& Ma, 2015), positive change can only be brought about if a test accurately
reflects the aims of the curriculum (Wall, 2013). It is hoped that this study will
be a timely contribution to just such an outcome.
Theoretical background
A central issue in language testing is the question of the theory-defined con-
struct: Before test development can begin, it is essential that a theoretical stance
on the nature of language ability first be taken (Chapelle, 2012). Addressing the
continual debates concerning language proficiency constructs, Bachman (2007)
concludes that both competence and task-based perspectives should be taken
into account. Such an approach resonates well with the CEFR, which provides
1 Predicting content.
2 Monitoring comprehension.
3 Making inferences.
These strategies mediate between trait and context and, because task-specific
behaviours are context-relevant, listeners must develop ‘real world strategies’ in
order to achieve comprehension (Field, 2008a). Figure 12.1 shows a repre-
sentation of the proposed theoretical construct for listening ability.
[Figure 12.1 here. The figure depicts listening as a flow from speech input through input decoding, lexical search, and parsing (linguistic processing, drawing on linguistic knowledge) to meaning and discourse construction (semantic processing, drawing on prior world and pragmatic knowledge), mediated by the metacognitive strategies of prediction, planning, monitoring, and inference, and resulting in a representation of speech in memory and a response.]
Figure 12.1 Proposed model of listening ability (based on Field, 2008a, 2013a)
Not only must any theory-based process model of listening ability be repre-
sented in the construct of a new test, but it must be clearly shown that candi-
dates use the same KSAs as they would in the target language use domain
(TLU). Context-specific features of test tasks are normally outlined in the test
specifications, and evidence should be provided that tasks do indeed represent
the proposed TLU. These context specific features should include elements
such as the source of input texts, channel of delivery, number of plays, and the
response format. Most importantly, we need to consider the characteristics of
the input passages and how these will relate to the TLU. A key debate here is
that regarding the authenticity of the audio used.
At present, most language tests use scripts, i.e., written texts which are then read
aloud (Buck, 2018; Wagner, 2014). These texts are often revised and edited before
being produced in a studio by actors; ‘far too often listeners are expected to be able
to understand texts that are meant to be read’ (Vandergrift & Goh, 2012, p.167).
Here, construct under-representation is an obvious threat, as a scripted text lacks
many of the characteristics of natural speech (Field, 2008a, 2013b, 2017; Vander-
grift & Goh, 2012). Indeed, several studies highlight the differences between
spoken and written discourse (for review, see Wagner & Toth, 2017). Natural,
connected speech is very different from the written word and can include gram-
matical mistakes, shorter idea units, and ellipses. Furthermore, it tends to be less
logically organized as a consequence of its unplanned nature (Wagner, 2014). Not
only can spoken language be more colloquial, containing fillers and repetition, but
its intonation patterns carry substantial meaning (Buck, 2018). In scripted
recordings, by contrast, Field (2013b) argues, actors mark commas and full stops,
there are no hesitations or false starts, and voices rarely overlap. Furthermore,
test developers often insert scripted distractors, making a recording much more
informationally dense and placing too great a strain on working memory (Field, 2013a).
Consequently, there are many calls for a move towards more authentic input
texts, both for teaching and assessment purposes (e.g., Field, 2008a, 2013a;
Gilmore, 2011; Vandergrift & Goh, 2012; Shackleton, 2018a; Wagner, 2014;
Wagner & Toth, 2017). As Field (2013a, p.143) states, ‘if a test is to adequately
predict how test takers will perform in normal circumstances, it is clearly
desirable that the spoken input should closely resemble that of real-life con-
versational or broadcast sources’. In the case of school leaving/university
entrance tests, the range of genres found in the TLU should be sampled and
these would represent a continuum of aurality (Shohamy & Inbar, 1991; Van-
dergrift & Goh, 2012), from a planned talk to a spontaneous conversation.
A related issue concerns questions of accent, English as a lingua franca (ELF),
and the ongoing debate about the status of the native speaker as an ideal model
for assessment. Most international tests still limit themselves to accents drawn
from the major native-speaker varieties (British English, American/Canadian
English, and Australian English). However, the relevance of standard native-
speaker varieties has more recently been brought into question, and the lack of
ELF-based examples in most language tests has been the subject of criticism.
Research problem
The proposed new test must take the above debates into account; if positive
washback is to be encouraged in the present context, I would argue that
authentic input texts and a range of both native and non-native speaker varieties
should be included in the test construct. There are further reasons why this should
indeed be the case: If a new test is to be used for university admissions, it should
be clear just what competencies are being assessed, and here a CEFR-related test
can provide test users with a well-defined description. I would also contend that
there are a number of reasons why any CEFR-based test for the context under
discussion must be aimed at a B2 CEFR level; not only do the B2 descriptors
most resemble current Spanish curriculum requirements, but B2 is currently the
required university entrance level for most other European countries, as it is
considered to be the most appropriate level for basic academic study and work
insertion. While there have been some doubts expressed as to whether a higher
level may be necessary (Taylor & Geranpayeh, 2011), B2 is generally felt to be a
feasible minimum. For example, Carlsen (2018) reported that students entering
Norwegian universities with a proficiency level lower than B2 lacked the
necessary language skills for success on their courses. Such findings would suggest
that for the Spanish education system to keep in line with other European
countries, an ideal scenario would see students leaving upper secondary school
with a B2 level minimum (Deygers, Zeidler, Vilcu, & Hamnes Carlsen, 2018;
Lim, 2014). In light of the above issues, the motivation for the present study can
be framed as the need to develop a test which engages the proposed listening
ability model, which incorporates authentic discourse (including a variety of
accents), and which adequately reflects those CEFR B2 competences relevant to
the context of school leaving/university entrance demands.
Rationale
The correct identification of the TLU is clearly a key factor in the successful
operationalization of any test construct and, as such, the test specifications
should reflect those CEFR B2 abilities which students would be expected to
employ beyond the confines of the test in a school leaving/university entrance
context. Accordingly, the current study drew upon a range of sources in
order to develop its specifications.
Following the previous discussion, it was decided to use only authentic audio
files sourced from the internet or produced as a natural response to prompts in
order to obtain samples of non-adapted natural discourse which would include a
variety of accents (including one L2 speaker). Four different tasks were chosen, so
as to create adequate construct coverage and to minimize the task effect by
including a mix of task types. Each sound file lasts between three and five minutes
and is to be heard twice. In order to develop tasks that correspond to real-world
communicative events, purposeful items based on expert behaviour need to be
developed. To this end, a textmapping protocol (see Green, 2017) was followed. This
process makes no reference to a transcript; instead, after the purpose for listening
has been decided, a group of experts notes down the salient ideas taken away from
a given audio in order to replicate the real-world listening process as faithfully as
possible. In this way, an attempt is made to replicate specific types of listening
(Weir, 2005, p.101) by reaching a consensus on meaning, thereby modelling the
activity on expert cognitive processing behaviour as suggested by Field (2008a).
The development of all subsequent items is then based on this understanding of
the audio material in question.
Table 12.1 gives a breakdown and brief description of the four tasks based on
audios which were considered to be suitable for exploitation in accordance
with the test specifications. A small pilot study was carried out, and items which
appeared to discriminate badly or be too easy/difficult for the pilot population
were removed (a sketch of such item statistics follows the research question
below). In order to discover if the test represents the proposed cognitive
processing view of listening, the following research question was addressed:
To what extent does the behaviour elicited from a test taker correspond to
the relevant knowledge, skills and abilities that would be required of him/
her in a real-world context?
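Classical item statistics of the kind used to screen the pilot items above – facility values for items that are too easy or too difficult, and discrimination indices for items that discriminate badly – can be computed as in the sketch below. It is a minimal illustration on fabricated dichotomous scores, not the study’s actual analysis; the corrected point-biserial (item vs. rest-score) is one common choice of discrimination index.

```python
import numpy as np

# Hypothetical pilot data: rows = test takers, columns = items, 1 = correct.
responses = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
])

# Facility value: proportion correct per item (flags too-easy/too-hard items).
facility = responses.mean(axis=0)

# Corrected point-biserial: correlate each item with the total of the *other* items.
total = responses.sum(axis=1)
discrimination = np.array([
    np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
    for j in range(responses.shape[1])
])

print("facility:      ", np.round(facility, 2))
print("discrimination:", np.round(discrimination, 2))
```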
Method
Data collection
After piloting the methodology with two participants, seven volunteers esti-
mated to have a CEFR B2 listening proficiency level were enrolled (male = 4,
female = 3), and a two-stage design was employed.
Whilst finalizing their answers, participants explained how they had reached the
answer to each item. Each of the four tasks was completed separately in order to
reduce to a minimum the time lag between doing the test and reporting on it.
Participants were given the option of reporting in their L1 in order to reduce the
cognitive load when expressing their thoughts (Banerjee, 2004), although in the
event only one participant chose to do so. Once collected, the reports were
transcribed in preparation for coding using the Qualitative Data Analysis (QDA)
software QDA Miner Lite. The data were coded separately for each test item to
represent the level of processing reached in correctly solving the item. These
levels of processing were drawn directly from the listening ability model and are
as follows:
L – Lexical recognition: The understanding of isolated vocabulary from the
audio input.
IU – Idea unit: A proposition, which could be as little as a noun phrase
(Buck, 2001, pp. 27–28), is used to answer the item. This is understanding at a
very literal level and includes local factual information.
MR – Meaning representation: The listener relates a proposition to the
context and uses prior knowledge in order to interpret meaning.
DR – Discourse representation: The listener is able to integrate information
into a wider semantic representation, including speaker intention.
In order to generalize from the results, reliability checks must be carried out.
In the present study, the researcher re-coded one entire protocol six months
after the original coding. Exact intra-coder agreement across the two sessions
was 87%, and Cohen's kappa, which takes into account agreement expected by
chance, was 0.782 (p < .001), 95% CI (0.65, 0.91) – substantial agreement
according to Landis and Koch (1977). The results are
presented quantitatively with illustrative examples in order to draw conclusions
about the listening processes necessary to answer the test items (for further
examples see Shackleton, 2018b). Furthermore, as construct-irrelevant strategies
would obviously pose a threat to the validity of the test, it was also decided to
report qualitatively on candidates’ strategy use.
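For readers less familiar with the statistic, Cohen's kappa adjusts the raw proportion of agreement for the agreement expected by chance alone. As a worked illustration (the chance-agreement value p_e below is assumed for illustration only, since the coding marginals are not reported here):

\[
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad \text{e.g.} \quad \kappa = \frac{0.87 - 0.40}{1 - 0.40} \approx 0.78,
\]

where p_o is the observed proportion of exact agreement (.87 in the present study) and p_e is the proportion of agreement expected by chance, computed from the marginal frequencies with which each code was assigned.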
Data analysis
Test tasks provide the candidate with a purpose for listening and informa-
tion about the context at hand, information which can then be acted upon
to activate relevant schemata and generate hypotheses (Shohamy & Inbar,
1991). The concurrent reports from this planning stage were analyzed
according to emerging themes in QDA Miner Lite and were categorized as
follows:
This part of the test-taking process has been called 'assessing the situation'
(Buck, 2001, p. 104) and all participants in the present study reported such
strategy use. By activating relevant schemata using cues from the task title,
picture, and items they were able to make predictions about the content of the
audio files. This strategy use was especially evident on the MCQ items, which
allowed the participants to build a skeleton story of just what they were going
to hear. Building on previous knowledge schemata, they also made predictions
based on their knowledge of the world and personal experiences, as can be seen
in the example below referring to Task 2:
Example:
So, he’s talking first about the flat and then when he arrives what he’s
going to do then what was hard at first in USA. What he had to learn, the
differences between Mexico and USA, why he felt accepted … I’m
thinking about key points.
OK I understand. Cos I’ve been living in USA and I understand the
situation.
In this one he finds it difficult … for me it will be name or accent, cos
that’s what happened to me.
And he wants Americans to know … for me it would be to know
where he’s from… people see Mexican people like they are from a village
and they don’t have culture and internet and things like that.
(Participant 7, Task 2)
The results of the retrospective reports are presented as the highest level of
processing reached to correctly answer the items on each of the tasks. The
results for Task 1 – the gist task – are shown in Figure 12.2, where it can be
seen that most of the correct answers were arrived at as a result of a meaning
representation of the sound file, while the three highest scoring participants
reached a discourse representation on three of the items.
[Figure 12.2: Level of listening process reached for correct answers on Task 1 (frequency by level: lexical recognition, idea unit, meaning representation, discourse representation)]
Example:
I heard ‘I completely agree’ and so I think it’s this because you can’t
disagree about women doing sport as it is not accepted. And I heard
about people fighting against each other, this was another key piece of
information for me … people fighting. This is a boxing competition.
(Participant 4, Q1.5)
[Figure 12.3: Level of listening process reached for correct answers on Task 2 (frequency by level: idea unit, meaning representation, discourse representation)]
Example:
At first, his main problem was to find somewhere to live so he found a guy
online that was renting part of his apartment. I hear that it was not diffi-
cult, he don’t use this exact words, but he said something like he contact
by email and it was relatively easy because he sent the email, he get the
answer about the flat and then his professor pick up him from the airport
and drop directly to the apartment so for me that’s easy, it’s not time-
consuming and he said ‘that’s it’, the professor took him to the flat and he
was there.
(Participant 5, Q2.1)
In many cases, the participants used inference to decipher the speaker's
underlying intentions. The metacognitive strategy of monitoring was also
highly evident for this task, especially for discarding distractors, as can be
seen in the following example:
Example:
… he said that the American people, they try to guess where he’s come
from and the people say France, Poland. I’m 60% sure that it is ‘work out
his accent’. ‘Try to get to know him’, I think no. I discard ‘understand his
accent’ no because he can communicate, they can understand fluently …
yeah he said something about his name but this was more about the phy-
sical aspect, he don’t look like the typical Mexican guy, it’s not about say
his name … that wasn’t the meaning.
(Participant 5, Q2.7)
Similar results were found for Task 3, which is also a MISD/IPM task, but
includes a dialogue rather than a monologue. Figure 12.4 shows the level of
processing reached in order to answer items correctly for this task.
As in Task 2, most items were answered by reaching a discourse representa-
tion of the audio. In the instances where the item was answered correctly by
simply understanding an idea unit, meaning was created by using inference,
monitoring, and contextual clues. Here, listening may be seen as a problem-
solving process which includes a combination of strategies used in an orche-
strated way (Vandergrift, 2003).
Task 4 is an NF task, and the intention is to test search listening for specific
information and important details. Figure 12.5 shows the level of processing
reached to answer the items on this task.
Although most of the correct answers were arrived at through a discourse
representation of the text, it should be noted that due to the nature of the
input audio – a teacher giving a class quite factual instructions about an
upcoming geography trip – the discourse was quite straightforward and con-
tained mainly local factual information. As might be expected, therefore, it was
found that strategy use was minimal and items were mainly answered by
applying linguistic knowledge, although intonation patterns were also used as
clues signalling important information.
[Figure 12.4: Level of listening process reached for correct answers on Task 3 (frequency by level: idea unit, meaning representation, discourse representation)]
[Figure 12.5: Level of listening process reached for correct answers on Task 4 (frequency by level: lexical recognition, idea unit, meaning representation, discourse representation)]
To summarize, it can be seen that the participants demonstrated that they were
using the knowledge and skills proposed by the listening ability model in order
to solve test items. This observation can be seen in more detail in Table 12.2,
which gives the level of processing reached on an item-by-item basis. Table
12.2 shows that the most difficult items on the test for this small group of
participants were Q3.5 and Q4.4, which were also among the most difficult
items according to a Rasch analysis of the test scores. There were only two
correct answers in which evidence of proper recourse to the listening ability
model was not observed (items Q3.3 and Q3.7), though this may simply have
been due to a lack of full reporting. Incorrectly answered items (26%) were
the result of candidates either missing the information or being unable to
decode sufficient input.
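For readers curious about the Rasch analysis mentioned above, the following sketch (in Python, with invented toy responses, and using a crude log-odds approximation rather than the full estimation that dedicated Rasch software performs) illustrates how item difficulties are placed on a centred logit scale, with harder items receiving higher values. It is a minimal illustration under these stated assumptions, not a reproduction of the study's analysis.

import math

# Toy illustration only (invented 0/1 data, not the study's):
# responses[p][i] = 1 if participant p answered item i correctly.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]

n_persons = len(responses)
n_items = len(responses[0])

difficulties = []
for i in range(n_items):
    correct = sum(row[i] for row in responses)
    # Clamp to avoid undefined log-odds for items everyone (or no one) got right.
    p = min(max(correct / n_persons, 1 / (2 * n_persons)), 1 - 1 / (2 * n_persons))
    # Log-odds of an incorrect response: harder items receive higher logits.
    difficulties.append(math.log((1 - p) / p))

# Centre the scale so that mean item difficulty is zero, as in Rasch convention.
mean_b = sum(difficulties) / n_items
difficulties = [b - mean_b for b in difficulties]

for i, b in enumerate(difficulties, start=1):
    print(f"Item {i}: difficulty = {b:+.2f} logits")

With only seven participants, as in the present study, such estimates would of course be indicative at best.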
Discussion
While the CEFR outlines listening behaviours for each proficiency level, it
does not provide a clear description of just which processes and strategies
should be used at each level and, as such, these are open to interpretation and
must necessarily be extrapolated from the ‘can dos’. The CEFR describes B2
level listeners as being able to understand main ideas and follow conversations
and talks, descriptors which would suggest meaning and discourse representa-
tions of the input. This was indeed found to be the case in the present data
(see Table 12.2), where most correct answers were shown to be reached from
higher levels of processing. In contrast, lower-level listeners would have had
difficulty reaching meaning and discourse representations, as they would need
to focus their attention on lower-level perceptual processing (Field, 2017).
Correct answers based on the understanding of individual vocabulary items
and idea units were found to be minimal, and in all cases were shown to be
the result of recourse to meta-cognitive strategy use.

Table 12.2 Frequencies of level of listening process reached for each correct item (N = 7)

Item    Lexical      Idea   Meaning          Discourse        No reported   Correct
        recognition  unit   representation   representation   processes     responses
Q1.1    1            0      2                2                0             5
Q1.2    0            0      3                0                0             3
Q1.3    0            2      2                0                0             4
Q1.4    0            1      4                0                0             5
Q1.5    1            2      2                0                0             5
Q1.6    2            2      3                0                0             7
Q1.7    0            1      2                1                0             4
Q2.1    0            0      2                4                0             6
Q2.2    0            0      2                5                0             7
Q2.3    0            0      2                4                0             6
Q2.4    0            0      2                3                0             5
Q2.5    0            0      3                4                0             7
Q2.6    0            0      2                2                0             4
Q2.7    0            0      3                4                0             7
Q2.8    0            1      2                2                0             5
Q3.1    0            0      2                5                0             7
Q3.2    0            0      5                2                0             7
Q3.3    0            0      2                3                1             7
Q3.4    0            1      0                2                0             3
Q3.5    0            1      1                0                0             2
Q3.6    0            0      1                4                0             5
Q3.7    0            0      2                3                1             7
Q3.8    0            1      2                3                0             6
Q4.1    0            2      1                1                0             4
Q4.2    0            1      2                4                0             7
Q4.3    0            0      2                1                0             3
Q4.4    0            1      0                1                0             2
Q4.5    0            0      1                2                0             3
Q4.6    0            1      1                5                0             7
Q4.7    0            0      0                6                0             6
Q4.8    1            2      1                1                0             5
Q4.9    0            1      1                3                0             5
Q4.10   0            2      0                3                0             5
Total   5            22     60               80               2             171
In terms of construct-irrelevant variance, none of the guessing strategies
during the pre-listening stage led to the correct answer being ascertained.
Nevertheless, one participant – the lowest scoring – appeared to guess two
answers correctly for Task 3 (an MCQ task) without understanding the audio,
that is to say, without proper recourse to the listening ability model. Such a
finding has also been reported in previous studies regarding MCQ items (Yi’an,
1998). However, in contrast to Yang (2000), who found that 48% to 64% of the
listening and reading items on the old TOEFL Practice Test B could be answered
using construct-irrelevant test-wiseness strategies, the figure for the present study
represented only 1% of total correct answers.
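This figure can be checked against the totals in Table 12.2: with 171 correct responses recorded across all participants and items, the two apparently guessed answers amount to

\[
\frac{2}{171} \approx 0.012,
\]

that is, roughly 1% of total correct answers, consistent with the percentage reported above.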
It can therefore be argued that the test scores measure the construct well and
may be meaningfully interpreted for generalization. In this regard, the verbal
reports demonstrated that higher-scoring participants understood the audios
better than lower-scoring participants and that the listening process model was
clearly in evidence.
In sum, the results presented above provide ample evidence that the participants
in the present study did indeed follow the proposed listening ability model. Here,
CEFR B2 level participants demonstrated fairly automated listening skills and were
seen to be able to use their world knowledge (including pragmatic, contextual,
semantic, and inferential information) along with meta-cognitive strategy use in
order to construct meaning.
training and support are therefore essential. LAL agendas should also include
general KSAs and principles of assessment (Fulcher, 2012), as the introduc-
tion of better formative assessments in parallel with high-stakes proficiency
tests will lead to improved test washback through the combination of
assessment for learning with assessment of learning.
A test which successfully represents the construct will hopefully encourage
teachers teaching to the test to teach to the construct itself; as a consequence,
listening skills and strategy training should necessarily become part of classroom
practice. Here, curricula would need to take into account all aspects of the
listening process model, and as such there are clear implications for pedagogy:
Listeners need training in lower-level decoding techniques in order to recog-
nize word boundaries and lexical chunks in spoken discourse (Cauldwell, 2018;
Field, 2008a). Once these decoding routines become more automated, working
memory is freed up to perform higher-order meaning-building functions.
Furthermore, strategy awareness training, shown to have positive effects on
learners' listening ability (Vandergrift & Tafaghodtari, 2010;
Zhang, 2012), would also be necessary. Indeed, a number of researchers share a
strong belief that by teaching listening strategies, we are actually teaching lear-
ners how to listen (Siegel, 2015; Vandergrift & Goh, 2012).
Such learning activities would entail learners listening for a communicative
purpose in order to develop core skills (Vandergrift & Goh, 2012). As these
core skills are included as part of the test specifications, their promotion would
facilitate a seamless transition from classroom activities to test tasks. If a test
construct is broad enough that it promotes teaching to a test which both
includes all the necessary competences for successful listening and is also a
reflection of real-life listening tasks, beneficial washback will be achieved
(Vandergrift & Goh, 2012).
In addition to the above-mentioned steps, authentic listening materials would
also need to be introduced into the classroom, thereby responding to the
numerous calls in the literature to expose students to natural language, as it is
increasingly recognized that learners need to develop the ability to
understand real-world connected speech. Such materials would represent a major
change to baccalaureate courses; a study of the textbooks presently in use shows
that all audio texts currently employed follow the traditional format of unnatural
sounding scripted recordings produced using actors. In contrast, the use of
unscripted audios in assessments would mean that both materials developers and
teachers would be obliged to incorporate authentic materials (Wagner & Toth,
2017, p. 78). Here, studies have shown that more gains in listening ability are
made by groups of students exposed to authentic rather than scripted audio (e.g.,
Gilmore, 2011), thereby confirming that such a move could lead to positive
washback. Learner autonomy could possibly also be encouraged as students begin
to listen to other authentic materials outside the classroom.
Teachers themselves could become centrally involved in the test develop-
ment process in order to encourage understanding and foster a feeling of ownership.
Conclusion
Given that listening is undeniably one of the core academic skills, it is essential
that any new test demonstrate good construct validity if it is to be successfully
used to encourage the development of beneficial washback effects in the class-
room and beyond. This study has evaluated a listening test created with the aim
of producing positive washback in the context of the Spanish education system.
The study is theory-driven – based on a cognitive process view of listening
ability well-founded in research, it provides ‘strong’ construct validity evidence
for the test (Kane, 2001). It is proposed that the introduction of the test would
lead to significant positive changes in language pedagogy. The inclusion of
authentic sound files is considered to be a major improvement on most listen-
ing tests. If we are serious in our desire to help learners understand authentic
speech – which is unpredictable in nature – then ‘… serious tests must start
using construct-valid spoken texts’ (Buck, 2018, p. xv).
Educational reforms increasingly rely on the introduction of new assessment
procedures in order to improve the quality of education (Chalhoub-Deville, 2016).
In the Spanish context, policymakers need to ensure that any educational reforms
are properly implemented if we are to achieve a positive impact on teaching and
learning. The development of a more communicative L2 English-language class-
room, one which responds to the realities of actual language use, requires not
simply the optimization of assessment of learning, but – fundamentally – assessment
for learning. Clearly, the role of teachers themselves is pivotal to this process, and
consequently they must be supported in developing LAL in order to be able to
implement an array of assessment techniques relevant to their students' needs. A key
element of LAL is the ability to evaluate and criticize tests and an understanding of
the core constructs underlying assessment practice. To this end, teachers need to be
provided with convincing validity evidence. Indeed, the provision of evidence such
as that given in the present study would go a long way towards simplifying training
and ensuring that teachers are more likely to evaluate the test positively, thereby
involving them directly in creating positive washback in the education system.
Note
1 However, a listening component has been introduced in the provinces of Galicia and
Catalonia.
References
Alderson, J. C. (1993). Judgements in language testing. In D. Douglas, & C. Chapelle
(Eds.), A new decade of language testing research (pp. 46–57). TESOL.
Amengual Pizarro, M. (2009). Does the English test in the Spanish university entrance
examination influence the teaching of English? English Studies, 90(5), 582–598.
Anderson, J. R. (2009). Cognitive psychology and its implications. Worth Publishers.
Bachman, L. F. (2007). What is the construct? The dialectic of abilities and contexts in
defining constructs in language assessment. In J. Fox, M. Wesche, D. Bayliss, et al.
(Eds.), Language testing reconsidered (pp. 41–71). University of Ottawa Press.
Banerjee, J. (2004). Reference supplement to the preliminary pilot version of the manual for
relating language examinations to the CEF: Section D: Qualitative analysis methods. Council
of Europe. https://rm.coe.int/1680667a1f
Buck, G. (2001). Assessing listening. Cambridge University Press.
Buck, G. (2018). Preface. In G. J. Ockey, & E. Wagner (Eds.), Assessment of second lan-
guage listening: Moving towards authenticity. John Benjamins.
Carlsen, C. H. (2018). The adequacy of the B2 level as university entrance requirement.
Language Assessment Quarterly, 15(1), 75–89.
Cauldwell, R. T. (2018). A syllabus for listening decoding. Speech in Action.
Chalhoub-Deville, M. (2016). Validity theory: Reform policies, accountability testing,
and consequences. Language Testing, 33(4), 453–472.
Chapelle, C. A. (2012). Validity argument for language assessment: The framework is
simple. Language Testing, 29(1), 19–27.
Cheng, L., Sun, Y., & Ma, J. (2015). Review of washback research literature within
Kane’s argument-based validation framework. Language Teaching, 48, 436–470.
Council of Europe. (2018). Common European framework of reference for languages: Learning,
teaching, assessment. Companion volume with new descriptors. Council of Europe.
Davies, A. (2008). Textbook trends in teaching language testing. Language Testing, 25(3),
327–347.
Deygers, B., Zeidler, B., Vilcu, D., & Hamnes Carlsen, C. (2018). One framework to
unite them all? Use of the CEFR in European university entrance policies. Language
Assessment Quarterly, 15(1), 3–15.
European Commission. (2012). First European survey on language competences: Final report.
European Commission.
Field, J. (2008a). Listening in the language classroom. Cambridge University Press.
Field, J. (2008b). Revising segmentation hypotheses in first and second language listen-
ing. System, 36, 35–51.
Field, J. (2013a). Cognitive validity. In A. Geranpayeh, & L. Taylor (Eds.), Examining
listening: Research and practice in assessing second language listening (pp. 77–151). Cam-
bridge University Press.
Field, J. (2013b). Good at listening or good at listening tests? [Conference presentation].
ANUPI, Huatulco.
Field, J. (2017). Mind the gap: Listening tests versus real world listening [Conference presenta-
tion]. IATEFL TEASIG Conference, CRELLA, University of Bedfordshire. https://
tea.iatefl.org/wp-content/uploads/2015/10/John-Field_Mind-the-gap-TEA-SIG-Oct-17-delivered.pdf
Fulcher, G. (2012). Assessment literacy for the language classroom. Language Assessment
Quarterly, 9(2), 113–132.
García Laborda, J., & Fernández Álvarez, M. (2011). Teachers’ opinions towards the
integration of oral tasks in the Spanish university examination. International Journal of
Language Studies, 5(3), 1–12.
Gilmore, A. (2011). ‘I prefer not text’: Developing Japanese learners’ communicative
competence with authentic materials. Language Learning, 61, 786–819.
Green, A. (1998). Verbal Protocol Analysis in language testing research: A handbook. Cam-
bridge University Press.
Green, R. (2017). Designing listening tests: A practical approach. Palgrave Macmillan.
Gu, Y. (2014). To code or not to code: Dilemmas in analysing think-aloud protocols in
learning strategies research. System, 43, 74–81.
Hidri, S. (2014). Developing and evaluating a dynamic assessment of listening compre-
hension in an EFL context. Language Testing in Asia, 4(4), 1–19.
Hidri, S. (2018). Assessing spoken language ability: A Many-Facet Rasch analysis. In S.
Hidri (Ed.), Revisiting the assessment of second language abilities: From theory to practice (pp.
23–48). Springer.
Hidri, S. (2019). State-of-the-art of assessment in Tunisia: The case of testing listening
comprehension. In S. Hidri (Ed.), English language teaching research in the Middle East
and North Africa: Multiple perspectives (pp. 29–60). Palgrave Macmillan.
Inbar-Lourie, O. (2008). Constructing an assessment knowledge base: A focus on lan-
guage assessment courses. Language Testing, 25(3), 385–402.
Jenkins, J., & Leung, C. (2014). English as a lingua franca. In A. J. Kunnan (Ed.), The
companion to language assessment (pp. 1605–1616). Wiley-Blackwell.
Kane, M. (2001). Current concerns in validity theory. Journal of Educational Measurement,
38(4), 319–342.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for
categorical data. Biometrics, 33, 159–174.
Lim, G. S. (2014). Assessing English in Europe. In A. J. Kunnan (Ed.), The companion to
language assessment (pp. 1700–1708). Wiley-Blackwell.
Macaro, E., Graham, S., & Vanderplank, R. (2007). A review of listening strategies: Focus
on sources of knowledge and on success. In A. D. Cohen, & E. Macaro (Eds.), Language
learner strategies: 30 years of research and practice (pp. 165–185). Oxford University Press.
Messick, S. (1996). Validity and washback in language testing. Language Testing, 13,
241–256.
Munby, J. (1978). Communicative syllabus design. Cambridge University Press.
Phakiti, A. (2003). A closer look at the relationship of cognitive and metacognitive strat-
egy use to EFL reading achievement test performance. Language Testing, 20(1), 26–56.
Shackleton, C. (2018a). Developing CEFR-related language proficiency tests: A focus
on the role of piloting. Language Learning in Higher Education, 8(2), 333–352.
Shackleton, C. (2018b). An initial validity argument for a new B2 CEFR-related baccalaureate
listening test (Publication No. 9788491639213) [Doctoral dissertation, University of
Granada]. DIGIBUG. http://hdl.handle.net/10481/52426
Shohamy, E., & Inbar, O. (1991). Validation of listening comprehension tests: The
effect of text and question type. Language Testing, 8(1), 23–40.
Siegel, J. (2015). Exploring listening strategy instruction through action research. Palgrave
Macmillan.
Taylor, L., & Geranpayeh, A. (2011). Assessing listening for academic purposes: Defin-
ing and operationalising the test construct. Journal of English for Academic Purposes, 10,
89–101.
Tsagari, D., & Vogt, K. (2017). Assessment literacy of foreign language teachers around
Europe: research, challenges and future prospects. Papers in Language Testing and
Assessment, 6(1), 41–63.
Vandergrift, L. (2003). Orchestrating strategy use: Toward a model of the skilled second
language listener. Language Learning, 53(3), 463–496.
Vandergrift, L., & Goh, C. (2012). Teaching and learning second language listening: Meta-
cognition in action. Routledge.
Vandergrift, L., & Tafaghodtari, M. H. (2010). Teaching students how to listen does
make a difference: An empirical study. Language Learning, 60, 470–497.
Wagner, E. (2014). Assessing listening. In A. J. Kunnan (Ed.), The companion to language
assessment (pp. 47–63). Wiley-Blackwell.
Wagner, E., & Toth, P. (2017). The role of pronunciation in the assessment of second
language listening ability. In T. Isaacs, & P. Trofimovich (Eds.), Second language pro-
nunciation assessment: Interdisciplinary perspectives (pp. 72–92). Multilingual Matters.
Wall, D. (2013). Washback in language assessment. In C. A. Chapelle (Ed.), The ency-
clopedia of applied linguistics. Blackwell Publishing.
Wang, J. (2010). A study of the role of the ‘teacher factor’ in washback [Unpublished
doctoral dissertation]. McGill University.
Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Palgrave
Macmillan.
Yang, P. (2000). Effects of test-wiseness upon performance on the test of English as a
foreign language [Unpublished doctoral dissertation]. University of Alberta.
Yi’an, W. (1998). What do tests of listening comprehension test? A retrospection study
of EFL test-takers performing a multiple-choice task. Language Testing, 15(1), 21–44.
Zhang, Y. (2012). The impact of listening strategy on listening comprehension. Theory
and Practice in Language Studies, 2(3), 625–629.
Chapter 13
Testing abilities to understand
Tim Murphey
05:55 ‘…RASA, which is the Sanskrit word for ‘juice’ or ‘essence’ … stands
for ‘Receive’, which means pay attention to the person; ‘Appreciate’,
making little noises like ‘hmm’, ‘oh’, ‘OK’; ‘Summarize’ – the word ‘so’ is
very important in communication; and ‘Ask’, ask questions afterwards’.
(Treasure, 2011)
A central theme in Paulo Freire’s work is his insistence on the need for
readers to adopt a critical attitude when reading a text. That is, readers
should critically evaluate the text and not passively accept what is said just
because the author said it. Readers must always be prepared to question and
to doubt what they have read.
Dufva (2013) also advocates for more negotiation in place of singular answers
in learning through ‘translanguaging’:
… the assumed stability and singularity of norms and the entailing policy
of ‘one correct answer’ is maintained in classrooms, exams, and language
tests. The alternative views advocate subjecting the norms and language
use to negotiation, and not only for tolerating but also promoting ‘trans-
languaging’ in the classrooms.
(Dufva, 2013, p. 58; see also Creese & Blackledge, 2010)
Freeman (1998) says much the same with his dedication of his book to his wife:
‘To Ann Freeman, who has shown me that teaching is about asking questions,
and that in asking questions, you will learn’ (p. vi).
The earliest known advocate of asking was probably Socrates with what is
now called the Socratic method. In this method, interlocutors engage in
cooperative, argumentative dialogue through asking and answering questions
in order to stimulate critical thinking and challenge deeper thinking and
clarifications. This method can be used to great advantage when creating
well-argued theses and arguments, but as in a courtroom, it can sometimes
lead to violent and disparaging language that is counter to collaborative
creative thinking.
Canfield and Hansen’s (1995) The Aladdin factor explores the mostly positive
dimensions of asking and argues for the good effects of daring to ask for what
you want in life, based around the old story of Aladdin and his magic lamp,
and contains the repeated phrase 'Ask and it shall be given'. The book argues
that we do not ask enough for many reasons and that we should be asking
others and our universe for better things, and that asking is the way to create a
better world. The authors mention five barriers to asking in their first chapter:
ignorance (don’t know how to); limiting inaccurate beliefs (no one will tell
me); fear (of looking immature, stupid, or helpless); low self-esteem (no one
will help someone like me); and pride (I do not want to be seen as a needy
person). Many of these are similar to the reasons our students may give for
avoiding asking in class.
In chapter two of The Aladdin factor, Canfield and Hansen (1995) look at the
benefits of asking, the first being ‘You will take control of your life!’, which is
true enough, as asking questions liberates you to choose more options
and a variety of paths. The second is that you will have better business and
‘Asking is, at its core, a collaboration’ (Palmer, 2015, p.47). But more poetically
she writes:
Later on, she cites Hyde (1983), who 'explains the term "Indian Giver", which
most people consider an insult: someone who offers a gift and then wants to
take it back…' (p. 57). Hyde (1983) tackles the subject, calling it 'the commerce
of the creative spirit':
But the origin of the term – coined by the Puritans – speaks volumes. A
Native American tribal chief would welcome an Englishman into his
lodge and, as a friendly gesture, share a pipe of tobacco with his guest,
then offer the pipe itself as a gift. The pipe, a valuable little object, is – to
the chief – a symbolic peace offering that is continually regifted from
tribe to tribe, never really ‘belonging’ to anybody. The Englishman
doesn’t understand this, is simply delighted with his new property, and is
therefore completely confused when the next tribal leader comes to his
house a few months later, and, after they share a smoke, looks expec-
tantly at his host to gift him the pipe. The Englishman can’t understand
why anyone would be so rude to expect to be given this thing that
belongs to him.
Hyde concludes:
how many minutes they talked and what percentage was in the target language.
Calling up someone who you may have met for the first time that day can be a
scary thing for many people, but they do it, and they get used to it, and many end
up calling each other for test reviews and other tasks as well. (Students put their
phone numbers beside their names on an attendance list I pass out in class, and I
give them all copies so they can call each other easily, and also so that anyone who
is absent can call a classmate to catch up with what they missed.)
Story asking
I ask my students to tell many stories about themselves and to write some of
them down in their action logs; sometimes their fellow students will read them
and ask questions about them to deepen the conversation. For example, they
write a Language Learning History that details their learning of foreign languages
from when they first began up to the present. They also write about glory, embarrassment,
regret, and mistake stories, which students discuss to show that no one is per-
fect, we all make mistakes, and we can learn to laugh at them sometimes.
Lecture pre-asking
Most lecturers enthusiastically dive into their material, but most information is
wasted like water washing over rocks (brains). Priming the students at the
beginning of a class with a number of questions about the content that will be
covered in the lecture creates curiosity and gets students to form a possible
neural network for an answer. Research shows that even if they come up with
a wrong answer, they are still capable of easily replacing wrong answers with
new answers due to the curiosity network already formed (Roediger & Finn,
2009). Such questioning also shares class time democratically with students so
they feel more empowered. As Donald Graves (2002) wrote in Testing is not
teaching: What should count in education:
Perhaps the problem [of learning well] is best understood in the context of
power within relationships. Understanding is best reached when power is
shared. In most cases teachers are in the power position when working
with their students. They have the power of assignments, corrections, and
grades. The best teachers know how to share this power; indeed, they give
it away. They are constantly uncovering where the student’s heart is situ-
ated in the writing. Through the skills of teaching they know how to add
power to the student’s intentions.
(p. 11)
Formative assessment
All the activities above are about learning through assessing, and learning what
we know and don’t know, thus promoting language assessment literacy (LAL)
not just among teachers, but also among students. Wormeli (2018, p. 284), a
great advocate of formative assessment, defines it as: ‘Frequent and ongoing ways
to check students’ progress toward mastery; the most useful assessment teachers
can provide for students and for their own teaching decisions'. Thus, I was led
to treat regular tests as formative assessment by socializing the procedures,
starting about six years ago with social testing (Murphey, 2013a, 2013b).
The bottom of each test looks something like Figure 13.1 (also see Appendix A).
As you can see in Figure 13.1, I ask students to give themselves their own
grades at two separate times on the tests, first after a certain period of doing it
alone (1st score) and then after allowing them to ask others for help and give
help to others who ask for it orally (no copying; all oral). John Hattie (2012)
showed through his meta-analyses of 150 classroom activities that self-reported
grades (#1), formative evaluation (#4), feedback (#10), and reciprocal teaching
(#11) are all highly effective for learning, with the last three being highly social,
and all are included in the social testing protocol aligned with LAL. I wish to
propose a form of testing that allows students to interact more and learn more
at the same time. Although this way of testing will not solve all our problems,
it is a way to help students become more social, and to teach the worth of
social interaction and its benefits.
In Murphey (2019a), I cite mostly Vygotskian researchers who claim that:
In previous articles, I have offered data to show how students highly rate such
tests, and have suggested ways that a graduate student might study them for their
long-term impact. For the remainder of this chapter, I would like to look closely
at some qualitative data on social testing and how it seems to liberate students’
learning. I will first look at some undergraduates doing three tests in one semester
and then look at some graduate students who did a social test as a final exam.
QUIZ 1 Feedback at the bottom of the quiz (unedited except for emphasis)
1 This is my first time to take a test which encourages me to interact with others, so
this is interesting for me. This makes me think I want to have a conversation with
others more.
2 I love this test because I felt we were doing test together. I knew the importance
of cooperating.
3 This test improve our ability to ask things of others. We have to be brave, so I like
this way.
4 I thought this test is more meaningful than the way as usual test because just
remember something is not interesting but in this test not only remember, but we
use English. This is the big point of this style, I think.
1 Today we had a quiz, a new test that I never did before. We can learn from each
other and ask other people for answers. It is a new chance to communicate.
2 I really felt glad to see your words ‘Not knowing is OK’. And ‘Not asking is failing’. And
‘Helping and asking many people is your goal’. I really feel great to do today’s test.
3 I told my family that my teacher gave us a time to teach each other for us to improve
our skills, because it was the first time in my life that teacher gave us such a time. I
taught many things to my classmates during that time. I enjoyed helping them.
QUIZ 2 Feedback at the bottom of the quiz (unedited except for emphasis)
1 I think this test help students to improve communication skill. We’ll have to
communicate with other people in the future job, so it’s practical test.
2 This test is really useful to think about the answer with peers. Giving hints to
find a clue is the best way to know the answers … I thought my communication
skill is getting up.
3 I like this style of testing. I can talk to different people a lot & it feels good to
help someone. But sometimes they went away just after they had their answers,
not teaching me anything and it hurts. It also hurts when I tried to remind them
the story (hint) and someone just said, ‘just tell me the main point!’
4 Today, I am very brave and do not hesitate to ask. So almost all of blanks are filled.
1 Last time I felt embarrassment to ask questions, but this time was not. I enjoyed
asking and talking. Everyone’s so kind. I’m glad to have and be in this class.
2 I really like this type of test. I always feel nervous or hate to take test but I feel
relax and enjoy this class’s test. Because you said, ‘Not knowing is OK. Not asking is
failure’. This phrase I like very much. And I really like part 2, because I can help
classmates and be helped by classmates. This is very good communication I think
because help each other is really good to learning. Today is the second time to
take this type of test, so I feel more relax to take test and I could ask and help
many [more] class mates than first time. I feel really happy to talk a lot of
people and help them.
3 In today’s class I took a test. I like this test because I could talk a lot and com-
municate with many people. This test needs knowledge which I learned in class
and we require to communicate positively in this test. In addition, we could
discuss questions before the test. It is different from other tests. We can have
opportunities to speak in this test style. And also, I learned scaffolding, it is dif-
ficult to give hints and help others understanding.
4 I had fun to do test and ask. I could ask more people than before. I think I get used
to ask people because of this class! I’m looking forward to next test! Please don’t
make it harder.
QUIZ 3 Feedback at the bottom of the quiz (unedited except for emphasis)
1 I could answer almost all questions compared to last test. And actually this is
my last test in my university school life! I’m glad to take this class and this test!
Thank you for all!
2 I can help many person. I’m so happy! I have confidence because I have good
classmates. This test is so fun!
3 This is the third time to take this [type of] test. I asked my classmates fluently and
they feel glad to help each other. I enjoy this test.
4 This time I could ask many people and they answered kindly. Though I’m
powerless alone, I was happy that there were many people who helped me.
QUIZ 3 Feedback from action logs about the quiz (unedited except for emphasis)
1 I could ask students more than before. It’s proud that I can learn asking is not
hesitating thing.
2 I could learn what I never thought or I’ve never known. It was really useful. The
remarkable thing is mistaking is not bad; trying not do is bad! I was so impressed
that. So I’m always trying what I face first time. I don’t judge with prejudice
anymore. I’ll never forget this class.
3 (1st year) I am really excited and satisfied with this class, as this class gives me a
lot of chance to speak English than any other class. Additionally, I could make new
friends … I really enjoyed and I’ll miss this class.
4 I am glad to choose this class because I enjoyed learning English and I met different
grade student and we had a chance to talk. I felt happy when we talked.
For these students, learning is more important than a good score, even while taking
a test. Still, teachers need to find good ways to explain such testing procedures
to students to enlist their altruism and understanding of how to learn more on a
deeper social level.
Conclusion
To conclude, I would like to return to our long acronym in the title TA-
TUNII-SIA-RASA (Testing Abilities – To Understand Not Ignorance or
Intelligence – Socially Interactive (formative) Assessment – Receive,
Appreciate, Shadow, and Ask). I believe that as educators, we should be
testing and teaching ways of understanding, not simply information
(ignorance or intelligence). I am convinced that socially interactive ways of
assessing have great promise for helping students grasp more intellectual
territory than simple solo exams. We need to be able to learn, even during
assessments, and see that this is indeed part of language assessment literacy.
Social testing is not only teaching asking but also altruism, as one of my
students said a few years back:
Because I had taken a test (#1) in this class and I knew how we would do
the test #2, I tried to remember as much as possible not only for myself,
but for my classmates. Last time I took the test, I was helped by others
with answers, very helpfully. So, I wanted to help my classmates more
than I did last time. In Test #2 it was interesting. I felt as if I was already
working with classmates during my preparations for the test, and that
motivated me to study. Although it was not so many people that I could
help with the quiz, I was glad to hear ‘thank you’ from them and to see
their smiles. Showing thanks to people really makes them happy.
Another student referred to asking as part of ‘vital skills to live in real life’ and
though most people will not be taking pen and paper tests at their work, they
will need to be able to ask people for help:
I really like this type of test. I’ve never done such a creative and interactive
test, and I really think that I was required to get information and help
people, and these are vital skills to live in real life!
Please read Murphey (2017a) for a more detailed understanding of social testing or
Murphey (2017b) for a short, four-page synopsis from a Stanford University blog.
Finally, I wish to dare to talk more grandly beyond our classrooms and have a
look at the questions concerning our survival and social well-being in our various
societies. I believe we need to ask our schools, educational systems, businesses,
communities, governments, our universe, and our gods more grandly for better
understanding and well-being for all, and for a more just and ecological world in
which we all can live peacefully. I want my students to be able to ask for these
things, for in asking we may indeed find the ways through LAL.
[Appendix A (extract): sample quiz items and class video prompts]
BBB
Tim’s DAD
Chez Joan
Matsuyama Woman
Paradigm Shift
Denmark TV 2 Advertisement
Ms Liz’s Class Talking Twins
14 How can you do environmental engineering to learn more English? (3
examples please)
15 Why are telling embarrassment/mistake stories good for us? (3 things)
16 What are the three SSSs in Chapter 2 for and how do they help you learn?
17 What would you do if you were language hungry?
18 What is self-regulation?
19 Cry to the world ‘I’m in love!’ when you read this line! Done/Not Done
20 Approximately, how many people’s names do you know in this class?
Going My Way
Ride and Read
Turtle with a Straw
Student Voice #1 LLHs
Student Voice #2 Job H Going Abroad
Roller Coaster
13 How does an effective helper help you? E t, A, S y u, Ref rather than C, & C
14 How are A student strategies different from C/D student strategies?
15 What are Tim’s most frequent three words in this class?
16 What are the advantages of doing IPQs?
17 What would be your rejoinder if I said, ‘I won the billion-dollar lottery!’?
18 Find a person you have never talked to, ask a question, & write their
whole name:
Short answers:
References
Canfield, J., & Hansen M. (1995). The Aladdin factor: How to ask for what you want–and get
it. Berkley Books.
Creese, A., & Blackledge, A. (2010). Translanguaging in the bilingual classroom: A
pedagogy for learning and teaching? The Modern Language Journal, 94(1), 103–115.
Doise, W., & Mugny, G. (1984). The social development of the intellect. Pergamon Press.
Dufva, H. (2013). Language learning as dialogue and participation. In E. Christiansen, L.
Kuure, A. Mørch, & B. Lindström (Eds.), Problem-based learning for the 21st century:
New practices and learning environments (pp. 51–72). Aalborg Universitetsforlag.
Freeman, D. (1998). Doing teacher research: From inquiry to understanding. Heinle & Heinle.
Freire, P. (1985). The politics of education: Culture, power, and liberation (D. Macedo,
Trans.). Bergin & Garvey Publishers. (Original work published 1985).
Graves, D. (2002). Testing is not teaching: What should count in education. Heinemann.
Guba, E., & Lincoln, Y. (1989). Fourth generation evaluation. Sage Publications.
Hattie, J. (2012). Visible learning for teachers: Maximizing impact on learning. Routledge.
Hyde, L. (1983). The gift: Imagination and the erotic life of property. New York.
Lewis, B. (2019, July 18). Teachers should design student assessments. But first they
need to learn how. Education Week. https://www.edweek.org/ew/articles/2019/07/
19/teachers-should-design-student-assessments-but-first.html
Murphey, T. (1990). Song and music in language learning: An analysis of pop song lyrics and
the use of song and music in teaching English as a foreign language. Peter Lang.
Murphey, T. (1992). Music and song. Oxford University Press.
Murphey, T. (1993). Why don't teachers learn what students learn? Taking the guess-
work out with action logging. English Teaching Forum, 31(1), 6–10.
Murphey, T. (2003). Assessing the individual: Theatre of the absurd. Shiken: JALT
Testing & Evaluation SIG Newsletter, 7(1), 2–5.
Murphey, T. (2012). In pursuit of wow! Abax.
Murphey, T. (2013a). Turning testing into healthy helping and the creation of social
capital. PeerSpectives, 10, 27–31.
Murphey, T. (2013b). With or without you and radical social testing. Po`okela (Hawai'i
Pacific University Newsletter), 20(69), 6–7.
Murphey, T. (2017a). Provoking potentials: Student self-evaluated and socially mediated
testing. In R. Al-Mahrooqi, C. Coombe, F. Al-Maamari, & V. Thakur (Eds.), Revi-
siting EFL assessment: Critical perspectives (pp. 287–317). Springer.
Murphey, T. (2017b, June 30). A 4-page condensed version of Tim Murphey's book
chapter 'Provoking potentials: Student self-evaluated and socially-mediated testing'.
Tomorrow's Professor eNewsletter. Stanford University. https://tomprof.stanford.
edu/mail/1581#
Murphey, T. (2017c). Asking students to teach: Gardening in the jungle. In T. Gregersen,
& P. MacIntyre (Eds.), Exploring innovations in language teacher education (pp. 251–268).
Springer.
Murphey, T. (2018a). Bilingual songlet singing. Journal of Research and Pedagogy of Otemae
University Institute of International Education and Hiroshima JALT, 4, 41–49.
Murphey, T. (2018b). Songlets for affective and cognitive self-regulation. Bulletin of the
JALT: Mind, Brain, and Education SIG, 4(12), 22–25.
Murphey, T. (2019a). Peaceful social testing in times of increasing individualization &
isolation. Critical Inquiry in Language Studies, 16(1), 1–18.
Murphey, T. (2019b). Innovating with 'The Collaborative Social' in Japan. In H. Reinders,
S. Ryan, & S. Nakamura (Eds.), Innovation in language learning and teaching: The case of
Japan (pp. 233–255). Palgrave Macmillan.
Palmer, A. (2013, February). The art of asking [Video file]. https://www.ted.com/ta
lks/amanda_palmer_the_art_of_asking?utm_campaign=tedspread&utm_medium=
referral&utm_source=tedcomshare
Palmer, A. (2015). The art of asking. Grand Central Publishing.
Roediger, H., & Finn, B. (2009, October 20). Getting it wrong: Surprising tips on how to
learn. Mind Matters. https://www.scientificamerican.com/article/getting-it-wrong/
Treasure, J. (2011, July). 5 ways to listen better. [Video file]. https://www.ted.com/ta
lks/julian_treasure_5_ways_to_listen_better?utm_campaign=tedspread&utm_m
edium=referral&utm_source=tedcomshare
Wormeli, R. (2018). Fair isn’t always equal: Assessment and grading in the differentiated
classroom (2nd edition). Stenhouse.
Conclusion
Language assessment literacy: The way
forward
Sahbi Hidri
Tarone, E. (2005), 219 Vogt, & Tsagari, 6, 20, 21, 22, 23, 24, 27,
Taylor & Geranpayeh (2011), 54, 59, 60, 138, 139, 140, 151, 153,
225, 239 180, 188, 220
Taylor & Geranpayeh, 225 Volante, L., and Fazio, X. 154,
Taylor, 6, 7, 13, 14, 16, 17, 26, 53, 57, Vygotsky, 109
58, 59, 60, 180, 225, 239 Vygotsky, 203
Teasdale, A., & Leung, C. (2000), 44 Vygotsky, L. S. (1986), 219
Teasdale, A., & Leung, C. (2000). 51
Thomas, J., Allman, C., & Beech, Wach, A. 135, 139,
M. 160, Wach, A. 152, 153,
Thorndike, E.L. (1904), 37, 38 Wagner (2014), 221, 222, 224, 240
Thorndike, E.L. (1904). 51 Wagner & Toth (2017), 224, 236, 240
Tomlinson, B. (2010). 51 Wagner & Toth, 224, 236
Torrance, Pryor, 4, 71, Wagner, 221, 222, 224
Treasure (2011), 241, 259 Wall (2013), 220, 221
Trede, F., & Higgs, J. (2010), 42 Wall (2013), 4
Trede, F., & Higgs, J. (2010). 51 Wall, & Alderson, 4
Trinity College London (2017). 65 Wall, 220, 221,
Trudgill, P., & Hannah, J. (2008), 40 Wall, 220, 221, 240
Trudgill, P., & Hannah, J. (2008). 51 Wall, 4, 220, 221,
Tsagari & Vogt (2017), 220, 240 Wang (2010), 219, 235, 240
Tsagari & Vogt, 2017 151, 153 Wang, J., 235
Tsagari & Vogt, 220 Wang, T., 201
Tsagari & Vogt. 138, 139, Warschauer, M. (2002). 83
140, 151, Weigle cited in Ahmad,2019 161,
Tsagari, D. (2012). 152, Weir et al. (2013), 33, 35, 37
Tsagari, D. 138, 139, 150, Weir, (2005), 226, 240
Tsagari, D. 65 Weir, 226
Tsagari, et al., 180, ., 240 White, E. 138
Tsui, A. B., & Ng, M. (2000). 83 Wilson 86, 89, 90
Tupas, F.R.T. (2010), 46 Wilson, 110, 111, 128
Tupas, F.R.T. (2010). 51 Wilson, 127,
Turk, M. 139, 140, 151, Wingate & Tribble, 110, 126
Woodrow, 201, 202, 210
Ukrayinska, O. 137, 139, Woodrow, L. (2006), 219
Ukrayinska, O. 152, Wormeli (2018), 247, 259
Ur, 2012). 199
Ur, 205, 210 Xu, Y., & Brown, G. T. L. (2017), 8,
Ur, P. (2012), 219 21, 24
Yan, X., Zhang, C., & Fan, J. J. (2018). Young, D. J. (1994), 219, 240
152, 153, Yousif, 201
Yan, X., Zhang, C., & Fan, J. J. Yusof, R. (2016), 219
(2018). 65
Yang (2000), 235, 240 Zeichner & Liston, 109, 110,
Yastibas, A. E., & Takkaç, M. (2018), Zhang (2012), 236, 240
22, 139, Zhang, 236
Yastibas, A. E., & Takkac, M. (2018). Zheng, Y. (2014). 51
Yi’an (1998), 235, 240 Zhou, 201
Young, 201, Zhou, M. (2016), 219
Index
Academic 50, 51, 53, 64, 71, 81, 83, 97, Basic, 15, 21, 39, 151, 153, 160, 172,
159, 160, 161, 162, 163, 164, 165, 180, 225, 241,
166, 167, 171, 176, 178, 182,
academic literacy 53, 71, 159, 160, call report, 245
163,164, 165, 169,170, 171,172, candidates, 33, 37, 39, 46, 87, 224, 228,
academic writing, 44 230, 233,
Academic, 9, 10, 18, 34, 36, 38, 44, 46, CEFR, 39, 217, 220, 221, 223, 225, 226,
204, 211, 214, 216, 218, 221, 225, 228, 233, 235, 237, 238, 239,
226, 237, 239 clarity 84, 94
Accent (native and non-native speaker classroom teachers 72, 203, 204, 205, 208,
varieties), 210, 224–26, 209, 210, 211, 220, 221, 235, 236
Accountability 6, 7, 54, 56, 177, 178, cognitive 95, 199, 210, 222, 223, 226,
238, 255 227, 228, 230, 232, 235, 237, 238,
Accuracy 45, 88, 98, 99, 177, 239, 249, 259
179, 182, coherence 93, 162,
action logging, 245, 259 cohesion 93,
Aladdin Factor, 242, 258 community, 7, 36, 46, 53, 62, 70, 71, 74,
Alternative assessment 4, 9, 18, 33, 34, 39, 80, 82, 171, 200, 172, 218, 219, 225,
41, 44, 45, 91, 151, 154, 182, 183, 241, 250, 247,
184, 185, 188, 189, 193 Competence-based curriculum, 220, 221
Anxiety 180, 200, 201–10, Components, 7, 15, 25, 26, 53, 55, 56,
anxiety, 217, 218, 219 58, 87
apprehension, 201, 202, 203, 204, 206, concept 7–10, 13–28, 37, 43, 47, 52–64,
210, 218 73, 81, 92, 99, 151, 154, 160, 179,
assessing speaking, 199–200, 204, 192, 253
assessment 3–28, 33–35, 44, 46, 47, 151, conceptions, 4, 9, 10, 14, 15, 17, 20, 21,
152, 153, 154, 159, 160, 161, 162, 24, 28, 163,
163, 164, 165, 166, 167, 168, 169, connected speech, 222, 224, 236
170, 171, 172, 176, 177, 178, 179, construct 51, 56, 64, 74, 80, 87, 96, 154,
180, 181, 182, 183, 184, 185, 187, 159, 160, 161, 162, 163, 179, 181,
188, 189, 190, 191, 192, 193, 199, 186, 187, 188, 193,
200, 51–100, 217, 218, 220, 221, 224, construct, 4, 10, 14, 15, 17, 18, 37201,
225, 236, 237, 238, 239, 240, 241, 217, 221, 222, 223, 224, 225, 226,
247, 254, 258, 259, 260, 261 227, 228, 235, 236, 237, 238, 239
assessors 5–8, 10, 17, 58, 152, 153, 164, Content validity, 37
169, 172, 177 Context, 4, 6–27, 34–35, 38–41, 52, 56,
assurance 85, 87, 96, 98, 99, 176, 178, 57, 59, 61, 65, 69, 70–73, 75–78, 81,
182, 183, 185, 186, 192, 86, 89, 91, 92, 95–98, 152, 153, 160,
Index 271
161, 163, 165, 171, 178, 179, 180, exam-oriented 151, 153,
181, 184, 185, 193, 200, 203, 204, external, 8, 28, 176, 179, 186,
221, 222, 223, 224, 225, 227,
228, 229, fair, 4, 10, 42, 218, 166, 170, 178, 179,
context, 221, 222, 223, 224, 225, 227, 180, 182, 183, 184, 185, 188, 235, 259
228, 229, 235, 237, 238, 239, 247, feedback 69, 70–83, 152, 154, 171, 177,
260, 261 178, 182, 183, 184, 185, 188,
Correlation 167, 168, 170, 189, 192,
Correlation, 201, 202, 207, 209 feedback, 211, 220, 248, 249, 250, 251,
Correlation, 9, 19 252, 253,
course-based, 20, 39, 46, 151, 154, 152, Foreign Language Anxiety, 217, 218
161, 164, 165, 166, 167, 168, 169, foreign language teaching, 199, 200, 201,
170, 180, 181, 187, 188, 204 203–211, 218, 200, 221, 222, 224,
criterion-referenced, 3, 11, 179, 235, 236
critical attitude, 241 formative assessment, 5, 7, 71, 76, 82, 83,
Critical Language Testing, 34, 41 151, 154, 177, 178, 179, 220, 236,
culture, 14, 15, 28, 40, 41, 44, 69, 81, 247, 254,
202, 229, 258 frameworks, 7, 14, 15, 24, 25, 52, 54, 55,
curriculum, 5, 6, 39, 42, 43, 45, 47, 55, 62, 160, 200, 227,
56, 70, 88, 176, 182, 186, 187, 199, functional literacy, 15, 16, 57,
217, 221, 225, 235, 236, 237, functions 54, 84, 97, 99, 199, 200,
functions, 7, 38, 236
descriptive statistics 167, 168, 206
dimensions of assessment, 16, 199, genre, 45, 224
discourse analysis, 24 grammar, 22, 40, 41, 80, 82, 93, 96, 151,
dynamic assessment, 239 154, 166, 167, 170, 204
EAP, 9, 33, 34, 37, 41, 44, 47 high-stakes tests 186, 178, 185, 186,
education, 3, 14, 19, 22, 27, 34, 35, 42, 220, 236
44, 52–55, 57, 59–65, 70, 81, 82, 88, historical, 14–16, 35, 47, 54, 55, 57, 160,
199, 153, 154, 161, 176, 177, 178, holistic assessment 162,
180, 182, 183, 190, 191, 199, 210, hypothesis 181, 203, 204, 222
217, 218, 220, 220, 221, 225, 237,
239, 247, 258, 259 IELTS, 9, 18, 35, 36, 38, 39, 44, 161,
Educational reforms, 221, 235, 237, Illiteracy, 15, 16, 57
Effectiveness 89, 92, 184, 190, 191, impact of testing, 14, 55, 160, 221
EFL, 9, 10, 19–24, 64, 69, 83–91, 93, implications 10, 63–65, 81, 97, 201, 236,
95–97, 151, 152, 153, 154, 161, 162, 153, 159, 160, 161, 165, 169,
163, 165, 166, 171, 200, 201, 204, 171, 193,
206, 207, 209, 217, 218, 219, 239, implications
240, 259, in-service courses, 6, 236
ELT practitioners, 260 inaccurate paraphrase 163,
English 51, 61, 63–65, 69–71, 81–84, 88, institutional, 151, 59, 161, 171, 172, 176,
96, 161, 165, 176, 199, 200 instruction, 4, 5, 9, 10, 22, 43, 54, 72, 74,
English as a Foreign Language, 19, 41, 69, 79, 82, 88, 89, 99, 199, 204, 205, 206,
81, 160, 161, 171, 200, 240, 259, 208, 225,
English language testing, 9, 18, 35, 161, interfaces, 8, 10, 199,
221, 222, 235 internal, 81, 76,
English, 9, 18–22, 26, 27, 33–46, 199, interpretative, 14, 171, 200
217, 218, 219, 200, 203, 204, 205, interpretative, 200
206, 207, 209, 221, 224, 225, 237,
238, 239, 240, 249, 252, 256, 259, 260 journal, 9, 44, 56, 64, 81, 82, 83
272 Index
knowledge, 3–22, 26, 28, 34–44, 51–54, 56–65, 69, 74, 76, 79, 95, 98, 151, 152, 153, 160, 161, 162, 163, 171, 190, 193, 201, 220, 221, 222, 227, 229, 233, 235
L2, 82, 83, 200–211, 222, 226
LAL training, 151, 152, 153, 154, 160, 171, 236
language, 70, 88, 151, 152, 153, 154, 159, 161, 164, 167, 176, 180, 188, 199, 201, 202, 203, 204, 206, 210, 222, 224, 225, 236
language assessment, 51–100, 151, 154, 159, 160, 161, 162, 163, 171, 172
language assessment literacy, 51–100, 159, 160, 220–235, 237
language education, 52, 59, 62, 63, 161, 199, 210, 220
language learning, 52, 62, 81–83, 86, 88, 100, 152, 154, 201, 202, 205
language pedagogy, 58, 220, 236
language proficiency, 171, 200
language skills, 61, 151, 152, 154, 159, 160, 163, 222
language teaching, 54, 61, 64, 82–84, 88, 89, 159, 177, 199–211
language testing, 51, 61–65, 85, 151, 152, 153, 159, 161, 162, 177, 179, 180, 221
language trait, 161, 223
large-scale testing, 54, 61, 91, 179, 180, 182
learning, 3–11, 14, 20, 21, 27, 33, 42, 44, 51–53, 56, 60–62, 64, 69–77, 79, 81–83, 86, 88, 89, 91–94, 97, 98, 100, 151, 152, 154, 160, 161, 171, 176, 177, 178, 180, 184, 188, 192, 200, 201, 202, 204, 206, 210, 217, 218, 219, 220, 221, 230, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 246, 247, 248, 249, 250, 252, 253, 254, 255, 258, 259, 260
learning outcomes, 72–76, 220
levels, 54, 56–59, 70, 74, 88, 151, 154, 161, 162, 163, 164, 165, 171, 172, 176, 180, 200, 202, 206, 209, 210, 222, 228, 235
limitations, 90, 163, 171, 192, 200, 227
listening, 61, 84, 87, 93, 203, 209, 211, 220–237
literacy, 3–27, 51–55, 57–100, 154, 159, 160, 162, 163, 164, 165, 166, 170, 171, 172, 177, 180, 181, 182, 184, 186, 189, 193, 238, 240, 247, 254, 260
local practices, 58, 59, 184
long-term, 84, 172, 191, 211, 222
measurement, 55, 56, 161, 162, 164, 167, 171, 177, 178, 180, 188
measurement scale, 160, 170, 171
metacognitive strategies, 223, 227
method, 64, 69, 86, 92, 94, 160, 161, 171, 178, 181, 183, 192, 205, 227
mid-term, 182, 183, 186
multidimensional literacy, 57
narrative genre, 200
negotiation, 74
nominal literacy, 57
non-parametric correlation analysis, 167
norm-referenced, 179
novice EFL teachers, 151, 152, 153, 154
opinions, 151, 152
oral proficiency, 199, 200, 204, 211
outcomes, 52, 53, 72–76, 82, 88, 97, 161, 171, 188, 220
overrepresentation, 187
paradigm, 51, 180
paraphrasing, 159, 160, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172
participants, 95, 100, 152, 159, 162, 163, 165, 170, 171, 184, 227, 228, 229, 230, 231, 233, 235
patchwriting, 163, 164, 167, 168, 169
pedagogy, 58, 59, 78, 178, 220, 236
perception, 82, 92, 153, 154, 159, 162, 172, 177, 182, 183, 184, 185, 187, 188, 191, 192, 200, 201
performance, 53, 54, 56, 74, 77, 81, 84, 86, 88, 91, 93, 96, 99, 100, 159, 160, 161, 163, 165, 167, 168, 169, 170, 172, 178, 179, 180, 189, 199, 200, 201, 202, 204, 205
positive washback, 220–237
practical knowledge, 153
practical skills, 61, 152
practices, 4–28, 33, 34, 37, 41, 42, 44, 47, 152, 153, 154, 160, 165, 170, 171, 172, 177, 178, 181, 183, 184, 185, 186, 191, 192, 193, 201, 203, 204, 207, 208, 209, 211, 220, 221, 258
practicum, 152, 154, 162
pre-service teachers, 60, 84
principles, 54, 56–60, 64, 81, 82, 84, 94, 154, 159, 160, 180, 181, 182, 185, 192, 236
principles and concepts, 58–59, 160
procedural and conceptual literacy, 57
productive skills, 170, 199, 204, 208
progress, 70, 71, 76, 79, 81, 90, 172, 178, 193, 220
purpose for listening, 222, 226, 229, 236
qualifications, 165
qualitative data, 227, 228
quality, 63, 70, 71, 81, 85, 87, 92, 94, 164, 171, 172, 176, 178, 179, 180, 181, 182, 183, 185, 186, 187, 190, 192, 193, 222
reading, 53, 61, 79, 84, 88, 90, 93, 199
reliability, 56, 97, 164, 176, 177, 178, 179, 180, 181, 182, 183, 186, 189, 191, 192, 193, 228
repetition, 167, 224
research, 151, 152, 154, 159, 160, 162, 164, 165, 168, 169, 172, 178, 181, 182, 183, 185, 186, 190, 191, 193, 199, 200, 201, 203, 222, 227
retrospective verbal protocol, 227, 228, 229
rubrics, 73, 75, 84, 159, 160, 161, 162, 164, 165, 166, 167, 168, 169, 170, 171, 204
sample size, 166, 171, 205, 227
scores, 152, 160, 164, 165, 167, 168, 169, 170, 179, 188, 191, 207, 233, 235
Second Language Acquisition, 82, 199, 221
short-term, 190, 191, 193, 227
skills, 53–56, 58, 59, 61, 73, 74, 76, 77, 79–81, 84, 92, 96, 99, 151, 152, 153, 154, 159, 160, 161, 163, 165, 166, 170, 171, 178, 180, 181, 199, 200, 201, 203, 204, 205, 210, 220, 222, 225, 227, 233, 235, 236
social, 51, 52, 54–57, 59, 71, 75, 160, 167, 171, 172, 199, 200
social constructivism, 203
social context, 56, 71, 200
social interaction, 204
sociocultural values, 58
source text, 163, 169, 172, 224, 226
speaking, 61, 84, 87, 88, 93, 95, 199–216
speaking anxiety, 199–216, 217, 218, 219
staff seminars, 190, 192
stakeholders, 54, 55, 57–62, 64, 91, 159, 177, 181, 182, 184, 185, 191, 193
Standard English, 51, 224
standardized tests, 161, 176, 180, 220
strategic competence, 222
structure, 74, 75, 84, 85, 87, 89, 95, 97, 98, 100, 163, 164, 169, 187, 188, 199
students, 61, 67, 69, 70–82, 84, 85, 88–100, 153, 159, 162, 163, 165, 166, 168, 169, 170, 171, 172, 176, 177, 178, 180, 181, 182, 183, 184, 185, 187, 188, 189, 192, 193, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 220, 221, 225, 236
summative assessment, 71, 151, 160, 165, 176, 177, 178, 181, 182, 183, 184, 185, 188, 189, 190, 192, 193, 204, 205, 210, 220
system, 65, 70, 76, 82, 154, 165, 171, 172, 176, 177, 201, 202, 204, 205, 210, 221, 225
target language use domain, 224, 225
teacher, 51, 55, 58, 59, 60–62, 64, 70, 72, 76, 78, 82–84, 88, 89, 90, 92, 94, 96, 98, 151, 152, 153, 154, 159, 160, 161, 162, 163, 165, 166, 167, 169, 170, 171, 172, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 203, 204, 205, 208, 210, 211, 220, 221, 235, 236
teacher educators, 154
teacher training, 61, 64, 177, 181, 220
teacher training programs, 161, 178
teachers’ assessment literacy, 64, 163, 169, 177, 180, 182, 220